

18.05 Problem Set 3, Spring 2014

Problem 1. (10 pts.) Independence. Three events A, B, and C are pairwise independent
if each pair is independent. They are mutually independent if they are pairwise independent
and in addition

P (A ∩ B ∩ C) = P (A)P (B)P (C). (1)

(a) Suppose we roll two 6-sided dice. Consider the events:

A = ‘odd on die 1’ B = ‘odd on die 2’ C = ‘odd sum’

Are A, B, and C pairwise independent? Are they mutually independent?


(b) Consider the Venn diagram below. A, B and C are the overlapping circles and the
probabilities of each region are as marked. Does equation (1) hold? Are the events A, B, C
mutually independent?

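[Venn diagram: three overlapping circles A, B and C. Region probabilities:
A only 0.225; A and B only 0.05; B only 0.225; A, B and C 0.125;
A and C only 0.1; B and C only 0.1; C only 0.175.]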

(c) For families with n children, the events ‘the family has children of both sexes’ and ‘there
is at most one girl’ are independent. What is n?

Problem 2. (10 pts.) R simulation. Suppose there is an experimental medical treatment
for a cancer that if untreated is nearly always fatal within 12-15 months. The doctors enroll
5000 patients in a study in which each patient is given the treatment and followed for 5
years. Let X be the length of time a random patient given the treatment survives. (If a
patient is still alive at the end of the study, then X = 5 for this patient.)
As the statistician it is your job to analyze the data. To put the data in a vector x you
need to do the following.
First download ps3prob2data.r and put the file in
your R working directory. Then give the following R commands.
> source('ps3prob2data.r')
> x = getprob2data()

(a) Compute the mean and standard deviation of the data.


(b) Plot a frequency histogram of the data. Set the histogram so each bin has width 0.1
years. Print the histogram and turn it in with the pset.
(c) Using your answers in (a) and (b), write a short paragraph summarizing the data in a
useful way.
(d) Based on (c), what are your conclusions about the effectiveness of the treatment?
What recommendations would you make for avenues of further research?

Problem 3. (10 pts.) Dice. Let X be the result of rolling a fair 4-sided die. Let Y be the
result of rolling a fair 6-sided die. Let Z be the average of X and Y .
(a) Find the standard deviation of X, of Y , and of Z.
(b) Carefully graph the pmf and cdf of Z.
(c) Game: You win 2X dollars if X > Y and lose 1 dollar otherwise. After playing this
game 60 times, what is your expected total gain (positive) or loss (negative)?

Problem 4. (10 pts.) Two scoops. Boxes of Raisin Bran cereal are 30cm tall. Due
to settling, boxes have a higher density of raisins at the bottom (h = 0) than at the top
(h = 30). Suppose the density (in raisins per cm of height) is given by f (h) = 40 − h.
(a) How many raisins are in a box?
(b) Let H be the height of a random raisin. Find and graph the pdf g(h) of H.
(c) Find and graph the cdf G(h) of H.
(d) What is the probability that a random raisin is in the bottom third of the box?

Problem 5. (10 pts.) The new normal. Recall that the normal distribution $N(\mu, \sigma^2)$
has pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The standard normal distribution $N(0, 1)$ has mean 0 (by symmetry), variance 1 (as we'll
prove next week), and pdf $\phi(z)$ given by setting $\mu = 0$ and $\sigma = 1$ above. The cdf is denoted
$\Phi(z)$ and does not have a nice formula. In this problem, we'll show that scaling and shifting
a normal random variable gives a normal random variable. Suppose $Z \sim N(0, 1)$ and
$X = aZ + b$.

(a) Compute the mean $\mu$ and variance $\sigma^2$ of X.

(b) Express the cdf $F_X(x)$ of X in terms of $\Phi$ and then use the chain rule to find the pdf
$f_X(x)$ of X.
(c) Use (b) to show that X follows the $N(b, a^2)$ distribution.
(d) Use (a) and (c) to conclude that the $N(\mu, \sigma^2)$ distribution has mean $\mu$ and variance
$\sigma^2$.

Problem 6. (10 pts.) Birth day. The length of human gestation is well-approximated by
a normal distribution with mean μ = 280 days and standard deviation σ = 8.5 days.
(a) Graph the corresponding pdf and cdf. You should do this using the dnorm, pnorm and
plot commands in R. Print the results and turn them in with the pset.


Suppose your final exam is scheduled for May 18 and your pregnant professor has a due
date of May 25.
(b) Find the probability she will give birth on or before the day of the final.
(c) Find the probability she will give birth in May sometime after the exam.
(d) The professor decides to move up the exam date so there will be a 95% probability that
she will give birth afterward. What date should she pick?

18.05 Problem Set 3, Solutions
Spring 2014 Solutions

Problem 1. (10 pts.)


(a) We have P (A) = P (B) = P (C) = 1/2. Writing the outcome of die 1 first, we can easily
list all outcomes in the following intersections.

A ∩ B = {(1, 1), (1, 3), (1, 5), (3, 1), (3, 3), (3, 5), (5, 1), (5, 3), (5, 5)}
A ∩ C = {(1, 2), (1, 4), (1, 6), (3, 2), (3, 4), (3, 6), (5, 2), (5, 4), (5, 6)}
B ∩ C = {(2, 1), (4, 1), (6, 1), (2, 3), (4, 3), (6, 3), (2, 5), (4, 5), (6, 5)}

By counting we see
$$P(A \cap B) = \frac{1}{4} = P(A)P(B).$$
Likewise,
$$P(A \cap C) = \frac{1}{4} = P(A)P(C) \quad\text{and}\quad P(B \cap C) = \frac{1}{4} = P(B)P(C).$$
So, we see that A, B, and C are pairwise independent.
However, A ∩ B ∩ C = ∅, since if we roll an odd on die 1 and an odd on die 2, then the sum
of the two will be even. So, in this case,

$$P(A \cap B \cap C) = 0 \ne \frac{1}{8} = P(A)P(B)P(C),$$

and we conclude that A, B and C are not mutually independent.
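
The counting argument above can be checked by brute force. The following is a minimal sketch in R that enumerates the 36 equally likely outcomes; the helper name p and the variable names are illustrative choices, not part of the original solution.

```r
# Enumerate all 36 equally likely outcomes of two fair 6-sided dice.
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)

A <- rolls$d1 %% 2 == 1               # odd on die 1
B <- rolls$d2 %% 2 == 1               # odd on die 2
C <- (rolls$d1 + rolls$d2) %% 2 == 1  # odd sum

# With equally likely outcomes, P(E) is the fraction of rows where E holds.
p <- function(event) mean(event)

# Pairwise independence: P(E and F) equals P(E) P(F) for each pair.
pairwise <- c(AB = p(A & B) == p(A) * p(B),
              AC = p(A & C) == p(A) * p(C),
              BC = p(B & C) == p(B) * p(C))

triple_lhs <- p(A & B & C)            # P(A and B and C)
triple_rhs <- p(A) * p(B) * p(C)      # P(A) P(B) P(C)

print(pairwise)
print(c(lhs = triple_lhs, rhs = triple_rhs))
```

All probabilities here are multiples of 1/36 that happen to be exactly representable, so the exact comparison with == is safe in this particular case; in general a tolerance-based comparison is the more cautious choice.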


(b) By totaling the regions we get $P(A) = 0.225 + 0.05 + 0.1 + 0.125 = 0.5$. Likewise
$P(B) = 0.5$ and $P(C) = 0.5$. Thus $P(A)P(B)P(C) = 0.5^3 = 0.125 = P(A \cap B \cap C)$. So,
yes, the product formula does hold.
Mutual independence requires pairwise independence as well as the multiplication formula
for all three events. We see that

$$P(A \cap B) = 0.05 + 0.125 = 0.175, \quad\text{but}\quad P(A)P(B) = 0.5^2 = 0.25.$$

Since $P(A)P(B) \ne P(A \cap B)$, the two events are not independent. Likewise, $P(A)P(C) =
0.25$ and $P(A \cap C) = 0.225$, so A and C are not independent, and $P(B)P(C) = 0.25$
and $P(B \cap C) = 0.225$, so B and C are not independent.
Since the three events are not all pairwise independent they are not mutually independent.
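
As a quick check of the arithmetic in (b), here is a small R sketch. The assignment of each probability to a region is inferred from the sums used in the solution (for example, 0.05 + 0.125 for A and B together), and the region names are hypothetical labels.

```r
# Region probabilities of the Venn diagram (labels are illustrative).
region <- c(A_only = 0.225, B_only = 0.225, C_only = 0.175,
            AB_only = 0.05, AC_only = 0.1, BC_only = 0.1, ABC = 0.125)

pA <- sum(region[c("A_only", "AB_only", "AC_only", "ABC")])
pB <- sum(region[c("B_only", "AB_only", "BC_only", "ABC")])
pC <- sum(region[c("C_only", "AC_only", "BC_only", "ABC")])

pAB  <- sum(region[c("AB_only", "ABC")])
pAC  <- sum(region[c("AC_only", "ABC")])
pBC  <- sum(region[c("BC_only", "ABC")])
pABC <- region[["ABC"]]

# all.equal() compares with a small tolerance, which avoids
# floating-point surprises when adding decimals such as 0.1.
product_rule_holds <- isTRUE(all.equal(pABC, pA * pB * pC))
pairwise_holds <- c(AB = isTRUE(all.equal(pAB, pA * pB)),
                    AC = isTRUE(all.equal(pAC, pA * pC)),
                    BC = isTRUE(all.equal(pBC, pB * pC)))

print(product_rule_holds)
print(pairwise_holds)
```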
(c) Let A be the event "the family has children of both sexes" and B be the event "there is
at most one girl." In order for A to ever be true we assume that n > 1. Now, if we let X
be the number of girls, then we have

$$P(A) = P(1 \le X \le n - 1), \qquad P(B) = P(X \le 1), \qquad P(A \cap B) = P(X = 1).$$

Since $X \sim \text{binomial}(n, 1/2)$ we have

$$P(A) = 1 - P(X = 0) - P(X = n) = 1 - \frac{2}{2^n}, \qquad P(B) = P(X = 0) + P(X = 1) = \frac{n+1}{2^n}.$$


Since we are told that A and B are independent, we have P (A)P (B) = P (A ∩ B), so

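$$\frac{n+1}{2^n}\left(1 - \frac{2}{2^n}\right) = \frac{n}{2^n}$$
$$\Leftrightarrow\ (n + 1)\left(1 - \frac{2}{2^n}\right) = n$$
$$\Leftrightarrow\ n + 1 - \frac{n+1}{2^{n-1}} = n$$
$$\Leftrightarrow\ 2^{n-1} = n + 1$$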
Plugging small values of n into the above equation, we find that n = 3.
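
The "plugging in small values" step can be sketched in R as follows. The search range 2 to 10 is an arbitrary assumption made for illustration; the variable names are not from the original.

```r
n <- 2:10                 # candidate family sizes (range is an assumption)

# Exact probabilities for X ~ binomial(n, 1/2).
pA  <- 1 - 2 / 2^n        # both sexes: exclude X = 0 and X = n
pB  <- (n + 1) / 2^n      # at most one girl: X = 0 or X = 1
pAB <- n / 2^n            # both events together: exactly one girl

# A and B are independent exactly when P(A) P(B) = P(A and B).
independent <- abs(pA * pB - pAB) < 1e-12
solutions <- n[independent]
print(solutions)
```

Within this range the only value that satisfies the independence condition is n = 3, in agreement with the equation 2^(n-1) = n + 1.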

Problem 2. (10 pts.) For (a) and (b) the R-code is posted in ps3-sol.r
(a) mean = 2.554528, standard deviation = 2.07
(b) [Figure: frequency histogram of the data with bin width 0.1 years.]
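
The posted file ps3-sol.r is not reproduced here. The following is only a sketch of how parts (a) and (b) could be computed in R. The vector x below is made-up stand-in data, used so that the snippet runs on its own; in the actual problem x comes from getprob2data(), and the numbers this sketch produces are not the answers reported above.

```r
# Stand-in data ONLY: the real vector comes from getprob2data() in
# ps3prob2data.r. These values are simulated and purely hypothetical.
set.seed(1)
x <- pmin(rexp(5000, rate = 0.3), 5)   # survival times, capped at 5 years

# (a) Mean and standard deviation of the data.
x_mean <- mean(x)
x_sd   <- sd(x)

# (b) Bin edges 0, 0.1, ..., 5 give bins of width 0.1 years.
h <- hist(x, breaks = seq(0, 5, by = 0.1), plot = FALSE)

# plot(h, xlab = "years survived", main = "Survival times") would draw
# the frequency histogram from the precomputed bins.
```

Computing the bins with plot = FALSE and plotting afterwards is a design choice that keeps the counts available for inspection (h$counts) before anything is drawn.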

(c) Looking at the distribution we see it is bimodal with a spike at 5 years. About half the
patients die in the first year but about half live more than 2.5 years with over 20% still
alive after 5 years. The spike is because everyone who survives to 5 years is lumped into
that category. The average of 2.5 years is not that meaningful because there seem to be
two categories of patients. This is reflected in the large standard deviation.
(d) The treatment appears to be effective for about half the patients. More research would
be needed to understand what characteristics of the disease or patients predict the treatment
will be effective.

Problem 3. (10 pts.)


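(a) We compute $\text{Var}(X) = E(X^2) - E(X)^2$, etc., from the tables.

X      1    2    3    4
p(x)   1/4  1/4  1/4  1/4
X^2    1    4    9    16

Y      1    2    3    4    5    6
p(y)   1/6  1/6  1/6  1/6  1/6  1/6
Y^2    1    4    9    16   25   36

So, $E(X) = \frac{1}{4}(1 + 2 + 3 + 4) = \frac{5}{2}$ and $E(X^2) = \frac{1}{4}(1 + 4 + 9 + 16) = \frac{15}{2}$. Thus,
$$\text{Var}(X) = \frac{15}{2} - \frac{25}{4} = \frac{5}{4} \ \Rightarrow\ \sigma_X = \frac{\sqrt{5}}{2}.$$
Similarly, $E(Y) = \frac{7}{2}$ and $E(Y^2) = \frac{91}{6}$. So, $\text{Var}(Y) = \frac{91}{6} - \frac{49}{4} = \frac{35}{12}$.
Since X and Y are independent,
$$\text{Var}(Z) = \text{Var}\left(\frac{X + Y}{2}\right) = \frac{1}{4}\left(\text{Var}(X) + \text{Var}(Y)\right) = \frac{25}{24}.$$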


Thus,

σX = 1.118 σY = 1.708 σZ = 1.021
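
These standard deviations can be verified numerically. The sketch below assumes a small helper, pmf_sd, which is a hypothetical name introduced here; it applies the formula Var = E(V^2) - E(V)^2 to a list of values and their probabilities.

```r
# Standard deviation of a discrete random variable from its pmf.
pmf_sd <- function(values, probs) {
  m <- sum(values * probs)              # E(V)
  sqrt(sum(values^2 * probs) - m^2)     # sqrt(E(V^2) - E(V)^2)
}

sd_X <- pmf_sd(1:4, rep(1/4, 4))
sd_Y <- pmf_sd(1:6, rep(1/6, 6))

# Z = (X + Y)/2: enumerate all 24 equally likely (x, y) pairs.
pairs <- expand.grid(x = 1:4, y = 1:6)
z <- (pairs$x + pairs$y) / 2
sd_Z <- pmf_sd(z, rep(1/24, 24))

print(round(c(sd_X = sd_X, sd_Y = sd_Y, sd_Z = sd_Z), 3))
```

Enumerating the pairs for Z does not rely on the variance-of-a-sum rule, so it serves as an independent check of Var(Z) = 25/24.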

(b) We graph the pmf of Z as a point plot and then as a density histogram. The cdf is a
staircase graph.

[Figures: pmf of Z (probability vs. z) and cdf of Z (P(Z <= z) vs. z).]
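
Since the plots are not reproduced here, the values behind them can be tabulated. This is a minimal R sketch that builds the pmf and cdf of Z by enumeration; the variable names are illustrative.

```r
# All 24 equally likely (x, y) pairs and the resulting averages.
pairs <- expand.grid(x = 1:4, y = 1:6)
z <- (pairs$x + pairs$y) / 2

counts <- table(z)                              # how often each z occurs
pmf_Z <- setNames(as.numeric(counts) / nrow(pairs), names(counts))
cdf_Z <- cumsum(pmf_Z)                          # staircase heights P(Z <= z)

print(round(rbind(pmf = pmf_Z, cdf = cdf_Z), 4))

# plot(as.numeric(names(pmf_Z)), pmf_Z, type = "h") would draw the pmf,
# and plot(stepfun(as.numeric(names(cdf_Z)), c(0, cdf_Z))) the cdf staircase.
```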

(c) The only pairs (X, Y) that satisfy X > Y are {(2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)}.
So $P(X > Y) = \frac{6}{24}$. Moreover, we have

$$P(X > Y \mid X = 2) = \frac{1}{6}, \qquad P(X > Y \mid X = 3) = \frac{2}{6}, \qquad P(X > Y \mid X = 4) = \frac{3}{6}.$$

If W is our winnings for one game, we find

$$E(W) = (-1)P(Y \ge X) + 2\bigl(2P(X > Y \mid X = 2)P(X = 2) + 3P(X > Y \mid X = 3)P(X = 3) + 4P(X > Y \mid X = 4)P(X = 4)\bigr)$$
$$= -\frac{18}{24} + \frac{40}{24} = \frac{11}{12}.$$

Now if we play the game 60 times and receive winnings $W_1, \ldots, W_{60}$ (with $E(W_i) = \frac{11}{12}$),
our expected total gain is

$$E(W_1 + \cdots + W_{60}) = E(W_1) + \cdots + E(W_{60}) = 55.$$
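
The expected winnings can also be obtained by direct enumeration. The sketch below is one way to do that in R; the variable names are illustrative.

```r
# All 24 equally likely (x, y) pairs for the 4-sided and 6-sided dice.
pairs <- expand.grid(x = 1:4, y = 1:6)

# Winnings for one game: 2x dollars when x > y, otherwise lose 1 dollar.
w <- ifelse(pairs$x > pairs$y, 2 * pairs$x, -1)

expected_one_game <- mean(w)                  # each pair has probability 1/24
expected_60_games <- 60 * expected_one_game   # linearity of expectation

print(c(one_game = expected_one_game, sixty_games = expected_60_games))
```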

Problem 4. (10 pts.) (a) The number of raisins is

$$\int_0^{30} f(h)\,dh = \int_0^{30} (40 - h)\,dh = 750.$$

(b) The probability density is just the actual density divided by the total number of raisins:
$$g(h) = \frac{1}{750}(40 - h).$$
(c) For $0 \le h \le 30$ we have
$$G(h) = \int_0^h g(x)\,dx = \frac{40h}{750} - \frac{h^2}{1500}.$$


[Figures: PDF g(h) vs. h and CDF G(h) vs. h, for h from 0 to 30.]

(d) Since the height is 30 we need to find $P(H \le 10)$.

$$P(H \le 10) = \int_0^{10} g(h)\,dh = \int_0^{10} \frac{1}{750}(40 - h)\,dh = \frac{7}{15}.$$

The R code for these plots is posted in ps3-sol.r
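
The posted ps3-sol.r is not reproduced here. As a rough numerical check of parts (a), (c) and (d), the integrals can be evaluated with R's integrate(); the function and variable names below are assumptions made for this sketch.

```r
f <- function(h) 40 - h                        # raisins per cm at height h

# (a) Total number of raisins: integral of f over [0, 30].
total <- integrate(f, lower = 0, upper = 30)$value

g <- function(h) f(h) / total                  # (b) pdf of H on [0, 30]
G <- function(h) 40 * h / 750 - h^2 / 1500     # (c) closed-form cdf

# (d) Probability that a random raisin is in the bottom third.
p_bottom_third <- integrate(g, lower = 0, upper = 10)$value

print(c(total = total, p_bottom_third = p_bottom_third, G_10 = G(10)))

# curve(g, 0, 30) and curve(G, 0, 30) would draw the pdf and cdf.
```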

Problem 5. (10 pts.)


(a) We know that E(X) = aE(Z) + b = b and

$$\text{Var}(X) = \text{Var}(aZ + b) = a^2\,\text{Var}(Z) = a^2.$$

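(b) Let x be any real number. We will first compute $F_X(x) = P(X \le x)$. Since $X = aZ + b$,
we see that (taking a > 0, so that dividing by a preserves the inequality)
$$F_X(x) = P(X \le x) = P(aZ + b \le x) = P\left(Z \le \frac{x-b}{a}\right) = \Phi\left(\frac{x-b}{a}\right).$$

So $F_X(x) = \Phi\left(\frac{x-b}{a}\right)$. Differentiating this with respect to x, we find

$$f_X(x) = \frac{1}{a}\,\Phi'\left(\frac{x-b}{a}\right) = \frac{1}{a}\,\phi\left(\frac{x-b}{a}\right) = \frac{1}{\sqrt{2\pi}\,a}\, e^{-\frac{(x-b)^2}{2a^2}}.$$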

(c) From (b), we see that $f_X(x)$ is the pdf of the $N(b, a^2)$ distribution.
(d) From (b) and (c), we see that if Z is standard normal, then $\sigma Z + \mu$ follows a $N(\mu, \sigma^2)$
distribution. From (a), we know that $E(\sigma Z + \mu) = \mu$ and $\text{Var}(\sigma Z + \mu) = \sigma^2$.
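
The identities in (b) and (c) can be spot-checked numerically against R's built-in normal functions. The values of a and b below are hypothetical (any a > 0 would do), as is the grid of x values.

```r
a <- 2; b <- 3                       # hypothetical scale and shift, a > 0
x <- seq(-5, 11, by = 0.5)           # arbitrary grid of evaluation points

# cdf and pdf of X = aZ + b as derived in part (b).
F_X <- pnorm((x - b) / a)
f_X <- dnorm((x - b) / a) / a

# R's normal functions take the mean and the standard deviation
# (not the variance), so N(b, a^2) is mean = b, sd = a.
max_cdf_gap <- max(abs(F_X - pnorm(x, mean = b, sd = a)))
max_pdf_gap <- max(abs(f_X - dnorm(x, mean = b, sd = a)))

print(c(max_cdf_gap = max_cdf_gap, max_pdf_gap = max_pdf_gap))
```

Both gaps should be at the level of floating-point rounding, which is consistent with X following the N(b, a^2) distribution.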

Problem 6. (10 pts.)


(a) Suppose $Y \sim N(280, 8.5^2)$. The pdf f(y) and cdf F(y) are plotted below.
[Figures: PDF f(x) vs. x and CDF F(x) vs. x, for x from 250 to 300.]
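
One way such plots could be produced with dnorm, pnorm and plot is sketched below. The plotting range and the temporary output file are assumptions made so the snippet runs non-interactively; this is not the posted solution code.

```r
mu <- 280; sigma <- 8.5              # mean and standard deviation in days
days <- seq(250, 310, by = 0.5)      # plotting range is a choice

pdf_vals <- dnorm(days, mean = mu, sd = sigma)
cdf_vals <- pnorm(days, mean = mu, sd = sigma)

# Write both panels to a temporary PDF file (hypothetical location).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))
plot(days, pdf_vals, type = "l", xlab = "x", ylab = "f",
     main = "PDF: f(x) vs. x")
plot(days, cdf_vals, type = "l", xlab = "x", ylab = "F",
     main = "CDF: F(x) vs. x")
dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so that the plots appear in the default graphics window.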

(b) There is some ambiguity here depending on the exact time of day of the due date. On
or before the day of the final means before midnight on the 18th. The due date is the 25th.
We’ll assume that means up to midnight, so the final is 7 days before the due date. We’ll
accept any number between 6 and 8.
Let X be the number of days before or after May 25 that the baby is born. We want the
probability $P(X \le -7)$. We know $X \sim N(0, 8.5^2)$.
Using R, we have

P(X ≤ −7) = pnorm(-7,0,8.5) = 0.205

(Or, if Z is a standard normal random variable, we could have computed
P(X ≤ −7) = P(Z ≤ −7/8.5) = pnorm(-7/8.5,0,1) = 0.205.)
(c) We want the probability that the baby is born between May 19 (X = −6) and May 31
(X = 6). We compute
$$P(-6 \le X \le 6) = P\left(-\frac{6}{8.5} \le Z \le \frac{6}{8.5}\right) = 0.520.$$

Again there is some ambiguity about the range. We'll accept any reasonable choice.
(d) We want to find x such that $P(X \ge x) \ge 0.95$. That is, we want $P\left(Z \ge \frac{x}{8.5}\right) \ge 0.95$.
Using R: x = 8.5*qnorm(.05), we find x ≈ −14 (May 11).
We could also have done the calculation with a standard normal table.
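
The three calculations in (b), (c) and (d) can be collected into one short R sketch. It follows the same conventions as the solution (X measured in days relative to May 25, final 7 days earlier); the variable names are illustrative.

```r
sigma <- 8.5   # X = days relative to the May 25 due date, X ~ N(0, sigma^2)

# (b) Birth on or before May 18, i.e. X <= -7.
p_b <- pnorm(-7, mean = 0, sd = sigma)

# (c) Birth in May after the exam: May 19 (X = -6) to May 31 (X = 6).
p_c <- pnorm(6, mean = 0, sd = sigma) - pnorm(-6, mean = 0, sd = sigma)

# (d) Exam offset x with P(X >= x) = 0.95, i.e. the 5% quantile of X.
x_d <- sigma * qnorm(0.05)

print(round(c(p_b = p_b, p_c = p_c, x_d = x_d), 3))
```

Rounding x_d to a whole number of days gives the offset of about two weeks before the due date used in the solution.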
