Probability & Statistics PDF

Study Material
for
Probability and Statistics
AAOC ZC111
Distance Learning Programmes Division

Birla Institute of Technology & Science
Pilani – 333031 (Rajasthan)
July 2003
Course Developed by
M.S.Radhakrishnan
Word Processing & Typesetting by
Narendra Saini
Ashok Jitawat
Contents
INTRODUCTION, SAMPLE SPACES & EVENTS
Probability
Events
AXIOMS OF PROBABILITY
Some elementary consequences of the Axioms
Finite Sample Space (in which all outcomes are equally likely)
CONDITIONAL PROBABILITY
Independent events
Theorem on Total Probability
BAYES’ THEOREM
MATHEMATICAL EXPECTATION & DECISION MAKING
RANDOM VARIABLES
Discrete Random Variables
Binomial Distribution
Cumulative Binomial Probabilities
Binomial Distribution – Sampling with replacement
Mode of a Binomial distribution
Hypergeometric Distribution (Sampling without replacement)
Binomial distribution as an approximation to the Hypergeometric Distribution
THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS
The mean of a Binomial Distribution
Digression
Chebyshev's theorem
Law of large numbers
Poisson Distribution
Poisson approximation to binomial distribution
Cumulative Poisson distribution
Poisson Process
The Geometric Distribution
Multinomial Distribution
Simulation
CONTINUOUS RANDOM VARIABLES
Probability Density Function (pdf)
Normal Distribution
Normal Approximation to Binomial Distribution
Correction for Continuity
Other Probability Densities
The Uniform Distribution
Gamma Function
Properties of Gamma Function
The Gamma Distribution
Exponential Distribution
Beta Distribution
The Log-Normal Distribution
JOINT DISTRIBUTIONS – TWO AND HIGHER DIMENSIONAL RANDOM VARIABLES
Conditional Distribution
Independence
Two-Dimensional Continuous Random Variables
Marginal and Conditional Densities
Independence
The Cumulative Distribution Function
Properties of Expectation
Sample Mean
Sample Variance
SAMPLING DISTRIBUTION
Statistical Inference
Statistics
The Sampling Distribution of the Sample Mean X̄
Inferences Concerning Means
Point Estimation
Estimation of n
Estimation of Sample proportion
Large Samples
Tests of Statistical Hypothesis
Notation
REGRESSION AND CORRELATION
Regression
Correlation
Sample Correlation Coefficient
INTRODUCTION, SAMPLE SPACES & EVENTS
Probability
Let E be a random experiment (where we ‘know’ all possible outcomes but can’t predict
what the particular outcome will be when the experiment is conducted). The set of all
possible outcomes is called a sample space for the random experiment E.
Example 1:
Let E be the random experiment:
Toss two coins and observe the sequence of heads and tails. A sample space for this
experiment could be S = {HH , TH , HT , TT }. If however we only observe the number
of heads got, the sample space would be S = {0, 1, 2}.
Example 2:
Toss two fair dice and observe the two numbers on the top. A sample space would be
S = { (1,1), (1,2), (1,3), …, (1,6),
      (2,1), (2,2), (2,3), …, (2,6),
      …
      (6,1), (6,2), (6,3), …, (6,6) }
If however, we are interested only in the sum of the two numbers on the top, the
sample space could be S = { 2, 3, …, 12}.
Example 3:
Count the number of machines produced by a factory until a defective machine is

produced. A sample space for this experiment could be S = {1, 2, 3, …}.
Example 4:
Measure the life length of a bulb produced by a factory.

Here S will be {t | t ≥ 0} = [0, ∞).
Events
An event is a subset of the sample space.
Example 5:
Suppose a balanced die is rolled and we observe the number on the top. Let A be the
event: an even number occurs.
Thus in symbols,
A = {2,4,6} ⊂ S = {1,2,3,4,5,6}
Two events are said to be mutually exclusive if they cannot occur together; that is there
is no element common between them.
In the above example if B is the event: an odd number occurs, i.e. B = {1,3,5} , then A and
B are mutually exclusive.
Solved Examples
Example 1:
A manufacturer of small motors is concerned with three major types of defects. If A is

the event that the shaft size is too large, B is the event that the windings are improper and
C is the event that the electrical connections are unsatisfactory, express in words what
events are represented by the following regions of the Venn diagram given below:
(a) region 2 (b) regions 1 and 3 together (c) regions 3, 5, 6 and 8 together.
[Figure: Venn diagram of three overlapping events A, B and C, with the regions numbered 1 to 8.]
Solution:
(a) Since this region is contained in A and B but not in C, it represents the event that
the shaft is too large and the windings improper but the electrical connections are
satisfactory.
(b) Since this region is common to B and C, it represents the event that the windings
are improper and the electrical connections are unsatisfactory.
(c) Since this is the entire region outside A, it represents the event that the shaft
size is not too large.
Example 2:
A carton of 12 rechargeable batteries contains one that is defective. In how many ways can
the inspector choose three of the batteries and
(a) get the one that is defective

(b) not get the one that is defective.
Solution:
(a) The one defective battery can be chosen in one way and two good ones can be chosen in
11C2 = 55 ways. Hence one defective and two good can be chosen in 1 × 55 = 55 ways.
(b) Three good ones can be chosen in 11C3 = 165 ways.
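The two counts can be checked numerically. The following is only an illustrative sketch: it assumes Python's standard `math.comb`, and the variable names are chosen here rather than taken from the course text.

```python
from math import comb

# 12 batteries, exactly 1 defective, inspector picks 3.
with_defective = comb(1, 1) * comb(11, 2)   # the defective one plus 2 of the 11 good ones
without_defective = comb(11, 3)             # 3 of the 11 good ones
total = comb(12, 3)                         # all possible selections of 3

print(with_defective, without_defective, total)
```

The two cases are mutually exclusive and exhaust all selections, so their counts add up to 12C3.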
AXIOMS OF PROBABILITY
Let E be a random experiment. Suppose to each event A, we associate a real number

P(A) satisfying the following axioms:
(i) 0 ≤ P ( A) ≤ 1
(ii) P (S ) = 1
(iii) If A and B are any two mutually exclusive events, then
P ( A ∪ B ) = P ( A) + P ( B )
(iv) If {A1, A2, …, An, …} is a sequence of pairwise mutually exclusive
events, then P(A1 ∪ A2 ∪ … ∪ An ∪ …) = P(A1) + P(A2) + … + P(An) + …
We call P(A) the probability of the event A.
Axiom 1 says that the probability of an event is always a number between 0 and 1.
Axiom 2 says that the probability of the certain event S is 1. Axiom 3 says that the
probability is an additive set function.
Some elementary consequences of the Axioms
1. P(φ) = 0
Proof: S = S ∪ φ, where S and φ are disjoint.
Hence P(S) = P(S) + P(φ), so P(φ) = 0. Q.E.D.
2. If A1, A2, …, An are any n pairwise mutually exclusive events, then
$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i)$.
Proof: By induction on n.
Def.: If A is an event, the complementary event is A′ = S − A. (It is the shaded portion in the accompanying figure.)
3. P ( A′) = 1 − P ( A)
Proof: S = A ∪ A ′
Now P ( S ) = P ( A) + P ( A′) as A and A ′ are disjoint or 1 = P ( A) + P ( A′) .
Thus P ( A′) = 1 − P ( A) . Q.E.D.
4. Probability is a subtractive set function; i.e.
if A ⊂ B, then P(B − A) = P(B) − P(A).
5. Probability is a monotone set function:

i.e. A ⊂ B ⇒ P(A) ≤ P(B)
Proof: B = A ∪ (B − A ) where A, B-A are disjoint.
Thus P ( B ) = P ( A) + P ( B − A) ≥ P ( A).
6. If A, B are any two events,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof:
A ∪ B = A ∪ (A′ ∩ B),
where A and A′ ∩ B are disjoint.
Hence P(A ∪ B) = P(A) + P(A′ ∩ B).
But B = (A ∩ B) ∪ (A′ ∩ B), a union of two disjoint sets, so
P(B) = P(A ∩ B) + P(A′ ∩ B)
or P(A′ ∩ B) = P(B) − P(A ∩ B).
∴ P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Q.E.D.
7. If A, B, C are any three events,
P(A ∪ B ∪ C ) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(C ∩ A) + P(A ∩ B ∩ C) .
Proof:
P(A ∪ B ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C )
= P(A) + P(B) − P(A ∩ B) + P(C) − P((A ∪ B) ∩ C)
= P(A) + P(B) + P(C) − P(A ∩ B) − P((A ∩ C) ∪ (B ∩ C))
= P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
More generally,
8. If A1, A2, …, An are any n events,
$$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n-1} P(A_1 \cap A_2 \cap \dots \cap A_n)$$
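The three-event identity in consequence 7 can be spot-checked on a small finite sample space. The sketch below is an illustration only: it uses the 36 equally likely outcomes of two dice, and the events A, B, C are arbitrary choices made here, not events from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair dice, all equally likely.
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

# Three arbitrary, overlapping events (hypothetical choices for illustration).
A = {s for s in S if s[0] % 2 == 0}   # first die is even
B = {s for s in S if sum(s) == 7}     # the sum is 7
C = {s for s in S if s[1] > 4}        # second die shows 5 or 6

lhs = P(A | B | C)
rhs = P(A) + P(B) + P(C) - P(A & B) - P(B & C) - P(C & A) + P(A & B & C)
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared without rounding error.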
Finite Sample Space (in which all outcomes are equally likely)
Let E be a random experiment having only a finite number of outcomes.

Let all the (finite no. of) outcomes be equally likely.
If S = {a1, a2, …, an} (where a1, a2, …, an are equally likely outcomes), then
S = {a1} ∪ {a2} ∪ … ∪ {an}, a union of mutually exclusive events.
Hence P(S) = P({a1}) + P({a2}) + … + P({an}).
But P({a1}) = P({a2}) = … = P({an}) = p (say).
Hence 1 = p + p + … + p (n terms), or p = 1/n.
Hence if A is a subset consisting of ‘k’ of these outcomes, A = {a1, a2, …, ak}, then
P(A) = k/n = (No. of favourable outcomes) / (Total no. of outcomes).
Example 1:
If a card is drawn from a well-shuffled pack of 52 cards, find the probability of drawing
(a) a red king. Ans: 2/52
(b) a 3, 4, 5 or 6. Ans: 16/52
(c) a black card. Ans: 1/2
(d) a red ace or a black queen. Ans: 4/52
Example 2:
When a pair of balanced dice is thrown, find the probability of getting a sum equal to
(a) 7. Ans: 6/36 = 1/6 (The total number of equally likely outcomes is 36 and the
number of favourable outcomes is 6, namely (1,6), (2,5), …, (6,1).)
(b) 11. Ans: 2/36
(c) 7 or 11. Ans: 8/36
(d) 2, 3 or 12. Ans: 1/36 + 2/36 + 1/36 = 4/36.
Example 3:
10 persons in a room are wearing badges marked 1 through 10. 3 persons are chosen at
random and asked to leave the room simultaneously and their badge nos are noted. Find
the probability that
(a) the smallest badge number is 5.
(b) the largest badge number is 5.
Solution:
(a) 3 persons can be chosen in 10C3 equally likely ways. If the smallest badge
number is to be 5, the badge numbers should be 5 and any two of the 5
numbers 6, 7, 8, 9,10. Now 2 numbers out of 5 can be chosen in 5C2 ways.
Hence the probability that the smallest badge number is 5 is 5C2 /10C3 .
(b) Ans. 4C2 /10C3 .
Example 4:
A lot consists of 10 good articles, 4 articles with minor defects and 2 with major defects.
Two articles are chosen at random. Find the probability that
(a) both are good. Ans: 10C2 / 16C2
(b) both have major defects. Ans: 2C2 / 16C2
(c) at least one is good. Ans: 1 − P(none is good) = 1 − 6C2 / 16C2
(d) exactly one is good. Ans: (10C1 × 6C1) / 16C2
(e) at most one is good. Ans: P(none is good) + P(exactly one is good) =
6C2 / 16C2 + (10C1 × 6C1) / 16C2
(f) neither has major defects. Ans: 14C2 / 16C2
(g) neither is good. Ans: 6C2 / 16C2
Example 5:
From 6 positive and 8 negative integers, 4 integers are chosen at random and multiplied.
Find the probability that their product is positive.
Solution:
The product is positive if all the 4 integers are positive or all of them are negative or two
of them are positive and the other two are negative. Hence the probability is
6C4 / 14C4 + 8C4 / 14C4 + (6C2 × 8C2) / 14C4
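As a check on this counting argument, the closed form can be compared with a brute-force enumeration. This is a sketch under stated assumptions: the particular integers (1 to 6 and −1 to −8) are hypothetical stand-ins, since only the signs matter.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod

# Closed form: all four positive, all four negative, or two of each.
p_formula = Fraction(comb(6, 4) + comb(8, 4) + comb(6, 2) * comb(8, 2), comb(14, 4))

# Brute force over every choice of 4 integers out of the 14.
integers = list(range(1, 7)) + list(range(-8, 0))   # 6 positive, 8 negative (illustrative values)
choices = list(combinations(integers, 4))
favourable = sum(1 for c in choices if prod(c) > 0)
p_brute = Fraction(favourable, len(choices))

print(p_formula, p_brute)
```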
Example 6:
If, A, B are mutually exclusive events and if P(A) = 0.29, P(B) = 0.43, then
(a) P(A ′) = 1 − 0.29 = 0.71

(b) P(A∪B) = 0.29 + 0.43 = 0.72
(c) P(A ∩ B′) = P(A) = 0.29 (as A is a subset of B′, since A and B are mutually exclusive)
(d) P(A ′ ∩ B′) = 1 − P(A ∪ B) = 1 − 0.72 = 0.28
Example 7:
P(A) = 0.35, P(B) = 0.73, P (A ∩ B) = 0.14 . Find
(a) P (A ∪ B) = P(A) + P(B) - P( A ∩ B) = 0.94.

(b) P (A ′ ∩ B) = P(B) − P(A ∩ B) = 0.59
(c) P (A ∩ B′) = P(A) − P(A ∩ B) = 0.21
(d) P(A ′ ∪ B′) = 1 − P(A ∩ B) = 1 − 0.14 = 0.86
Example 8:
A, B, C are 3 mutually exclusive events. Is this assignment of probabilities possible?
P(A) = 0.3, P(B) = 0.4, P(C) = 0.5

Ans. No: P(A ∪ B ∪ C) = P(A) + P(B) + P(C) = 1.2 > 1, which is not possible.
Example 9:
Three newspapers are published in a city. A recent survey of readers indicated the
following:
20% read A 8% read A and B 2% read all

16% read B 5% read A and C
14% read C 4% read B and C
Find probability that an adult chosen at random reads

(a) none of the papers.
Ans. 1 − P(A ∪ B ∪ C) = 1 − (20 + 16 + 14 − 8 − 5 − 4 + 2)/100 = 1 − 0.35 = 0.65
(b) reads exactly one paper.
P(Reading exactly one paper) = (9 + 6 + 7)/100 = 0.22
[Figure: Venn diagram of A, B and C with the percentage in each region: A only 9, B only 6, C only 7, A and B only 6, A and C only 3, B and C only 2, all three 2.]
(c) reads at least A and B given he reads at least one of the papers.
P(At least reading A and B given he reads at least one of the papers)
= P(A ∩ B) / P(A ∪ B ∪ C) = 8/35
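The three answers of Example 9 follow from the survey percentages by inclusion and exclusion. The sketch below reproduces them with exact fractions; the variable names are hypothetical, and the "exactly one" formula P(A) + P(B) + P(C) − 2[P(A ∩ B) + P(A ∩ C) + P(B ∩ C)] + 3P(A ∩ B ∩ C) is a standard identity used here in place of reading the regions off the Venn diagram.

```python
from fractions import Fraction

def pct(k):
    """Convert a survey percentage to an exact probability."""
    return Fraction(k, 100)

P_A, P_B, P_C = pct(20), pct(16), pct(14)
P_AB, P_AC, P_BC, P_ABC = pct(8), pct(5), pct(4), pct(2)

p_union = P_A + P_B + P_C - P_AB - P_AC - P_BC + P_ABC
p_none = 1 - p_union                                                     # part (a)
p_exactly_one = P_A + P_B + P_C - 2 * (P_AB + P_AC + P_BC) + 3 * P_ABC   # part (b)
p_ab_given_any = P_AB / p_union                                          # part (c)

print(float(p_none), float(p_exactly_one), p_ab_given_any)
```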
CONDITIONAL PROBABILITY
Let A, B be two events. Suppose P(B) ≠ 0. The conditional probability of A occurring
given that B has occurred is defined as
P(A | B) = probability of A given B = P(A ∩ B) / P(B).
Similarly we define P(B | A) = P(A ∩ B) / P(A) if P(A) ≠ 0.
Hence we get the multiplication theorem
P(A ∩ B) = P(A) P(B | A) (if P(A) ≠ 0)
= P(B) P(A | B) (if P(B) ≠ 0)
Example 10
A bag contains 4 red balls and 6 black balls. 2 balls are chosen at random one by one
without replacement. Find the probability that both are red.
Solution
Let A be the event that the first ball drawn is red, B the event the second ball drawn is
red. Hence the probability that both balls drawn are red =
P(A ∩ B) = P(A) × P(B | A) = 4/10 × 3/9 = 2/15
Independent events:
Definition: We say two events A, B are independent if P(A ∩ B) = P(A). P(B)
Equivalently A and B are independent if P(B | A) = P(B) or P(A | B) = P(A)

Theorem: If A, B are independent, then
(a) A′, B are independent
(b) A, B′ are independent
(c) A′, B′ are independent
Proof: B = (A ∩ B) ∪ (A′ ∩ B), where A ∩ B and A′ ∩ B are mutually exclusive. Hence
P(B) = P(A ∩ B) + P(A′ ∩ B)
P(A′ ∩ B) = P(B) − P(A ∩ B)
= P(B) − P(A) P(B)
= P(B) [1 − P(A)]
= P(B) P(A′)
∴ A′, B are also independent.
By the same reasoning (interchanging the roles of A and B), A and B′ are independent.
Applying (a) to the independent pair A, B′ then shows that A′ and B′ are independent.
Example 11
Find the probability of getting 8 heads in a row in 8 tosses of a fair coin.
Solution
If Ai is the event of getting a head in the ith toss, A1, A2, …, A8 are independent and
P(Ai) = 1/2 for all i. Hence P(getting all heads) =
P(A1) P(A2) … P(A8) = (1/2)^8
Example 12
It is found that in manufacturing a certain article, defects of one type occur with
probability 0.1 and defects of other type occur with probability 0.05. Assume
independence between the two types of defects. Find the probability that an article chosen
at random has exactly one type of defect given that it is defective.
Let A be the event that the article has exactly one type of defect.
Let B be the event that the article is defective.
Required: P(A | B) = P(A ∩ B) / P(B)
P(B) = P(D ∪ E), where D is the event that it has a type one defect and
E is the event that it has a type two defect,
= P(D) + P(E) − P(D ∩ E) = 0.1 + 0.05 − (0.1)(0.05) = 0.145
P(A ∩ B) = P(article has exactly one type of defect)
= P(D) + P(E) − 2 P(D ∩ E) = 0.1 + 0.05 − 2(0.1)(0.05)
= 0.14
∴ Probability = 0.14 / 0.145 ≈ 0.9655
[Note: If A and B are two events, the probability that exactly one of them occurs
is P(A) + P(B) − 2P(A ∩ B)]
Example 13
An electronic system has 2 subsystems A and B. It is known that
P (A fails) = 0.2
P (B fails alone) = 0.15
P (A and B fail) = 0.15
Find (a) P (A fails | B has failed)

(b) P (A fails alone)
Solution
(a) P(A fails | B has failed) = P(A and B fail) / P(B fails) = 0.15 / 0.30 = 1/2,
since P(B fails) = P(B fails alone) + P(A and B fail) = 0.15 + 0.15 = 0.30.
(b) P(A fails alone) = P(A fails) − P(A and B fail) = 0.20 − 0.15 = 0.05
Example 14
A binary number is a number having digits 0 and 1. Suppose a binary number is made up
of ‘n’ digits. Suppose the probability of forming an incorrect binary digit is p. Assume
independence between errors. What is the probability of forming an incorrect binary
number?
Ans. 1 − P(forming a correct no.) = 1 − (1 − p)^n.
Example 15
A question paper consists of 5 Multiple choice questions each of which has 4 choices (of
which only one is correct). If a student answers all the five questions randomly, find the
probability that he answers all questions correctly.
Ans. (1/4)^5.
Theorem on Total Probability
Let B1, B2, …, Bn be n mutually exclusive events of which one must occur. If A is any
other event, then
$$P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)$$
(For a proof, see your text book.)
Example 16
There are 2 urns. The first one has 4 red balls and 6 black balls. The second has 5 red
balls and 4 black balls. A ball is chosen at random from the 1st and put in the 2nd. Now a
ball is drawn at random from the 2nd urn. Find the probability it is red.
Solution:
Let B1 be the event that the first ball drawn is red and B2 be the event that the first ball
drawn is black. Let A be the event that the second ball drawn is red. By the theorem on
total probability,
P(A) = P(B1) P(A | B1) + P(B2) P(A | B2) = 4/10 × 6/10 + 6/10 × 5/10 = 54/100 = 0.54.
Example 17:
A consulting firm rents cars from three agencies D, E, F. 20% of the cars are rented from
D, 20% from E and the remaining 60% from F. If 10% of cars rented from D, 12% of
cars rented from E, 4% of cars rented from F have bad tires, find the probability that a
car rented from the consulting firm will have bad tires.
Ans. (0.2)(0.1) + (0.2)(0.12) + (0.6)(0.04) = 0.068
Example 18:
A bolt factory has three divisions B1, B2, B3 that manufacture bolts. 25% of output is
from B1, 35% from B2 and 40% from B3. 5% of the bolts manufactured by B1 are
defective, 4% of the bolts manufactured by B2 are defective and 2% of the bolts
manufactured by B3 are defective. Find the probability that a bolt chosen at random from
the factory is defective.
Ans. 25/100 × 5/100 + 35/100 × 4/100 + 40/100 × 2/100 = 0.0345
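Examples 17 and 18 apply the theorem on total probability in the same way, so the calculation can be written once. The sketch below is illustrative only; the function name `total_probability` and its argument names are assumptions made here.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Return sum of P(B_i) * P(A | B_i) over a partition B_1, ..., B_n."""
    if sum(priors) != 1:
        raise ValueError("the B_i must be mutually exclusive and exhaustive")
    return sum(p * c for p, c in zip(priors, conditionals))

# Example 17: rental agencies D, E, F and their bad-tire rates.
p_bad_tires = total_probability(
    [Fraction(20, 100), Fraction(20, 100), Fraction(60, 100)],
    [Fraction(10, 100), Fraction(12, 100), Fraction(4, 100)],
)

# Example 18: divisions B1, B2, B3 and their defect rates.
p_defective = total_probability(
    [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)],
    [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)],
)

print(float(p_bad_tires), float(p_defective))
```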
BAYES’ THEOREM
Let B1, B2, …, Bn be n mutually exclusive events of which one must occur.
If A is any event, then
$$P(B_k \mid A) = \frac{P(A \cap B_k)}{P(A)} = \frac{P(B_k)\,P(A \mid B_k)}{\sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)}$$
Example 19
Miss ‘X’ is fond of seeing films. The probability that she sees a film on the day before
the test is 0.7. Miss X is any way good at studies. The probability that she maxes the test
is 0.3 if she sees the film on the day before the test and the corresponding probability is
0.8 if she does not see the film. If Miss ‘X’ maxed the test, find the probability that she
saw the film on the day before the test.
Solution
Let B1 be the event that Miss X saw the film before the test and let B2 be the
complementary event. Let A be the event that she maxed the test.
Required: P(B1 | A)
= P(B1) P(A | B1) / [P(B1) P(A | B1) + P(B2) P(A | B2)]
= (0.7 × 0.3) / (0.7 × 0.3 + 0.3 × 0.8)
= 0.21 / 0.45 = 7/15 ≈ 0.467
Example 20
At an electronics firm, it is known from past experience that the probability a new worker
who attended the company’s training program meets the production quota is 0.86. The
corresponding probability for a new worker who did not attend the training program is
0.35. It is also known that 80% of all new workers attend the company’s training
program. Find the probability that a new worker who met the production quota had
attended the company’s training programme.
Solution
Let B1 be the event that a new worker attended the company’s training programme. Let
B2 be the complementary event, namely a new worker did not attend the training
programme. Let A be the event that a new worker met the production quota. Then we
want P(B1 | A) = (0.8 × 0.86) / (0.8 × 0.86 + 0.2 × 0.35) = 0.688 / 0.758 ≈ 0.908.
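Examples 19 and 20 both have a two-event partition (B1 and its complement B2) and differ only in the numbers. A minimal sketch of the Bayes calculation follows; the helper name `posterior` is hypothetical, and exact fractions are used so the results can be compared with the hand calculation.

```python
from fractions import Fraction

def posterior(priors, likelihoods, k):
    """Bayes' theorem: P(B_k | A) from P(B_i) and P(A | B_i), with k counted from 0."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B_i) * P(A | B_i)
    return joint[k] / sum(joint)

# Example 19: B1 = saw the film (0.7); A = maxed the test.
ex19 = posterior([Fraction(7, 10), Fraction(3, 10)],
                 [Fraction(3, 10), Fraction(8, 10)], 0)

# Example 20: B1 = attended training (0.8); A = met the quota.
ex20 = posterior([Fraction(8, 10), Fraction(2, 10)],
                 [Fraction(86, 100), Fraction(35, 100)], 0)

print(ex19, float(ex19))
print(ex20, float(ex20))
```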
Example 21
A printing machine can print any one of n letters L1, L2,……….Ln. It is operated by
electrical impulses, each letter being produced by a different impulse. Assume that there
is a constant probability p that any impulse prints the letter it is meant to print. Also
assume independence. One of the impulses is chosen at random and fed into the machine
twice. Both times, the letter L1 was printed. Find the probability that the impulse chosen
was meant to print the letter L1.
Solution:
Let B1 be the event that the impulse chosen was meant to print the letter L1. Let B2 be the
complementary event. Let A be the event that both the times the letter L1 was printed.
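P(B1) = 1/n and P(A | B1) = p^2. Now the probability that an impulse prints a wrong letter is
(1 − p). Since there are n − 1 ways of printing a wrong letter (taken to be equally likely),
the probability that an impulse not meant for L1 prints L1 is (1 − p)/(n − 1), so by
independence P(A | B2) = ((1 − p)/(n − 1))^2. Hence
$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2)} = \frac{\frac{1}{n}\,p^2}{\frac{1}{n}\,p^2 + \left(1 - \frac{1}{n}\right)\left(\frac{1-p}{n-1}\right)^2}$$
This is the required probability.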
Miscellaneous problems
1 (a). Suppose the digits 1,2,3 are written in a random order. Find probability that at
least one digit occupies its proper place.
Solution
There are 3! = 6 ways of arranging 3 digits (123, 132, 213, 231, 312, 321), out of which
in 4 arrangements at least one digit occupies its proper place. Hence the probability is
4/3! = 4/6.
(Remark. An arrangement like 231, where no digit occupies its proper place is
called a derangement.)
(b) Same as (a) but with 4 digits 1, 2, 3, 4. Ans. 15/24 (Try proving this.)
Solution
Let A1 be the Event 1st digit occupies its proper place
A2 be the Event 2nd digit occupies its proper place
A3 be the Event 3rd digit occupies its proper place
A4 be the Event 4th digit occupies its proper place
P(at least one digit occupies its proper place)

=P(A1∪A2 ∪A3 ∪A4)
=P(A1) + P(A2) + P(A3) + P(A4)
(There are 4C1 terms each with the same probability)
− P(A1 ∩ A2) − P(A1 ∩ A3) − P(A1 ∩ A4) − … − P(A3 ∩ A4)
+ P(A1 ∩ A2 ∩ A3) + P(A1 ∩ A2 ∩ A4) + … + P(A2 ∩ A3 ∩ A4)
− P(A1 ∩ A2 ∩ A3 ∩ A4)
= 4C1 × 3!/4! − 4C2 × 2!/4! + 4C3 × 1!/4! − 4C4 × 0!/4!
= 1 − 1/2 + 1/6 − 1/24
= (24 − 12 + 4 − 1)/24 = 15/24
(c) Same as (a) but with n digits.
Solution
Let A1 be the Event 1st digit occupies its proper place
A2 be the Event 2nd digit occupies its proper place
……………………
An be the Event nth digit occupies its proper place
P(at least one digit occupies its proper place)
= P(A1∪A2 ∪ … ∪An)
= nC1 × (n − 1)!/n! − nC2 × (n − 2)!/n! + nC3 × (n − 3)!/n! − … + (−1)^(n−1) × 1/n!
= 1 − 1/2! + 1/3! − 1/4! + … + (−1)^(n−1)/n! ≈ 1 − e^(−1) (for n large).
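The inclusion-exclusion series can be compared with a direct enumeration of all arrangements for small n. The sketch below is illustrative; both function names are chosen here, and digits are numbered from 0 for convenience.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def p_series(n):
    """1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!, as an exact fraction."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

def p_enumerated(n):
    """Fraction of arrangements of n digits with at least one digit in its proper place."""
    perms = list(permutations(range(n)))
    hits = sum(1 for p in perms if any(p[i] == i for i in range(n)))
    return Fraction(hits, len(perms))

for n in (3, 4, 5, 6):
    print(n, p_series(n), p_enumerated(n))
print(float(p_series(10)), 1 - exp(-1))
```

Already for n = 10 the series agrees with 1 − e^(−1) to several decimal places, which is why the limit is a good approximation for large n.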
2. In a party there are ‘n’ married couples. If each male chooses at random a
female for dancing, find the probability that no man chooses his wife.
Ans. 1 − (1 − 1/2! + 1/3! − 1/4! + … + (−1)^(n−1)/n!).
3. A and B play the following game. They alternately throw a pair of dice.
Whoever gets a sum of seven on the two numbers on top wins the game
and the game stops. Suppose A starts the game. Find the probability that (a) A
wins the game (b) B wins the game.
Solution
A wins the game if he gets seven in the 1st throw or in the 3rd throw or in the
5th throw or …. Hence
P(A wins) = 1/6 + (5/6)(5/6)(1/6) + (5/6)(5/6)(5/6)(5/6)(1/6) + …
= (1/6) / (1 − (5/6)^2) = 6/(36 − 25) = 6/11.
P(B wins) = complementary probability = 5/11.
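The geometric series for P(A wins) can be summed exactly and compared with a long partial sum. This is a small sketch with names chosen here; p is the chance of a seven on one throw and q = 1 − p.

```python
from fractions import Fraction

p = Fraction(1, 6)   # probability of a sum of seven on a single throw
q = 1 - p            # probability of no seven

# A wins on throw 1, 3, 5, ...: p + q^2 p + q^4 p + ... = p / (1 - q^2)
p_a_wins = p / (1 - q ** 2)
p_b_wins = 1 - p_a_wins

# Partial sum of the first 200 terms as a numerical cross-check.
partial = sum(q ** (2 * k) * p for k in range(200))

print(p_a_wins, p_b_wins, float(partial))
```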
4. Birthday Problem
There are n persons in a room. Assume that nobody is born on 29th Feb.
Assume that any one birthday is as likely as any other birth day. Find the
probability that no two persons will have same birthday.
Solution
If n > 365, at least two will have the same birthday and hence the probability
that no two will have the same birthday is 0.
If n ≤ 365, the desired probability is
365 × 364 × … × [365 − (n − 1)] / 365^n.
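The product is easy to evaluate for any n. The sketch below is one way to do it, with a hypothetical function name and the 365-day assumption of the problem exposed as a parameter.

```python
def p_all_distinct(n, days=365):
    """Probability that n people all have different birthdays (equally likely days)."""
    if n > days:
        return 0.0
    prob = 1.0
    for k in range(n):
        prob *= (days - k) / days   # the (k+1)-th person avoids the k birthdays already taken
    return prob

for n in (10, 22, 23, 50):
    print(n, round(p_all_distinct(n), 4))
```

With these assumptions the probability of no shared birthday first drops below 1/2 at n = 23.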
5. A die is rolled until all the faces have appeared on top.
(a) What is probability that exactly 6 throws are needed?
Ans. 6!/6^6
(b) What is probability that exactly ‘n’ throws are needed? (n > 6)
6. Polya’s urn problem
An urn contains g green balls and r red balls. A ball is chosen at random and
its color is noted. Then the ball is returned to the urn and c more balls of same
color are added. Now a ball is drawn. Its color is noted and the ball is
replaced. This process is repeated.
(a) Find the probability that the 1st ball drawn is green.
Ans. g/(g + r)
(b) Find the probability that the 2nd ball drawn is green.
Ans. g/(g + r) × (g + c)/(g + r + c) + r/(g + r) × g/(g + r + c) = g/(g + r)
(c) Find the probability that the nth ball drawn is green.
The surprising answer is g/(g + r).
7. There are n urns and each urn contains a white and b red balls. A ball is
chosen from Urn 1 and put into Urn 2. Now a ball is chosen at random from
urn 2 and put into urn 3 and this is continued. Finally a ball drawn from Urn n.
Find the probability that it is white.
Solution
Let p_r = probability that the ball drawn from Urn r is white.
∴ p_r = p_(r−1) × (a + 1)/(a + b + 1) + (1 − p_(r−1)) × a/(a + b + 1); r = 2, 3, …, n.
This is a recurrence relation for p_r. Noting that p_1 = a/(a + b), we can find p_n.
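The recurrence p_r = p_(r−1) × (a + 1)/(a + b + 1) + (1 − p_(r−1)) × a/(a + b + 1), with p_1 = a/(a + b), can simply be iterated. The sketch below does this with exact fractions; the function name and the sample values a = 3, b = 5 are assumptions made for illustration.

```python
from fractions import Fraction

def p_white_from_urn(a, b, n):
    """Iterate the recurrence from p_1 = a/(a+b) up to p_n."""
    p = Fraction(a, a + b)
    for _ in range(2, n + 1):
        # Urn r holds a+b+1 balls: a+1 white if a white ball came in, else a white.
        p = p * Fraction(a + 1, a + b + 1) + (1 - p) * Fraction(a, a + b + 1)
    return p

for n in (1, 2, 5, 10):
    print(n, p_white_from_urn(3, 5, n))
```

Substituting p_(r−1) = a/(a + b) into the recurrence gives a/(a + b) again, so the iteration never moves away from its starting value and p_n = a/(a + b) for every n.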
MATHEMATICAL EXPECTATION & DECISION MAKING
Suppose we roll a die n times. What is the average of the n numbers that appear on the
top?
Suppose 1 occurs on the top n1 times, 2 occurs n2 times, …, 6 occurs n6 times
(so that n1 + n2 + … + n6 = n).
Total of the n numbers on the top = 1 × n1 + 2 × n2 + … + 6 × n6
∴ Average of the n numbers
= (1 × n1 + 2 × n2 + … + 6 × n6)/n = 1 × n1/n + 2 × n2/n + … + 6 × n6/n
Here clearly n1, n2, …, n6 are unknown. But by the relative frequency definition of
probability, we may approximate n1/n by P(getting 1 on the top) = 1/6, n2/n by
P(getting 2 on the top) = 1/6, and so on. So we can ‘expect’ the average of the n
numbers to be (1 + 2 + … + 6)/6 = 7/2 = 3.5. We call this the Mathematical Expectation
of the number on the top.
Definition
Let E be a random experiment with n outcomes a1, a2, …, an. Suppose P({a1}) = p1,
P({a2}) = p2, …, P({an}) = pn. Then we define the mathematical expectation as
a1 × p1 + a2 × p2 + … + an × pn
Problems
1. If a service club sells 4000 raffle tickets for a cash prize of $800, what is the
mathematical expectation of a person who buys one of these tickets?
Solution. 800 × 1/4000 + 0 × 3999/4000 = 1/5 = 0.2
4000 5
2. A charitable organization raises funds by selling 2000 raffle tickets for a 1st prize
worth $5000 and a second prize $100. What is mathematical expectation of a
person who buys one of the tickets?
Solution. 5000 × 1/2000 + 100 × 1/2000 + 0 × 1998/2000 = 2.55
3. A game between 2 players is called fair if each player has the same mathematical
expectation. If someone gives us $5 whenever we roll a 1 or a 2 with a balanced
die, what must we pay him when we roll a 3, 4, 5 or 6 to make the game fair?
Solution. If we pay $x when we roll a 3, 4, 5 or 6, then for the game to be fair,
x × 4/6 = 5 × 2/6, or x = 2.5. That is, we must pay $2.50.
4. Gambler’s Ruin
A and B are betting on repeated flips of a balanced coin. At the beginning, A has
m dollars and B has n dollars. After each flip the loser pays the winner 1 dollar
and the game stops when one of them is ruined. Find probability that A will win
B’s n dollars before he loses his m dollars.
Solution.
Let p be the probability that A wins (so that 1 − p is the probability that B wins).
Since the game is fair, A’s math exp = B’s math exp.
Thus n × p + 0 × (1 − p) = m × (1 − p) + 0 × p, or p = m/(m + n).
5. An importer is offered a shipment of machines for $140,000. The probabilities that
he will sell them for $180,000, $170,000 or $150,000 are respectively 0.32,
0.55 and 0.13. What is his expected profit?
Solution. Expected profit
= 40,000 × 0.32 + 30,000 × 0.55 + 10,000 × 0.13

=$30,600
6. The manufacturer of a new battery additive has to decide whether to sell her
product for $0.80 a can or for $1.20 a can with a ‘double your money back if not
satisfied’ guarantee. How does she feel about the chances that a person will ask
for double his/her money back if
(a) she decides to sell the product for $0.80
(b) she decides to sell the product for $1.20
(c) she cannot make up her mind?
Solution. Let p be the probability that a person will ask for double his/her money back.
In the 1st case, she gets a fixed amount of $0.80 a can.
In the 2nd case, she expects to get for each can
(1.20)(1 − p) + (−1.20)(p) = 1.20 − 2.40p
(a) happens if 0.80 > 1.20 − 2.40p, i.e. p > 1/6
(b) happens if 0.80 < 1.20 − 2.40p, i.e. p < 1/6
(c) happens if p = 1/6
7. A manufacturer buys an item for $2.10 and sells it for $4.50. The probabilities for
a demand of 0, 1, 2, 3, 4, “5 or more” items are 0.05, 0.15, 0.30, 0.25, 0.15, 0.10
respectively. How many items must he stock to maximize his expected profit?
No. of items stocked | No. sold (with prob.) | Exp. profit
0 | 0 (1) | 0
1 | 0 (0.05), 1 (0.95) | 0 × 0.05 + 4.5 × 0.95 − 2.1 = 2.175
2 | 0 (0.05), 1 (0.15), 2 (0.80) | 0 × 0.05 + 4.5 × 0.15 + 9 × 0.80 − 4.2 = 3.675
3 | 0 (0.05), 1 (0.15), 2 (0.30), 3 (0.50) | 0 × 0.05 + 4.5 × 0.15 + 9 × 0.30 + 13.5 × 0.50 − 6.3 = 3.825
4 | | 2.85
5 | | 1.20
6 | | at most −0.45
Hence he must stock 3 items to maximize his expected profit.
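The expected-profit column can be recomputed mechanically. The sketch below rests on assumptions that should be read carefully: it uses the unit cost of $2.10 that appears in the worked terms (2.1, 4.2, 6.3), it treats the “5 or more” category as a demand of exactly 5, and it therefore covers only stocks of 0 to 5 items. The names `DEMAND` and `expected_profit` are hypothetical.

```python
# Demand distribution from the problem; key 5 stands for "5 or more".
DEMAND = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.25, 4: 0.15, 5: 0.10}

def expected_profit(stock, cost=2.10, price=4.50):
    """Expected revenue from items actually sold minus the cost of all items stocked."""
    if not 0 <= stock <= 5:
        raise ValueError("demand beyond 5 items is not broken down in the problem")
    revenue = sum(prob * price * min(demand, stock) for demand, prob in DEMAND.items())
    return revenue - cost * stock

profits = {stock: round(expected_profit(stock), 3) for stock in range(6)}
best_stock = max(profits, key=profits.get)

print(profits)
print("best stock level:", best_stock)
```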

8. A contractor has to choose between 2 jobs. The 1st job promises a profit of
$240,000 with probability 0.75 and a loss of $60,000 with probability 0.25. The
2nd job promises a profit of $360,000 with probability 0.5 and a loss of $90,000
with probability 0.5.
(a) Which job should the contractor choose to maximize her expected profit?
i. Exp. profit for job 1 = 240,000 × 3/4 − 60,000 × 1/4 = 165,000
ii. Exp. profit for job 2 = 360,000 × 1/2 − 90,000 × 1/2 = 135,000
Go in for job 1.
(b) What job would the contractor probably choose if her business is in bad
shape and she goes broke unless she makes a profit of $300,000 on her
next job?
Ans: She takes job 2, as it is the only job that can give her a profit of $300,000 or more.
RANDOM VARIABLES
Let E be a random experiment. A random variable (r.v) X is a function that associates to

each outcome s, a unique real number X (s).
Example 1
Let E be the random experiment of tossing a fair coin 3 times. We see that there are
2^3 = 8 outcomes TTT, HTT, THT, TTH, HHT, HTH, THH, HHH, all of which are
equally likely. Let X be the random variable that ‘counts’ the number of heads obtained.
Thus X can take only 4 values 0, 1, 2, 3. We note that
P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8.
This is called the probability distribution of the rv X. Thus the probability distribution
of a rv X is the listing of the probabilities with which X takes all its values.
Example 2
Let E be the random experiment of rolling a pair of balanced dice. There are 36 possible
equally likely outcomes, namely (1,1), (1,2), …, (6,6). Let X be the rv that gives the sum
of the two nos on the top. Hence X takes 11 values, namely 2, 3, …, 12. We note that the
probability distribution of X is
P(X = 2) = P(X = 12) = 1/36, P(X = 3) = P(X = 11) = 2/36,
P(X = 4) = P(X = 10) = 3/36,
P(X = 5) = P(X = 9) = 4/36,
P(X = 6) = P(X = 8) = 5/36, P(X = 7) = 6/36 = 1/6.
Example 3
Let E be the random experiment of rolling a die till a 6 appears on the top. Let X be the
no of rolls needed to get the “first” six. Thus X can take values 1,2,3…… Here X takes
an infinite number of values. So it is not possible to list all the probabilities with which X
takes its values. But we can give a formula.
P(X = x) = (5/6)^(x−1) × (1/6) (x = 1, 2, …)
(Justification: X = x means the first (x − 1) rolls gave a number other than 6 and
the xth roll gave the first 6. Hence P(X = x) = 5/6 × 5/6 × … × 5/6 (x − 1 times) × 1/6
= (5/6)^(x−1) × (1/6).)
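Although the values of X cannot all be listed, the formula P(X = x) = (5/6)^(x−1) × (1/6) can be evaluated for any x, and its partial sums approach 1. A minimal sketch, with a function name chosen here:

```python
from fractions import Fraction

def p_first_six_on_roll(x):
    """P(X = x): the first (x - 1) rolls are not 6 and the x-th roll is a 6."""
    return Fraction(5, 6) ** (x - 1) * Fraction(1, 6)

# Partial sum over x = 1..N equals 1 - (5/6)^N, the chance of at least one 6 in N rolls.
N = 50
partial_sum = sum(p_first_six_on_roll(x) for x in range(1, N + 1))

print(p_first_six_on_roll(1), p_first_six_on_roll(2), float(partial_sum))
```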
Discrete Random Variables
We say X is a discrete rv if it can take only a finite number of values (as in examples 1, 2
above) or a “countably” infinite number of values (as in example 3).
On the other hand, the annual rainfall in a city, the lifelength of an electronic device, the
diameter of washers produced by a factory are all continuous random variables in the
sense they can take (theoretically at least) all values in an ‘interval’ of the x-axis. We
shall discuss continuous rvs a little later.
Probability distribution of a Discrete RV
Let X be a discrete rv with values x1, x2, …
Let f(xi) = P(X = xi) (i = 1, 2, …)
We say that {f(xi)}, i = 1, 2, …, is the probability distribution of the rv X.
Properties of the probability distribution
(i) f(xi) ≥ 0 for all i = 1, 2, …
(ii) $\sum_{i} f(x_i) = 1$
The first condition follows from the fact that the probability is always ≥ 0. The second
condition follows from the fact that the probability of the certain event = 1.
Example 4
Determine whether the following can be the probability distribution of a rv which can
take only 4 values 1,2,3 and 4.
(a) f(1) = 0.26, f(2) = 0.26, f(3) = 0.26, f(4) = 0.26.
No, as the sum of all the “probabilities” is 1.04 > 1.
(b) f(1) = 0.15, f(2) = 0.28, f(3) = 0.29, f(4) = 0.28.
Yes, as these are all ≥ 0 and add up to 1.
(c) f(x) = (x + 1)/16, x = 1, 2, 3, 4.
No, as the sum of all the probabilities is 14/16 < 1.
Binomial Distribution
Let E be a random experiment having only 2 outcomes, say ‘success’ and ‘failure’.
Suppose that P(success) = p and so P(failure) = q (=1-p). Consider n independent
repetitions of E (This means the outcome in any one repetition is not dependent upon the
outcome in any other repetition). We also make the important assumption that P(success)
= p remains the same for all such independent repetitions of E. Let X be the rv that
’counts’ the number of successes obtained in n such independent repetitions of E. Clearly
X is a discrete rv that can take the n+1 values 0, 1, 2, …, n. We note that there are
$2^n$ outcomes, each of which is a 'string' of n letters, each letter being an S or an F (if n = 3, the
outcomes are FFF, SFF, FSF, FFS, SSF, SFS, FSS, SSS).
X = x means that in such an outcome there are x successes and (n − x) failures in some order. One such outcome is $\underbrace{SS \ldots S}_{x}\,\underbrace{FF \ldots F}_{n-x}$. Since all
the repetitions are independent, the prob of this outcome is $p^x q^{n-x}$. Exactly the same
prob is associated with any other outcome for which X = x. But x successes can
occur out of n repetitions in $\binom{n}{x}$ mutually exclusive ways. Hence
$$P(X = x) = \binom{n}{x} p^x q^{n-x} \quad (x = 0, 1, \ldots, n).$$
We say X has a Binomial distribution with parameters n ( ≡ the number of repetitions)
and p (Prob of success in any one repetition).
We denote P(X = x ) by b(x; n , p ) to show its dependence on x, n and p. The letter ‘b’
stands for binomial.
Since the above (n+1) probabilities are the (n+1) terms in the expansion of the
binomial $(q + p)^n$, X is said to have a binomial distribution. We at once see that the sum
of all the binomial probabilities is $(q + p)^n = 1^n = 1$.
The independent repetitions are usually referred to as the “Bernoulli” trials. We note that
b(x; n, p ) = b(n − x; n, q )
(LHS = Prob of getting x successes in n Bernoulli trials = prob of getting n-x failures in
n Bernoulli trials = R.H.S.)
Cumulative Binomial Probabilities
Let X have a binomial distribution with parameters n and p.
$$P(X \le x) = P(X = 0) + P(X = 1) + \cdots + P(X = x) = \sum_{k=0}^{x} b(k; n, p)$$
is denoted by $B(x; n, p)$ and is called the cumulative Binomial distribution function. This
is tabulated in Table 1 of your text book. We note that
$$b(x; n, p) = P(X = x) = P(X \le x) - P(X \le x - 1) = B(x; n, p) - B(x - 1; n, p).$$
Thus $b(9; 12, 0.60) = B(9; 12, 0.60) - B(8; 12, 0.60) = 0.9166 - 0.7747 = 0.1419$.
(You can verify this by directly calculating b(9; 12, 0.60).)
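The verification suggested above can be sketched in a few lines of Python. The function names `binom_pmf` and `binom_cdf` are illustrative choices, not notation from the text book; they stand for b(x; n, p) and B(x; n, p).

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """B(x; n, p): probability of at most x successes."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Direct calculation of b(9; 12, 0.60) ...
direct = binom_pmf(9, 12, 0.60)
# ... and the same value as a difference of cumulative probabilities.
via_cumulative = binom_cdf(9, 12, 0.60) - binom_cdf(8, 12, 0.60)
print(f"{direct:.4f} {via_cumulative:.4f}")
```

Both numbers should agree with the table value to four decimal places; the symmetry b(x; n, p) = b(n − x; n, q) can be checked the same way.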
Example 5 (Exercise 4.15 of your book)
During one stage in the manufacture of integrated circuit chips, a coating must be
applied. If 70% of the chips receive a thick enough coating, find the probability that,
among 15 chips:
(a) At least 12 will have thick enough coatings.

(b) At most 6 will have thick enough coatings.
(c) Exactly 10 will have thick enough coatings.
Solution
Among 15 chips, let X be the number of chips that will have thick enough coatings.
Hence X is a rv having Binomial distribution with parameters n =15 and p = 0.70.
(a) $P(X \ge 12) = 1 - P(X \le 11) = 1 - B(11; 15, 0.70) = 1 - 0.7031 = 0.2969$
(b) $P(X \le 6) = B(6; 15, 0.70) = 0.0152$
(c) $P(X = 10) = B(10; 15, 0.70) - B(9; 15, 0.70) = 0.4845 - 0.2784 = 0.2061$
Example 6 (Exercise 4.19 of your text book)
A food processor claims that at most 10% of her jars of instant coffee contain less coffee
than printed on the label. To test this claim, 16 jars are randomly selected and contents
weighed. Her claim is accepted if fewer than 3 of the 16 jars contain less coffee (note that
10% of 16 = 1.6 and rounds to 2). Find the probability that the food processor’s claim
will be accepted if the actual percent of the jars containing less coffee is
(a) 5% (b) 10% (c) 15% (d) 20%
Solution:
Let X be the number of jars, among the 16 randomly chosen, that contain less coffee than
printed on the label. Thus X is a random variable having a Binomial distribution
with parameters n = 16 and p (the prob of "success" = the prob that a jar chosen at
random has less coffee). The claim is accepted if X ≤ 2.
(a) Here p = 5% = 0.05.
Hence P(claim is accepted) = P(X ≤ 2) = B(2; 16, 0.05) = 0.9571.
(b) Here p = 10% = 0.10.
Hence P(claim is accepted) = B(2; 16, 0.10) = 0.7892.
(c) Here p = 15% = 0.15.
Hence P(claim is accepted) = B(2; 16, 0.15) = 0.5614.
(d) Here p = 20% = 0.20.
Hence P(claim is accepted) = B(2; 16, 0.20) = 0.3518.
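The four acceptance probabilities of the coffee-jar example can be recomputed without tables. The sketch below assumes only the acceptance rule stated in the example (accept when at most 2 of the 16 jars are short); the names `binom_cdf` and `accept` are illustrative.

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x) for a binomial rv with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Probability that the claim is accepted (X <= 2 out of 16 jars)
# for each assumed true proportion p of short-weight jars.
accept = {p: binom_cdf(2, 16, p) for p in (0.05, 0.10, 0.15, 0.20)}
for p, prob in accept.items():
    print(f"p = {p:.2f}: P(claim accepted) = {prob:.4f}")
```

The acceptance probability falls as the true proportion of short-weight jars rises, which is what a reasonable test of the claim should do.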
Binomial Distribution – Sampling with replacement
Suppose there is an urn containing 10 marbles of which 4 are white and the rest are black.
Suppose 5 marbles are chosen with replacement. Let X be the rv that counts the no of
white marbles drawn. Thus X = 0,1,2,3,4 or 5 (Remember that we replace each marble in
the urn before drawing the next one. Hence we can draw 5 white marbles)
P("Success") = P(drawing a white marble in any one of the 5 draws) = $\frac{4}{10}$ (remember
we draw with replacement).
Thus X has a Binomial distribution with parameters n = 5 and $p = \frac{4}{10}$.
Hence $P(X = x) = b\left(x; 5, \frac{4}{10}\right)$.
Mode of a Binomial distribution
We say x0 is the mode of the Binomial distribution with parameters n and p if

$P(X = x_0)$ is the greatest. From the binomial tables given in the book we can easily see
that when n = 10 and $p = \frac{1}{2}$, P(X = 5) is the greatest, so 5 is the mode.
Fact
$$\frac{b(x + 1; n, p)}{b(x; n, p)} = \frac{n - x}{x + 1} \times \frac{p}{1 - p} \quad \begin{cases} > 1 & \text{if } x < np - (1 - p) \\ = 1 & \text{if } x = np - (1 - p) \\ < 1 & \text{if } x > np - (1 - p) \end{cases}$$
Thus so long as x < np − (1 − p) the binomial probabilities increase, and if x > np − (1 − p) they
decrease. Hence if np − (1 − p) = x0 is an integer, then the modes are x0 and x0 + 1. If np − (1 − p)
is not an integer and x0 is the smallest integer ≥ np − (1 − p), the mode is x0.
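The mode rule can be checked against a brute-force search over all binomial probabilities. This is a minimal sketch that assumes 0 < p < 1 and takes p as an exact `Fraction`, so that the test "np − (1 − p) is an integer" is not disturbed by rounding; the function names are illustrative.

```python
from fractions import Fraction
from math import ceil, comb

def binom_pmf(x, n, p):
    """b(x; n, p), exact when p is a Fraction."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def modes_by_rule(n, p):
    """Mode(s) from the threshold np - (1 - p); assumes 0 < p < 1."""
    threshold = n * p - (1 - p)
    if threshold.denominator == 1:      # threshold is an integer x0
        x0 = int(threshold)
        return [x0, x0 + 1]
    return [ceil(threshold)]            # smallest integer >= threshold

def modes_by_search(n, p):
    """Mode(s) found by comparing all n + 1 probabilities."""
    probs = [binom_pmf(x, n, p) for x in range(n + 1)]
    best = max(probs)
    return [x for x, pr in enumerate(probs) if pr == best]

half = Fraction(1, 2)
print(modes_by_rule(10, half), modes_by_search(10, half))
print(modes_by_rule(11, half), modes_by_search(11, half))
```

For n = 10, p = 1/2 the threshold is 4.5, giving the single mode 5; for n = 11, p = 1/2 the threshold is the integer 5, giving the two modes 5 and 6.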
Hypergeometric Distribution (Sampling without replacement)
An urn contains 10 marbles of which 4 are white and the other 6 are red. 5 marbles are chosen at random
without replacement. Let X be the rv that counts the number of white marbles drawn.
Thus X can take 5 values, namely 0, 1, 2, 3, 4. What is P(X = x)? Now out of 10 marbles, 5
can be chosen in $\binom{10}{5}$ equally likely ways, out of which there will be $\binom{4}{x}\binom{6}{5-x}$ ways of
drawing x white marbles (and so 5 − x red marbles). (Reason: out of 4 white marbles, x can
be chosen in $\binom{4}{x}$ ways, and out of 6 red marbles, 5 − x can be chosen in $\binom{6}{5-x}$ ways.)
Hence
$$P(X = x) = \frac{\binom{4}{x}\binom{6}{5-x}}{\binom{10}{5}}, \quad x = 0, 1, 2, 3, 4.$$
We generalize the above result.
A box contains N marbles out of which a are white. n marbles are chosen without
replacement. Let X be the random variable that counts the number of white marbles
drawn. X can take the values 0, 1, 2, …, n.
$$P(X = x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, n$$
(Note that x must be less than or equal to a, and n − x must be less than or equal to N − a.)
We say the rv X has a hypergeometric distribution with parameters n, a and N. We denote
P(X = x) by h(x; n, a, N).
Example 7
Among the 12 solar collectors on display, 9 are flat plate collectors and the other three
are concentrating collectors. If a person chooses 4 collectors at random, find the prob that
3 are flat plate ones.
Ans $h(3; 4, 9, 12) = \frac{\binom{9}{3}\binom{3}{1}}{\binom{12}{4}}$
Example 8
If 6 of 18 new buildings in a city violate the building code, what is the probability that a
building inspector, who randomly selects 4 of the new buildings for inspection, will catch
(a) None of the new buildings that violate the building code
Ans $h(0; 4, 6, 18) = \frac{\binom{6}{0}\binom{12}{4}}{\binom{18}{4}} = \frac{\binom{12}{4}}{\binom{18}{4}}$
(b) One of the new buildings that violate the building code
Ans $h(1; 4, 6, 18) = \frac{\binom{6}{1}\binom{12}{3}}{\binom{18}{4}}$
(c) Two of the new buildings that violate the building code
Ans $h(2; 4, 6, 18) = \frac{\binom{6}{2}\binom{12}{2}}{\binom{18}{4}}$
(d) At least three of the new buildings that violate the building code
Ans h(3; 4, 6, 18) + h(4; 4, 6, 18)
(Note: We choose 4 buildings out of 18 without replacement. Hence the hypergeometric
distribution is appropriate.)
Binomial distribution as an approximation to the Hypergeometric Distribution
We can show that $h(x; n, a, N) \to b(x; n, p)$ as $N \to \infty$
(where $p = \frac{a}{N}$ = "prob of a success" is kept fixed). Hence if N is large, the hypergeometric
probability h(x; n, a, N) can be approximated by the binomial probability
b(x; n, p), where $p = \frac{a}{N}$.
Example 9 (exercise 4.26 of your text)
A shipment of 120 burglar alarms contains 5 that are defective. If 3 of these alarms are
randomly selected and shipped to a customer, find the probability that the customer will
get one defective alarm.
(a) By using the hypergeometric distribution
(b) By approximating the hypergeometric probability by a binomial probability.
Solution
Here N = 120 (large!), a = 5, n = 3, x = 1.
(a) Reqd prob $= h(1; 3, 5, 120) = \frac{\binom{5}{1}\binom{115}{2}}{\binom{120}{3}} = \frac{5 \times 6555}{280840} = 0.1167$
(b) $h(1; 3, 5, 120) \approx b\left(1; 3, \frac{5}{120}\right) = \binom{3}{1}\frac{5}{120}\left(1 - \frac{5}{120}\right)^2 = 0.1148$
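The exact hypergeometric value and its binomial approximation for the burglar-alarm shipment can be compared numerically. The sketch below uses illustrative function names `hypergeom_pmf` and `binom_pmf` for h(x; n, a, N) and b(x; n, p), and assumes 0 ≤ x ≤ n.

```python
from math import comb

def hypergeom_pmf(x, n, a, N):
    """h(x; n, a, N): x 'white' items in a sample of n drawn without
    replacement from N items of which a are 'white'."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """b(x; n, p): x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

exact = hypergeom_pmf(1, 3, 5, 120)     # sampling without replacement
approx = binom_pmf(1, 3, 5 / 120)       # binomial approximation, p = a/N
print(f"hypergeometric: {exact:.4f}   binomial: {approx:.4f}")
```

The two values differ only in the third decimal place, which is what one expects when the sample size n is small compared with N.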
Example 10 (Exercise 4.27 of your text)
Among the 300 employees of a company, 240 are union members, while the others are
not. If 8 of the employees are chosen by lot to serve on the committee which
administrates the provident fund, find the prob that 5 of them will be union members
while the others are not.
(a) Using the hypergeometric distribution
(b) Using the binomial approximation
Solution
Here N = 300, a = 240, n = 8, x = 5.
(a) $h(5; 8, 240, 300) = \frac{\binom{240}{5}\binom{60}{3}}{\binom{300}{8}}$
(b) $\approx b\left(5; 8, \frac{240}{300}\right)$
THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS
We know that the equation of a line can be written as y = mx + c. Here m is the slope and
c is the y intercept. Different m,c give different lines. Thus m and c characterize a line.
Similarly we define certain numbers that characterize a probability distribution.
The mean of a probability distribution is simply the mathematical expectation of the
corresponding rv. If a rv X takes on the values $x_1, x_2, \ldots$ with probabilities
$f(x_1), f(x_2), \ldots$, its mathematical expectation or expected value is
$$x_1 f(x_1) + x_2 f(x_2) + \cdots = \sum_i x_i P(X = x_i) = \sum (\text{value} \times \text{probability})$$
We use the symbol µ to denote the mean of X.
Thus $\mu = E(X) = \sum x_i P(X = x_i)$ (summation over all $x_i$ in the range of X).
Example 11
Suppose X is a rv having the probability distribution
X      1     2     3
Prob   1/2   1/3   1/6
Hence the mean µ of the prob distribution (of X) is
$$\mu = 1 \times \tfrac{1}{2} + 2 \times \tfrac{1}{3} + 3 \times \tfrac{1}{6} = \tfrac{5}{3}$$
Example 12
Let X be the rv having the distribution
X      0   1
Prob   q   p
where q = 1 − p. Thus µ = 0 × q + 1 × p = p.
The mean of a Binomial Distribution
Suppose X is a rv having Binomial distribution with parameters n and p. Then
Mean of X = µ = np.
(Read the proof on pages 107-108 of your text book)
The mean of a Hypergeometric Distribution
If X is a rv having hypergeometric distribution with parameters N, n, a, then $\mu = n \cdot \frac{a}{N}$.
Digression
The mean of a rv X gives the "average" of the values taken by X. Thus, saying that the
average mark in a test is 40 means that some students got marks less than 40
and others more than 40, but the marks average out to 40. The mean gives no idea about the
spread (≡ deviation from the mean) of the marks. This spread is measured by the
variance which is, informally speaking, the average of the squares of the deviations from the
mean.
Variance of a Probability Distribution of X is defined as the expected value of $(X - \mu)^2$:
$$\text{Variance of } X = \sigma^2 = \sum_{x_i \in R_X} (x_i - \mu)^2 P(X = x_i)$$
Note that the R.H.S. is always ≥ 0 (as it is a sum of non-negative numbers).
The positive square root σ of σ² is called the standard deviation of X and has the
same units as X and µ.
Example 13
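For the rv X having the prob distribution given in example 11, the variance is
$$\left(1 - \tfrac{5}{3}\right)^2 \times \tfrac{1}{2} + \left(2 - \tfrac{5}{3}\right)^2 \times \tfrac{1}{3} + \left(3 - \tfrac{5}{3}\right)^2 \times \tfrac{1}{6} = \tfrac{4}{9} \times \tfrac{1}{2} + \tfrac{1}{9} \times \tfrac{1}{3} + \tfrac{16}{9} \times \tfrac{1}{6} = \tfrac{5}{9}$$
We could have also used the equivalent formula
$$\sigma^2 = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2$$
Here $E(X^2) = 1^2 \times \tfrac{1}{2} + 2^2 \times \tfrac{1}{3} + 3^2 \times \tfrac{1}{6} = \tfrac{1}{2} + \tfrac{4}{3} + \tfrac{9}{6} = \tfrac{60}{18} = \tfrac{10}{3}$
$\therefore \sigma^2 = \tfrac{10}{3} - \tfrac{25}{9} = \tfrac{5}{9}$.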
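The two variance formulas can be compared on the distribution of example 11 using exact fractions. In this sketch the dictionary `dist` (value → probability) and the function names are illustrative conventions, not notation from the text.

```python
from fractions import Fraction

# Distribution of example 11: value -> probability
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def mean(dist):
    """mu = sum of value x probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma^2 = E[(X - mu)^2], straight from the definition."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """sigma^2 = E(X^2) - mu^2, the equivalent formula."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

print(mean(dist), variance(dist), variance_shortcut(dist))
```

Working with `Fraction` keeps the results as 5/3 and 5/9 rather than rounded decimals, so the equality of the two formulas can be seen exactly.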
Example 14
For the probability distribution of example 12,
$E(X^2) = 0^2 \times q + 1^2 \times p = p$
$\therefore \sigma^2 = p - p^2 = p(1 - p) = pq$
Variance of the Binomial Distribution
$\sigma^2 = npq$
Variance of the Hypergeometric Distribution
$$\sigma^2 = n \cdot \frac{a}{N}\left(1 - \frac{a}{N}\right) \cdot \frac{N - n}{N - 1}$$
CHEBYCHEV’S THEOREM
Suppose X is a rv with mean µ and variance σ². Chebyshev's theorem states that: if k
is a constant > 0,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
In words, the prob of getting a value which deviates from its mean µ by at least kσ is at
most $\frac{1}{k^2}$.
Note: Chebyshev's Theorem gives us an upper bound on the prob of an event. It is mostly
of theoretical interest.
Example 15
In one out of 6 cases, material for bullet proof vests fails to meet puncture standards. If
405 specimens are tested, what does Chebyshev's theorem tell us about the prob of getting
at most 30 or at least 105 cases that do not meet puncture standards?
Here $\mu = np = 405 \times \frac{1}{6} = \frac{135}{2}$
$\sigma^2 = npq = 405 \times \frac{1}{6} \times \frac{5}{6}$
$\therefore \sigma = \frac{15}{2}$
Let X = no of cases out of 405 that do not meet puncture standards.
Reqd P(X ≤ 30 or X ≥ 105).
Now $X \le 30 \Rightarrow X - \mu \le -\frac{75}{2}$
and $X \ge 105 \Rightarrow X - \mu \ge \frac{75}{2}$.
Thus $X \le 30$ or $X \ge 105 \Rightarrow |X - \mu| \ge \frac{75}{2} = 5\sigma$
$\therefore P(X \le 30 \text{ or } X \ge 105) = P(|X - \mu| \ge 5\sigma) \le \frac{1}{5^2} = \frac{1}{25} = 0.04$
Example 16 (Exercise 4.46 of your text)
How many times do we have to flip a balanced coin to be able to assert with a prob of at
most 0.01 that the difference between the proportion of tails and 0.50 will be at least
0.04?
Solution:
Suppose we flip the coin n times and suppose X is the no of tails obtained. Thus the
proportion of tails $= \frac{X}{n} = \frac{\text{No of tails}}{\text{Total no of flips}}$. We must find n so that
$$P\left(\left|\frac{X}{n} - 0.50\right| \ge 0.04\right) \le 0.01$$
Now X = no of tails among n flips of a balanced coin is a rv having Binomial distribution
with parameters n and 0.5.
Hence $\mu = E(X) = np = n \times 0.50$
$\sigma = \sqrt{npq} = \sqrt{n} \times 0.50$ (as p = q = 0.50)
Now $\left|\frac{X}{n} - 0.50\right| \ge 0.04$ is equivalent to $|X - n \times 0.50| \ge 0.04n$.
We know $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$.
Here $k\sigma = 0.04n$
$\therefore k = \frac{0.04n}{0.50 \times \sqrt{n}} = 0.08\sqrt{n}$
$\therefore P\left(\left|\frac{X}{n} - 0.50\right| \ge 0.04\right) = P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \le 0.01$
if $k^2 \ge \frac{1}{0.01} = 100$, i.e. if $(0.08)^2 n \ge 100$,
or $n \ge \frac{100}{(0.08)^2} = 15625$.
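The sample-size calculation above generalises: with kσ = nε the Chebyshev bound 1/k² equals pq/(nε²), so the smallest n is the ceiling of pq/(ε² × bound). The following is a minimal sketch under that reading; the function name `flips_needed` and its parameters are illustrative.

```python
from fractions import Fraction
from math import ceil

def flips_needed(eps, bound, p=Fraction(1, 2)):
    """Smallest n for which the Chebyshev bound pq / (n * eps**2) on
    P(|X/n - p| >= eps) is at most `bound`."""
    q = 1 - p
    return ceil(p * q / (eps**2 * bound))

# Balanced coin, deviation of at least 0.04, probability at most 0.01.
n = flips_needed(Fraction("0.04"), Fraction("0.01"))
print(n)
```

Exact fractions are used so that the boundary case, where pq/(ε² × bound) is a whole number, is not pushed up by floating-point rounding.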
Law of large Numbers
Suppose a factory manufactures items. Suppose there is a constant prob p that an item is
defective. Suppose we choose n items at random and let X be the no of defectives found.
Then X is a rv having binomial distribution with parameters n and p.
∴ mean $\mu = E(X) = np$, variance $\sigma^2 = npq$.
Let ε be any number > 0.
Now
$$P\left(\left|\frac{X}{n} - p\right| \ge \varepsilon\right) = P(|X - np| \ge n\varepsilon) = P(|X - \mu| \ge k\sigma) \quad (\text{where } k\sigma = n\varepsilon)$$
$$\le \frac{1}{k^2} \ (\text{by Chebyshev's theorem}) = \frac{\sigma^2}{n^2\varepsilon^2} = \frac{npq}{n^2\varepsilon^2} = \frac{pq}{n\varepsilon^2} \to 0 \text{ as } n \to \infty.$$
Thus we can say that the prob that the proportion of defective items differs from the
actual prob p by at least any positive number ε tends to 0 as n → ∞. (This is called the Law of Large
Numbers.)
This means that "most of the time" the proportion of defectives will be close to the actual
(unknown) prob p that an item is defective, for large n. So we can estimate p by $\frac{X}{n}$, the
(sample) proportion of defectives.
POISSON DISTRIBUTION
A random variable X is said to have a Poisson distribution with parameter λ > 0 if its
probability distribution is given by
$$P(X = x) = f(x; \lambda) = e^{-\lambda}\frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
We can easily show: mean of X = µ = λ and variance of X = σ² = λ.
Also P(X = x) is largest when x = λ − 1 and x = λ if λ is an integer, and when x = [λ] = the
greatest integer ≤ λ if λ is not an integer. Also note that P(X = x) → 0 as x → ∞.
POISSON APPROXIMATION TO BINOMIAL DISTRIBUTION
Suppose X is a rv having Binomial distribution with parameters n and p. We can easily
show $b(x; n, p) = P(X = x) \to f(x; \lambda)$ as n → ∞ in such a way that np remains a constant
λ.
Hence for n large and p small, the binomial prob b(x; n, p) can be approximated by the
Poisson prob f(x; λ), where λ = np.
Example 17
$b(3; 100, 0.03) \approx f(3; 3) = \frac{e^{-3}\,3^3}{3!}$
Example 18
If 0.8% of the fuses delivered to an arsenal are defective, use the Poisson approximation
to determine the probability that 4 fuses will be defective in a random sample of 400.
Solution
If X is the number of defectives in a sample of 400, X has the binomial distribution with
parameters n = 400 and p = 0.8% = 0.008.
Thus P(4 out of 400 are defective)
$= b(4; 400, 0.008) \approx f(4; \lambda)$ (where λ = 400 × 0.008 = 3.2)
$= e^{-3.2}\frac{(3.2)^4}{4!}$
= 0.781 − 0.603 (from table 2 at the end of the text)
= 0.178
Cumulative Poisson Distribution Function
If X is a rv having Poisson Distribution with parameter λ, the cumulative Poisson prob is
$$F(x; \lambda) = P(X \le x) = \sum_{k=0}^{x} P(X = k) = \sum_{k=0}^{x} f(k; \lambda)$$
For various λ and x, F(x; λ) has been tabulated in table 2 (of your text book, pages 581
to 585). We use table 2 as follows.
$$f(x; \lambda) = P(X = x) = P(X \le x) - P(X \le x - 1) = F(x; \lambda) - F(x - 1; \lambda)$$
Thus f(4; 3.2) = F(4; 3.2) − F(3; 3.2) = 0.781 − 0.603 = 0.178.
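The Poisson value f(4; 3.2) can be computed directly, as a difference of cumulative probabilities, and compared with the exact binomial probability b(4; 400, 0.008) that it approximates. The function names in this sketch are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """F(x; lambda) = P(X <= x)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

def binom_pmf(x, n, p):
    """b(x; n, p), the exact probability being approximated."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

lam = 400 * 0.008                       # lambda = np = 3.2
direct = poisson_pmf(4, lam)
via_cumulative = poisson_cdf(4, lam) - poisson_cdf(3, lam)
exact = binom_pmf(4, 400, 0.008)
print(f"{direct:.3f} {via_cumulative:.3f} {exact:.3f}")
```

The first two numbers reproduce the table calculation; the third shows how close the approximation is when n is large and p is small.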
Poisson Process
There are many situations in which events occur randomly in regular intervals of time.
For example in a time period t, let X t be the number of accidents at a busy road junction
in New Delhi; X t be the number of calls received at a telephone exchange; X t be the
number of radio active particles emitted by a radioactive source etc. In all such examples
we find X t is a discrete rv which can take non-ve integral values 0,1,2,….. The important
thing to note is that all such random variables have “same” distribution except that the
parameter(s) depend on time t.
The collection of random variables $(X_t)_{t > 0}$ is said to constitute a random process. If
each $X_t$ has a Poisson distribution, we say $(X_t)$ is a Poisson process. We now show that
the rvs $X_t$, which count the number of occurrences of a random phenomenon in a time
period t, constitute a Poisson process under suitable assumptions. Suppose in a time
period t, a random phenomenon which we call "success" occurs. We let $X_t$ = number of
successes in time period t. We assume:
1. In a small time period Δt, either no success or one success occurs.
2. The prob of a success in a small time period Δt is proportional to Δt, i.e. say
$P(X_{\Delta t} = 1) = \alpha\,\Delta t$ (α is the constant of proportionality).
3. The prob of a success during any time period does not depend on what
happened prior to that period.
Divide the time period t into n small time periods each of length Δt = t/n. Hence by the
assumptions above, we note that $X_t$ = no of successes in time period t is a rv having
Binomial distribution with parameters n and p = αΔt. Hence
$$P(X_t = x) = b(x; n, \alpha\,\Delta t) \to f(x; \lambda) \text{ as } n \to \infty,$$
where λ = np = nαΔt = αt.
So we can say that $X_t$ = no of successes in time period t is a rv having Poisson
distribution with parameter αt.
Meaning of the proportionality constant α
Since the mean of $X_t$ is λ = αt, we find α = mean no of successes in unit time.
(Note: For a more rigorous derivation of the distribution of $X_t$, you may see Meyer,
Introductory Probability and Statistical Applications, pages 165-169.)
Example 19
Given that the switchboard of a consultant's office receives on the average 0.6 calls per
minute, find the probability that
(a) In a given minute there will be at least one call.
(b) In a 4-minute interval, there will be at least 3 calls.
Solution
$X_t$ = no of calls in a t-minute interval is a rv having Poisson distribution with parameter
αt = 0.6t.
(a) $P(X_1 \ge 1) = 1 - P(X_1 = 0) = 1 - e^{-0.6} = 1 - 0.549 = 0.451$.
(b) $P(X_4 \ge 3) = 1 - P(X_4 \le 2) = 1 - F(2; 2.4) = 1 - 0.570 = 0.430$
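The limiting argument behind the Poisson process can be illustrated with the switchboard numbers: cutting the 4-minute interval into n slices gives a binomial rv with p = αt/n, and its cumulative probabilities approach the Poisson value as n grows. This is only a numerical sketch; the function names and the chosen values of n are illustrative.

```python
from math import comb, exp, factorial

def binom_cdf(x, n, p):
    """P(X <= x) when the interval is cut into n slices, each with success prob p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def poisson_cdf(x, lam):
    """F(x; lambda) = P(X <= x) for a Poisson rv."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

alpha, t = 0.6, 4                        # 0.6 calls per minute, 4-minute interval
limit = poisson_cdf(2, alpha * t)        # F(2; 2.4)
for n in (10, 100, 10_000):
    sliced = binom_cdf(2, n, alpha * t / n)
    print(f"n = {n:>6}: binomial {sliced:.4f}   Poisson {limit:.4f}")

p_at_least_one = 1 - poisson_cdf(0, alpha * 1)   # part (a)
p_at_least_three = 1 - limit                     # part (b)
print(f"{p_at_least_one:.3f} {p_at_least_three:.3f}")
```

The last line recomputes the two answers of the switchboard example without tables.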
Example 20
Suppose that $X_t$, the number of particles emitted in t hours from a radioactive source,
has a Poisson distribution with parameter 20t. What is the probability that exactly 5
particles are emitted during a 15 minute period?
Solution
15 minutes = $\frac{1}{4}$ hour.
Hence if $X_{1/4}$ = no of particles emitted in $\frac{1}{4}$ hour, then
$$P\left(X_{1/4} = 5\right) = e^{-\frac{1}{4} \times 20}\,\frac{\left(\frac{1}{4} \times 20\right)^5}{5!} = e^{-5}\,\frac{5^5}{5!}$$
= 0.616 − 0.440 = 0.176 (from table 2)
THE GEOMETRIC DISTRIBUTION
Suppose there is a random experiment having only two possible outcomes, called
‘success’ and ‘failure’. Assume that the prob of a success in any one ‘trial’ ( ≡ repetition
of the experiment) is p and remains the same for all trials. Also assume the trials are
independent. The experiment is repeated till a success is got. Let X be the rv that counts
the number of trials needed to get the 1st success. Clearly X = x if the first (x-1) trials
were failures and the xth trial gave the first success. Hence
$$P(X = x) = g(x; p) = (1 - p)^{x-1} p = q^{x-1} p \quad (x = 1, 2, \ldots)$$
We say X has a geometric distribution with parameter p (as the respective probabilities
form a geometric progression with common ratio q).
We can show the mean of this distribution is $\mu = \frac{1}{p}$ and the variance is $\sigma^2 = \frac{q}{p^2}$.
(For example, suppose a die is rolled till a 6 is got. It is reasonable to expect that on average
we will need $\frac{1}{1/6} = 6$ rolls, as there are 6 numbers!)
Example 21
An expert hits a target 95% of the time. What is the probability that the expert will miss
the target for the first time on the fifteenth shot?
Solution
Here 'success' means the expert misses the target. Hence p = P(success) = 5% = 0.05. If
X is the rv that counts the no. of shots needed to get a 'success', we want
$P(X = 15) = q^{14} \times p = (0.95)^{14} \times 0.05$.
Example 22
The probability of a successful rocket launching is 0.8. Launching attempts are made till
a successful launching has occurred. Find the probability that exactly 6 attempts will be
necessary.
Solution: $P(X = 6) = q^5 p = (0.2)^5 \times 0.8$
Example 23
X has a geometric distribution with parameter p. Show
(a) $P(X \ge r) = q^{r-1}$, r = 1, 2, …
(b) $P(X \ge s + t \mid X > s) = P(X \ge t)$
Solution
(a) $P(X \ge r) = \sum_{x=r}^{\infty} q^{x-1} p = \frac{q^{r-1} p}{1 - q} = q^{r-1}$.
(b) $P(X \ge s + t \mid X > s) = \frac{P(X \ge s + t)}{P(X > s)} = \frac{q^{s+t-1}}{q^s} = q^{t-1} = P(X \ge t)$.
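Both parts of example 23 can be confirmed with exact arithmetic for a particular p. The sketch below takes p = 1/6 (rolling a die till a 6 is got) and arbitrary illustrative values s = 4, t = 3; the function names are not from the text.

```python
from fractions import Fraction

def geom_pmf(x, p):
    """g(x; p) = q^(x-1) * p: first success on trial x."""
    return (1 - p) ** (x - 1) * p

def geom_tail(r, p):
    """P(X >= r), computed as 1 minus the sum of the first r - 1 probabilities."""
    return 1 - sum(geom_pmf(x, p) for x in range(1, r))

p = Fraction(1, 6)
q = 1 - p
s, t = 4, 3

# (a) the tail probability equals q^(r-1)
print(geom_tail(5, p), q**4)

# (b) P(X >= s+t | X > s) = P(X >= s+t) / P(X >= s+1) equals P(X >= t)
conditional = geom_tail(s + t, p) / geom_tail(s + 1, p)
print(conditional, geom_tail(t, p))
```

Part (b) says that after s failures the process "forgets" them: the remaining wait has the same distribution as at the start.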
Application to Queuing Systems
[Figure: customers arrive in a Poisson fashion at the service facility S and depart after service.]
There is a service facility. Customers arrive in a random fashion and get service if the
server is idle. Else they stand in a Queue and wait to get service.
Examples of Queuing systems

1. Cars arriving at a petrol pump to get petrol
2. Men arriving at a Barber’s shop to get hair cut.
3. Ships arriving at a port to deliver goods.
Questions that one can ask are :
1. At any point of time on an average how many customers are in the system
(getting service and waiting to get service)?
2. What is the mean time a customer waits in the system?
3. What proportion of time a server is idle? And so on.
We shall consider only the simplest queueing system where there is only one server. We
assume that the population of customers is infinite and that there is no limit on the
number of customers that can wait in the queue.
We also assume that the customers arrive in a 'Poisson fashion' at the mean rate of α.
This means that $X_t$, the number of customers that arrive in a time period t, is a rv having
Poisson distribution with parameter αt. We also assume that so long as the service
station is not empty, customers depart in a Poisson fashion at a mean rate of β. This
means that, when there is at least one customer, $Y_t$, the number of customers that depart
(after getting service) in a time period t, is a rv having Poisson distribution with
parameter βt (where β > α).
Further assumptions are: in a small time interval Δt, there will be a single arrival or a
single departure but not both. (Note that by the assumptions of a Poisson process, in a small
time interval Δt there can be at most one arrival and at most one departure.) Let $N_t$ be
the number of customers in the system at time t. Let $P(N_t = n) = p_n(t)$. We make another
assumption:
$p_n(t) \to \pi_n$ as $t \to \infty$. $\{\pi_n\}$ is known as the steady state probability distribution of the
number of customers in the system. It can be shown:
$$\pi_0 = 1 - \frac{\alpha}{\beta}$$
$$\pi_n = \left(\frac{\alpha}{\beta}\right)^n \left(1 - \frac{\alpha}{\beta}\right) \quad (n = 0, 1, 2, \ldots)$$
Thus L = mean number of customers in the system (getting service and waiting to get
service)
$$= \sum_{n=0}^{\infty} n\,\pi_n = \frac{\alpha}{\beta - \alpha}$$
$L_q$ = mean no of customers in the queue (waiting to get service)
$$= \sum_{n=1}^{\infty} (n - 1)\,\pi_n = \frac{\alpha^2}{\beta(\beta - \alpha)} = L - \frac{\alpha}{\beta}$$
W = mean time a customer spends in the system
$$= \frac{1}{\beta - \alpha} = \frac{L}{\alpha}$$
$W_q$ = mean time a customer spends in the queue
$$= \frac{\alpha}{\beta(\beta - \alpha)} = \frac{L_q}{\alpha} = W - \frac{1}{\beta}.$$
(For a derivation of these results, see Operations Research Vol. 3 by Dr. S.
Venkateswaran and Dr. B. Singh, EDD Notes of BITS, Pilani.)
Example 24
Trucks arrive at a receiving dock in a Poisson fashion at a mean rate of 2 per hour. The
trucks can be unloaded at a mean rate of 3 per hour in a Poisson fashion (so long as the
receiving dock is not empty).
(a) What is the average number of trucks being unloaded and waiting to get
unloaded?
(b) What is the mean no of trucks in the queue?
(c) What is the mean time a truck spends waiting in the queue?
(d) What is the prob that there are no trucks waiting to be unloaded?
(e) What is the prob that an arriving truck need not wait to get unloaded?
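Solution
Here α = arrival rate = 2 per hour,
β = departure rate = 3 per hour.
Thus
(a) $L = \frac{\alpha}{\beta - \alpha} = \frac{2}{3 - 2} = 2$
(b) $L_q = \frac{\alpha^2}{\beta(\beta - \alpha)} = \frac{2^2}{3(1)} = \frac{4}{3}$
(c) $W_q = \frac{\alpha}{\beta(\beta - \alpha)} = \frac{2}{3}$ hr
(d) P(no trucks are waiting to be unloaded)
= P(no of trucks in the dock is 0 or 1)
$= \pi_0 + \pi_1 = \left(1 - \frac{\alpha}{\beta}\right) + \frac{\alpha}{\beta}\left(1 - \frac{\alpha}{\beta}\right) = \left(1 - \frac{2}{3}\right) + \frac{2}{3}\left(1 - \frac{2}{3}\right) = \frac{1}{3} + \frac{2}{9} = \frac{5}{9}$
(e) P(arriving truck need not wait)
= P(dock is empty)
$= \pi_0 = \frac{1}{3}$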
Example 25
With reference to example 24, suppose that the cost of keeping a truck in the system is
Rs. 15/hour. If it were possible to increase the mean loading rate to 3.5 trucks per hour at
a cost of Rs. 12 per hour, would this be worth while?
Solution
In the old scheme, α = 2, β = 3, L = 2.
∴ Mean cost per hour to the dock = 2 × 15 = Rs. 30/hr.
In the new scheme, α = 2, β = 3.5, $L = \frac{2}{3.5 - 2} = \frac{4}{3}$.
∴ Net cost per hour to the dock = $\frac{4}{3} \times 15 + 12$ = Rs. 32/hr.
Hence it is not worthwhile to go in for the new scheme.
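The steady-state formulas and the two truck examples can be bundled into one small calculation. The sketch below assumes β > α (as the text requires) and uses exact fractions; the function names `mm1_metrics` and `pi` and the dictionary keys are illustrative, not notation from the notes.

```python
from fractions import Fraction

def mm1_metrics(alpha, beta):
    """Steady-state measures of the single-server queue; requires beta > alpha."""
    if beta <= alpha:
        raise ValueError("a steady state needs beta > alpha")
    return {
        "L": alpha / (beta - alpha),                # mean number in the system
        "Lq": alpha**2 / (beta * (beta - alpha)),   # mean number in the queue
        "W": 1 / (beta - alpha),                    # mean time in the system
        "Wq": alpha / (beta * (beta - alpha)),      # mean time in the queue
    }

def pi(n, alpha, beta):
    """Steady-state probability of n customers in the system."""
    rho = alpha / beta
    return rho**n * (1 - rho)

alpha = Fraction(2)
old = mm1_metrics(alpha, Fraction(3))
new = mm1_metrics(alpha, Fraction(7, 2))

no_truck_waiting = pi(0, alpha, Fraction(3)) + pi(1, alpha, Fraction(3))
cost_old = 15 * old["L"]            # Rs. per hour, old scheme
cost_new = 15 * new["L"] + 12       # Rs. per hour, new scheme with its extra cost
print(old["L"], old["Lq"], old["Wq"], no_truck_waiting, cost_old, cost_new)
```

Since the new scheme costs Rs. 32 per hour against Rs. 30 per hour, the comparison agrees with the conclusion reached above.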
MULTINOMIAL DISTRIBUTION
Consider a random experiment E and suppose it has k possible outcomes $A_1, A_2, \ldots, A_k$.
Suppose $P(A_i) = p_i$ for all i and that $p_i$ remains the same for all independent repetitions
of E. Consider n independent repetitions of E. Suppose $A_1$ occurs $X_1$ times, $A_2$ occurs $X_2$
times, …, $A_k$ occurs $X_k$ times. Then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
for all non-negative integers $x_1, x_2, \ldots, x_k$ with $x_1 + x_2 + \cdots + x_k = n$.
Proof. The probability of getting $A_1$ $x_1$ times, $A_2$ $x_2$ times, …, $A_k$ $x_k$ times in any one particular order
is $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, as all the repetitions are independent. Now among the n repetitions,
$A_1$ can occur $x_1$ times in $\binom{n}{x_1} = \frac{n!}{x_1!\,(n - x_1)!}$ ways.
From the remaining $n - x_1$ repetitions, $A_2$ can occur $x_2$ times in
$\binom{n - x_1}{x_2} = \frac{(n - x_1)!}{x_2!\,(n - x_1 - x_2)!}$ ways, and so on.
Hence the total number of ways of getting $A_1$ $x_1$ times, $A_2$ $x_2$ times, …, $A_k$ $x_k$ times will
be
$$\frac{n!}{x_1!\,(n - x_1)!} \times \frac{(n - x_1)!}{x_2!\,(n - x_1 - x_2)!} \times \cdots \times \frac{(n - x_1 - x_2 - \cdots - x_{k-1})!}{x_k!\,(n - x_1 - x_2 - \cdots - x_{k-1} - x_k)!} = \frac{n!}{x_1!\,x_2! \cdots x_k!}$$
as $x_1 + x_2 + \cdots + x_k = n$ and 0! = 1.
Hence $P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$.
Example 26
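A die is rolled 30 times. Find the probability of getting the face 1 two times, 2 three times, 3 four times,
4 six times, 5 seven times and 6 eight times.
Ans
$$\frac{30!}{2!\,3!\,4!\,6!\,7!\,8!} \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^3 \left(\frac{1}{6}\right)^4 \left(\frac{1}{6}\right)^6 \left(\frac{1}{6}\right)^7 \left(\frac{1}{6}\right)^8 = \frac{30!}{2!\,3!\,4!\,6!\,7!\,8!} \left(\frac{1}{6}\right)^{30}$$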
Example 27 (See exercise 4.72 of your text)
The probabilities are, respectively, 0.40, 0.40, and 0.20 that in city driving a certain type
of imported car will average less than 10 kms per litre, anywhere between 10 and 15 kms
per litre, or more than 15 kms per litre. Find the probability that among 12 such cars
tested, 4 will average less than 10 kms per litre, 6 will average anywhere from 10 to 15
kms per litre and 2 will average more than 15 kms per litre.
Solution
$$\frac{12!}{4!\,6!\,2!}\,(0.40)^4 (0.40)^6 (0.20)^2$$
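The multinomial probability of example 27 can be evaluated numerically. This is a minimal sketch in which `multinomial_pmf` is an illustrative name; it takes the counts $x_1, \ldots, x_k$ and the probabilities $p_1, \ldots, p_k$ as two lists of equal length.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1^x1 ... pk^xk, with n = x1 + ... + xk."""
    ways = factorial(sum(counts))
    for x in counts:
        ways //= factorial(x)           # stays an integer at every step
    return ways * prod(p**x for p, x in zip(probs, counts))

# Example 27: 4, 6 and 2 cars in the three mileage classes.
value = multinomial_pmf([4, 6, 2], [0.40, 0.40, 0.20])
print(f"{value:.4f}")

# With k = 2 the same function gives a binomial probability b(9; 12, 0.60).
print(f"{multinomial_pmf([9, 3], [0.60, 0.40]):.4f}")
```

The second call illustrates that the binomial distribution is the k = 2 case of the multinomial.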
Remark
1. Note that the different probabilities are the various terms in the expansion of the
multinomial $(p_1 + p_2 + \cdots + p_k)^n$.
Hence the name multinomial distribution.
2. The binomial distribution is the special case got by taking k = 2.
3. For any fixed i (1 ≤ i ≤ k), $X_i$ (the number of times $A_i$ occurs) is a random
variable having binomial distribution with parameters n and $p_i$. Thus
$E(X_i) = n p_i$ and $V(X_i) = n p_i (1 - p_i)$, i = 1, 2, …, k.
SIMULATION
Nowadays simulation techniques are being applied to many problems in Science and
Engineering. If the processes being simulated involve an element of chance, these
techniques are referred to as Monte Carlo methods. For example to study the distribution
of number of calls arriving at a telephone exchange, we can use simulation techniques.
Random Numbers: In simulation problems one uses tables of random numbers to
"generate" random deviates (values assumed by a random variable). A table of random
numbers consists of many pages on which the digits 0, 1, 2, …, 9 are distributed in such a
way that the probability of any one digit appearing is the same, namely $0.1 = \frac{1}{10}$.
Use of random numbers to generate 'heads' and 'tails'. For example choose the 4th
column of the four page of table 7, start at the top and go down the page. Thus we get
6,2,7,5,5,0,1,8,6,3,… Now we can interpret this as H,H,T,T,T,H,T,H,H,T, because the
prob of getting an odd number = the prob of getting an even number = 0.5. Thus we associate a
head with the occurrence of an even number and a tail with that of an odd number. We can also
associate a head if we get 5, 6, 7, 8 or 9 and a tail otherwise. Then we can say we got
H,T,H,H,H,T,T,H,H,T,… In problems on simulation we shall adopt the second scheme,
as it is easy to use and is easily 'extendable' to more than two outcomes. Suppose, for
example, we have an experiment having 4 outcomes with prob. 0.1, 0.2, 0.3 and 0.4
respectively.
Thus to simulate the above experiment, we have to allot one of the 10 digits 0,1….9 to
the first outcome, two of them to the second outcome, three of them to the third outcome
and the remaining four to the fourth outcome. Though this can be done in a variety of
ways, we choose the simplest way as follows:
Associate the first digit 0 with the 1st outcome $O_1$.
Associate the next 2 digits 1, 2 with the 2nd outcome $O_2$.
Associate the next 3 digits 3, 4, 5 with the 3rd outcome $O_3$.
And associate the last 4 digits 6, 7, 8, 9 with the 4th outcome $O_4$.
Hence the above sequence 6,2,7,5,5,0,1,8,6,3,… of random numbers would correspond to
the sequence of outcomes $O_4, O_2, O_4, O_3, O_3, O_1, O_2, O_4, O_4, O_3, \ldots$
Using two and higher-digit Random numbers in Simulation
Suppose we have a random experiment with three outcomes with probabilities 0.80, 0.15
and 0.05 respectively. How can we now use the table of random numbers to simulate this
experiment? We now read 2 digits at a time: say (starting from page 593, row 12,
column 4) 84, 71, 14, 24, 20, 31, 78, 03, … Since P(any one digit) = $\frac{1}{10}$, P(any two
digits) = $\frac{1}{10} \times \frac{1}{10} = 0.01$. Thus each 2-digit random number occurs with prob 0.01.
Now there are 100 2-digit random numbers: 00, 01, …, 10, 11, …, 20, 21, …,
98, 99. Thus we associate the first 80 numbers 00, 01, …, 79 with the first outcome, the next
15 numbers (80, 81, …, 94) with the second outcome and the last 5 numbers (95, 96, …, 99)
with the 3rd outcome. Thus the above sequence of 2-digit random numbers would simulate
the outcomes:
$O_2, O_1, O_1, O_1, O_1, O_1, O_1, O_1, \ldots$
We describe the above scheme in a table as follows:
Outcome   Probability   Cumulative Probability*   Random Numbers**
O1        0.80          0.80                      00-79
O2        0.15          0.95                      80-94
O3        0.05          1.00                      95-99
* Cumulative prob is got by adding all the probabilities at that position and above; thus cumulative
prob at O2 = prob of O1 + prob of O2 = 0.80 + 0.15 = 0.95.
** Observe that the beginning random number is 00 for the 1st outcome; for the remaining
outcomes, it is one more than the ending random number of the immediately preceding outcome.
Also the ending random number for each outcome is "one less than the cumulative probability"
read as a two-digit number (for example, 0.80 gives 79 and 1.00 gives 99).
Similarly three digit random numbers are used if the prob of an outcome has 3 decimal
places. Read the example on page 133 of your text book.
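The allotment of random numbers to outcomes by cumulative probability can be sketched as follows. The helper names `build_ranges` and `simulate` are illustrative, and the probabilities are passed as strings so that the cumulative sums are exact; the random numbers are the ones read from the table in the discussion above.

```python
from fractions import Fraction
from itertools import accumulate

def build_ranges(probs, digits):
    """Inclusive (start, end) block of random numbers for each outcome.
    Each end is 'one less than the cumulative probability' read as a
    number with the given count of digits."""
    scale = 10**digits
    cumulative = accumulate(Fraction(p) for p in probs)
    ends = [int(c * scale) - 1 for c in cumulative]
    starts = [0] + [e + 1 for e in ends[:-1]]
    return list(zip(starts, ends))

def simulate(numbers, ranges):
    """Translate random numbers into outcome indices 1, 2, ..., k."""
    outcomes = []
    for r in numbers:
        for i, (lo, hi) in enumerate(ranges, start=1):
            if lo <= r <= hi:
                outcomes.append(i)
                break
    return outcomes

one_digit = build_ranges(["0.1", "0.2", "0.3", "0.4"], digits=1)
print(simulate([6, 2, 7, 5, 5, 0, 1, 8, 6, 3], one_digit))

two_digit = build_ranges(["0.80", "0.15", "0.05"], digits=2)
print(two_digit)
print(simulate([84, 71, 14, 24, 20, 31, 78, 3], two_digit))
```

In a program one would normally draw the numbers from a pseudo-random generator instead of a printed table; the mapping from numbers to outcomes stays the same.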
Exercise 4.97 on page 136
No. of polluting species   Probability   Cumulative Probability   Random Numbers
0                          0.2466        0.2466                   0000-2465
1                          0.3452        0.5918                   2466-5917
2                          0.2417        0.8335                   5918-8334
3                          0.1128        0.9463                   8335-9462
4                          0.0395        0.9858                   9463-9857
5                          0.0111        0.9969                   9858-9968
6                          0.0026        0.9995                   9969-9994
7                          0.0005        1.0000                   9995-9999
Starting with page 592, Row 14, Column 7, we read off the 4-digit random nos as:
R. No.   Polluting species   R. No.   Polluting species
5095     1                   2631     1
0150     0                   3033     1
8043     2                   9167     3
9079     3                   4998     1
6440     2                   7036     2
CONTINUOUS RANDOM VARIABLES
In many situations, we come across random variables that take all values lying in a
certain interval of the x axis.
Example
(1) The life length X of a bulb is a continuous random variable that can take all non-ve
real values.
(2) The time between two consecutive arrivals in a queuing system is a random
variable that can take all non-ve real values.
(3) The distance R from the centre of the point where a dart hits the board is a
continuous random variable that can take all values in the interval (0, a), where
a is the radius of the board.
It is clear that in all such cases, asking for the probability that the random variable takes any one
particular value is not meaningful. For example, when you buy a bulb, you ask the question:
what are the chances that it will work for at least 500 hours?
Probability Density function (pdf)
If X is a continuous random variable, the questions about the probability that X takes
values in an interval (a,b) are answered by defining a probability density function.
Def Let X be a continuous rv. A real function f(x) is called the prob density function of X
if
(1) $f(x) \ge 0$ for all $x$
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(3) $P(a \le X \le b) = \int_a^b f(x)\,dx$.
Condition (1) is needed as probability is always ≥ 0.

Condition (2) says that the probability of the certain event is 1.
Condition (3) says to get the prob that X takes a value between a and b, integrate the
function f(x) between a and b. (This is similar to finding the mass of a rod by integrating
its density function).
Remarks
1. $P(X = a) = P(a \le X \le a) = \int_a^a f(x)\,dx = 0$.
2. Hence $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$.
Please note that, unlike the discrete case, it is immaterial whether we include or
exclude one or both of the end points.
3. $P(x \le X \le x + \Delta x) \approx f(x)\,\Delta x$ for small $\Delta x$.
This is proved using the Mean Value Theorem.
Definition (Cumulative Distribution function)
If X is a continuous rv and if f(x) is its density,
$P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(t)\,dt$
We denote the above by F(x) and call it the cumulative distribution function (cdf) of X.
Properties of cdf
1. 0 ≤ F ( x ) ≤ 1 for all x.
2. $x_1 < x_2 \Rightarrow F(x_1) \le F(x_2)$, i.e., F(x) is a non-decreasing function of x.
3. $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$; $F(+\infty) = \lim_{x \to \infty} F(x) = 1$.
4. $\frac{d}{dx} F(x) = \frac{d}{dx} \int_{-\infty}^{x} f(t)\,dt = f(x)$
(Thus we can get the density function f(x) by differentiating the distribution function F(x).)
If the prob density of a rv is given by $f(x) = kx^2$, $0 < x < 1$ (and 0 elsewhere), find the
value of k and the probability that the rv takes on a value
(a) between $\frac{1}{4}$ and $\frac{3}{4}$
(b) greater than $\frac{2}{3}$
Find the distribution function F(x) and hence answer the above questions.
Solution
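$\int_{-\infty}^{\infty} f(x)\,dx = 1$
gives
$\int_0^1 f(x)\,dx = 1$ (as $f(x) = 0$ if $x < 0$ or $x > 1$)
i.e. $\int_0^1 kx^2\,dx = 1$ or $k \cdot \frac{1}{3} = 1$ or $k = 3$.
Thus $f(x) = 3x^2$, $0 \le x \le 1$, and 0 otherwise.
(a) $P\left(\frac{1}{4} < X < \frac{3}{4}\right) = \int_{1/4}^{3/4} 3x^2\,dx = \left(\frac{3}{4}\right)^3 - \left(\frac{1}{4}\right)^3 = \frac{26}{64} = \frac{13}{32}$
(b) $P\left(X > \frac{2}{3}\right) = P\left(\frac{2}{3} < X < 1\right) = \int_{2/3}^{1} 3x^2\,dx = 1^3 - \left(\frac{2}{3}\right)^3 = \frac{19}{27}$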
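Distribution function $F(x) = \int_{-\infty}^{x} f(t)\,dt$
Case (i) $x \le 0$. In this case $f(t) = 0$ between $-\infty$ and $x$. $\therefore F(x) = 0$.
Case (ii) $0 < x < 1$. In this case $f(t) = 3t^2$ between 0 and x and 0 for $t < 0$.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 3t^2\,dt = x^3$.
Case (iii) $x \ge 1$. Now $f(t) = 0$ for $t > 1$.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{1} f(t)\,dt = 1$ (by case (ii))
Hence the distribution function is
$$F(x) = \begin{cases} 0 & x \le 0 \\ x^3 & 0 < x \le 1 \\ 1 & x > 1 \end{cases}$$
Now $P\left(\frac{1}{4} < X < \frac{3}{4}\right) = P\left(X < \frac{3}{4}\right) - P\left(X \le \frac{1}{4}\right)$
$= P\left(X \le \frac{3}{4}\right) - P\left(X \le \frac{1}{4}\right)$
$= F\left(\frac{3}{4}\right) - F\left(\frac{1}{4}\right) = \left(\frac{3}{4}\right)^3 - \left(\frac{1}{4}\right)^3 = \frac{13}{32}$
$P\left(X > \frac{2}{3}\right) = 1 - P\left(X \le \frac{2}{3}\right) = 1 - F\left(\frac{2}{3}\right) = 1 - \left(\frac{2}{3}\right)^3 = \frac{19}{27}$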
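As a cross-check of the two routes used in this example (integrating the density, and differencing the distribution function), here is a small Python sketch. The midpoint-rule helper `integrate` and the number of sub-intervals are assumptions made for illustration; they are not part of the text.

```python
def f(x):
    """Density from the example: 3x^2 on (0, 1), zero elsewhere."""
    return 3 * x * x if 0 < x < 1 else 0.0

def F(x):
    """Distribution function of the example: 0, x^3, or 1."""
    if x <= 0:
        return 0.0
    return x ** 3 if x <= 1 else 1.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

p_a = integrate(f, 0.25, 0.75)      # P(1/4 < X < 3/4) from the density
p_b = integrate(f, 2 / 3, 1.0)      # P(X > 2/3) from the density
print(p_a, F(0.75) - F(0.25))       # both should be close to 13/32
print(p_b, 1 - F(2 / 3))            # both should be close to 19/27
```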
The prob density of a rv X is given by
$$f(x) = \begin{cases} x & 0 < x < 1 \\ 2 - x & 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find the prob that the rv takes a value
(a) between 0.2 and 0.8
(b) between 0.6 and 1.2
Find the distribution function and answer the same questions.
Solution
(a) $P(0.2 < X < 0.8) = \int_{0.2}^{0.8} f(x)\,dx = \int_{0.2}^{0.8} x\,dx = \frac{0.8^2}{2} - \frac{0.2^2}{2} = 0.3$
(b) $P(0.6 < X < 1.2) = \int_{0.6}^{1.2} f(x)\,dx = \int_{0.6}^{1} f(x)\,dx + \int_{1}^{1.2} f(x)\,dx$ (why?)
$= \int_{0.6}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx = \frac{1 - 0.6^2}{2} + \left[-\frac{(2 - x)^2}{2}\right]_1^{1.2}$
$= 0.32 + \frac{1}{2} - \frac{(0.8)^2}{2} = 0.32 + 0.18 = 0.5$
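To find the distribution function $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
Case (i) $x \le 0$. In this case $f(t) = 0$ for $t \le x$.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = 0$.
Case (ii) $0 < x \le 1$. In this case $f(t) = 0$ for $t \le 0$ and $f(t) = t$ for $0 < t \le x$.
Hence $F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{0} f(t)\,dt + \int_0^x f(t)\,dt = 0 + \int_0^x t\,dt = \frac{x^2}{2}$
Case (iii) $1 < x \le 2$. In this case $f(t) = 0$ for $t \le 0$, $f(t) = t$ for $0 < t \le 1$ and $f(t) = 2 - t$ for $1 < t \le x$.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{1} f(t)\,dt + \int_1^x f(t)\,dt$
$= \frac{1}{2}$ (by case (ii)) $+ \int_1^x (2 - t)\,dt = \frac{1}{2} + \frac{1}{2} - \frac{(2 - x)^2}{2} = 1 - \frac{(2 - x)^2}{2}$
Case (iv) $x > 2$. In this case $f(t) = 0$ for $2 < t < x$.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{2} f(t)\,dt + \int_2^x f(t)\,dt = 1$ (by case (iii)) $+ \int_2^x 0\,dt = 1$
Thus
$$F(x) = \begin{cases} 0 & x \le 0 \\ \frac{x^2}{2} & 0 < x \le 1 \\ 1 - \frac{(2 - x)^2}{2} & 1 < x \le 2 \\ 1 & x > 2 \end{cases}$$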
$\therefore P(0.6 < X < 1.2) = P(X < 1.2) - P(X \le 0.6)$
$= P(X \le 1.2) - P(X \le 0.6)$
$= F(1.2) - F(0.6)$
$= 1 - \frac{(0.8)^2}{2} - \frac{(0.6)^2}{2} = 0.5$
$P(X > 1.8) = 1 - P(X \le 1.8) = 1 - F(1.8) = 1 - \left[1 - \frac{(0.2)^2}{2}\right] = 0.02$
The mean and Variance of a continuous r.v
Let X be a continuous rv with density f(x).
We define its mean as
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
We define its variance $\sigma^2$ as
$E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - \mu^2$
Here $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
Example 3 The density of a rv X is
$f(x) = 3x^2$, $0 < x < 1$ (and 0 elsewhere)
Its mean $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x \cdot 3x^2\,dx = \frac{3}{4}$.
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2 \cdot 3x^2\,dx = \frac{3}{5}$
Hence $\sigma^2 = \frac{3}{5} - \left(\frac{3}{4}\right)^2 = 0.0375$
Hence its sd is $\sigma = 0.1936$.
Example 4 The density of a rv X is
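$$f(x) = \begin{cases} \frac{1}{20} e^{-x/20} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{\infty} x \cdot \frac{1}{20} e^{-x/20}\,dx$
Integrating by parts we get
$= \left[x\left(-e^{-x/20}\right) - 20 e^{-x/20}\right]_0^{\infty} = 20$.
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^{\infty} x^2 \cdot \frac{1}{20} e^{-x/20}\,dx$
On integrating by parts we get
$\left[x^2\left(-e^{-x/20}\right) - (2x)\left(20 e^{-x/20}\right) + 2\left(-400 e^{-x/20}\right)\right]_0^{\infty} = 800$
$\therefore \sigma^2 = E(X^2) - \mu^2 = 800 - 400 = 400$
$\therefore \sigma = 20$.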
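The integrals in Example 4 can also be checked numerically. The sketch below is an illustration only: it assumes a midpoint rule and an upper cutoff of 1000 (both choices are hypothetical), which is safe here because the density is negligible beyond that point.

```python
from math import exp

def f(x):
    """Density of Example 4: (1/20) e^(-x/20) for x > 0."""
    return exp(-x / 20) / 20 if x > 0 else 0.0

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

UPPER = 1000.0   # assumed cutoff standing in for infinity

mean = integrate(lambda x: x * f(x), 0.0, UPPER)
second_moment = integrate(lambda x: x * x * f(x), 0.0, UPPER)
variance = second_moment - mean ** 2
print(round(mean, 3), round(second_moment, 3), round(variance, 3))
```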
NORMAL DISTRIBUTION
A random variable X is said to have the normal distribution (or Gaussian Distribution) if
its density is
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
Here $\mu$, $\sigma$ are fixed (called parameters) and $\sigma > 0$. The graph of the normal density is
a bell shaped curve:
Figure
It is symmetrical about the line x = µ and has points of inflection at x = µ ± σ .
One can use integration and show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. We also see that $E(X) = \mu$ and
variance of X $= E(X - \mu)^2 = \sigma^2$.
If $\mu = 0$, $\sigma = 1$, we say that X has the standard normal distribution. We usually use the
symbol Z to denote the variable having the standard normal distribution. Thus when Z is
standard normal, its density is $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, $-\infty < z < \infty$.
The cumulative distribution function of Z is
$F(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt$
and represents the area under the density up to z. It is the shaded portion in the figure.
Figure
We at once see from the symmetry of the graph that $F(0) = \frac{1}{2} = 0.5$ and
$F(-z) = 1 - F(z)$.
F(z) for various positive z has been tabulated in Table 3 (at the end of your book).
We thus see from Table 3 that
F (0.37 ) = 0.6443, F (1.645) = 0.95
F (2.33) = 0.99 F ( z ) ≈ 1 for z ≥ 3
Hence F (− 0.37 ) = 1 − 0.6443 = 0.3557
F (− 1.645) = 1 − 0.95 = 0.05 etc
Definition of $z_\alpha$
If Z is standard normal, we define $z_\alpha$ to be that number such that
$P(Z > z_\alpha) = \alpha$ or $F(z_\alpha) = 1 - \alpha$.
Since F(1.645) = 0.95 = 1 - 0.05, we see that
$z_{0.05} = 1.645$
Similarly $z_{0.01} = 2.33$
We also note $z_{1-\alpha} = -z_\alpha$
Thus $z_{0.95} = -z_{0.05} = -1.645$
$z_{0.99} = -z_{0.01} = -2.33$.
Important
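If X is normal with mean $\mu$ and variance $\sigma^2$, it can be shown that the standardized r.v.
$Z = \frac{X - \mu}{\sigma}$ has the standard normal distribution. Thus questions about the prob that X
assumes a value between say a and b can be translated into the prob that Z assumes
values in a corresponding range. Specifically:
$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$
$= F\left(\frac{b - \mu}{\sigma}\right) - F\left(\frac{a - \mu}{\sigma}\right)$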
Example 1 (See Exercise 5.24 on page 152)
Given that X has a normal distribution with mean $\mu = 16.2$ and variance $\sigma^2 = 1.5625$,
find the prob that it will take on a value
(a) > 16.8
(b) < 14.9
(c) between 13.6 and 18.8
(d) between 16.5 and 16.7
Here $\sigma = \sqrt{1.5625} = 1.25$
(a) $P(X > 16.8) = P\left(\frac{X - \mu}{\sigma} > \frac{16.8 - 16.2}{1.25}\right) = P\left(Z > \frac{0.6}{1.25}\right) = P(Z > 0.48)$
$= 1 - P(Z \le 0.48) = 1 - F(0.48) = 1 - 0.6844 = 0.3156$
(b) $P(X < 14.9) = P\left(\frac{X - \mu}{\sigma} < \frac{14.9 - 16.2}{1.25}\right) = P\left(Z < -\frac{1.3}{1.25}\right) = P(Z < -1.04)$
$= F(-1.04) = 1 - F(1.04) = 1 - 0.8508 = 0.1492$
(c) $P(13.6 < X < 18.8) = P\left(\frac{13.6 - 16.2}{1.25} < \frac{X - \mu}{\sigma} < \frac{18.8 - 16.2}{1.25}\right)$
$= P\left(-\frac{2.6}{1.25} < Z < \frac{2.6}{1.25}\right) = P(-2.08 < Z < 2.08)$
$= F(2.08) - F(-2.08) = F(2.08) - (1 - F(2.08))$
$= 2F(2.08) - 1 = 2 \times 0.9812 - 1 = 0.9624$
(Note that $P(-c < Z < c) = 2F(c) - 1$ for $c > 0$)
(d) $P(16.5 < X < 16.7) = P\left(\frac{16.5 - 16.2}{1.25} < \frac{X - \mu}{\sigma} < \frac{16.7 - 16.2}{1.25}\right) = P\left(\frac{0.3}{1.25} < Z < \frac{0.5}{1.25}\right)$
$= P(0.24 < Z < 0.4) = F(0.4) - F(0.24)$
$= 0.6554 - 0.5948 = 0.0606$
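When Table 3 is not at hand, the standard normal cdf can be evaluated through the error function, since $F(z) = \frac{1}{2}\left[1 + \mathrm{erf}\left(z/\sqrt{2}\right)\right]$. The following Python sketch recomputes the four parts of Example 1; the function names `Phi` and `normal_prob` are chosen here for illustration. Small differences from the table-based answers in the fourth decimal place are to be expected, because the table rounds z to two decimals.

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf F(z), written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X normal with mean mu and sd sigma (standardize, then use F)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

mu, sigma = 16.2, sqrt(1.5625)                 # sigma = 1.25

part_a = 1.0 - Phi((16.8 - mu) / sigma)        # P(X > 16.8)
part_b = Phi((14.9 - mu) / sigma)              # P(X < 14.9)
part_c = normal_prob(13.6, 18.8, mu, sigma)    # P(13.6 < X < 18.8)
part_d = normal_prob(16.5, 16.7, mu, sigma)    # P(16.5 < X < 16.7)

for label, value in zip("abcd", (part_a, part_b, part_c, part_d)):
    print(label, round(value, 4))
```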
Example 2
A rv X has a normal distribution with σ = 10. If the prob is 0.8212 that it will take on a
value < 82.5, what is the prob that it will take on a value > 58.3?
Solution
Let the mean (unknown) be $\mu$.
Given $P(X < 82.5) = 0.8212$
Thus $P\left(\frac{X - \mu}{\sigma} < \frac{82.5 - \mu}{10}\right) = 0.8212$
Or $P\left(Z < \frac{82.5 - \mu}{10}\right) = 0.8212$, i.e. $F\left(\frac{82.5 - \mu}{10}\right) = 0.8212$
From Table 3, $\frac{82.5 - \mu}{10} = 0.92$
Or $\mu = 82.5 - 9.2 = 73.3$
Hence $P(X > 58.3) = P\left(\frac{X - \mu}{\sigma} > \frac{58.3 - 73.3}{10}\right) = P(Z > -1.5)$
$= 1 - P(Z \le -1.5) = 1 - F(-1.5)$
$= 1 - (1 - F(1.5)) = F(1.5) = 0.9332$
In a photographic process the developing time of prints may be looked upon as a r.v. X
having normal distribution with $\mu = 16.28$ seconds and s.d. of 0.12 second. For which
value is the prob 0.95 that it will be exceeded by the time it takes to develop one of the
prints?
Solution
That is, find a number c so that
$P(X > c) = 0.95$
i.e. $P\left(\frac{X - \mu}{\sigma} > \frac{c - 16.28}{0.12}\right) = 0.95$
i.e. $P\left(Z > \frac{c - 16.28}{0.12}\right) = 0.95$
Hence $P\left(Z \le \frac{c - 16.28}{0.12}\right) = 0.05$
$\therefore \frac{c - 16.28}{0.12} = -z_{0.05} = -1.645$
$\therefore c = 16.28 - 0.12 \times 1.645 = 16.0826$.
NORMAL APPROXIMATION TO BINOMIAL DISTRIBUTION

Suppose X is a r.v. having Binomial distribution with parameters n and p. Then it can be
shown that $P\left(\frac{X - np}{\sqrt{npq}} \le z\right) \to P(Z \le z) = F(z)$ as $n \to \infty$, i.e. in words, the standardized
binomial tends to the standard normal.
Thus when n is large, the binomial probabilities can be approximated using the normal
distribution function.
A manufacturer knows that on the average 2% of the electric toasters that he makes will
require repairs within 90 days after they are sold. Use normal approximation to the
binomial distribution to determine the prob that among 1200 of these toasters at least 30
will require repairs within the first 90 days after they are sold?
Solution
Let X = No. of toasters (among 1200) that require repairs within the first 90 days after
they are sold. Hence X is a rv having Binomial Distribution with parameters n = 1200
and $p = \frac{2}{100} = 0.02$. Thus $np = 24$ and $\sqrt{npq} = \sqrt{23.52} = 4.85$.
Required $P(X \ge 30) = P\left(\frac{X - np}{\sqrt{npq}} \ge \frac{30 - 24}{4.85}\right)$
$\approx P(Z \ge 1.24) = 1 - P(Z < 1.24)$
$= 1 - F(1.24) = 1 - 0.8925 = 0.1075$
Correction for Continuity
Since for continuous rvs $P(Z \ge c) = P(Z > c)$ (which is not true for discrete rvs), when we
approximate a binomial prob by a normal prob, we must ensure that we do not 'lose' the end
point. This is achieved by what we call the continuity correction. In the previous example, X takes
only integer values, so $P(X \ge 30)$ also $= P(X \ge 29.5)$ (Read the justification given in your book on page 150,
lines 1 to 7).
$= P\left(\frac{X - np}{\sqrt{npq}} \ge \frac{29.5 - 24}{4.85}\right)$
$\approx P\left(Z \ge \frac{5.5}{4.85}\right) = P(Z \ge 1.13)$
$= 1 - P(Z \le 1.13) = 1 - F(1.13) = 1 - 0.8708$
$= 0.1292$
(probably a better answer).
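To see what the continuity correction buys, the toaster example can be compared with the exact binomial sum. The sketch below is illustrative only; it evaluates the normal cdf through `math.erf` instead of Table 3, so its approximations differ slightly from the table-based figures.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard normal cdf F(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 1200, 0.02
q = 1.0 - p
mu, sd = n * p, sqrt(n * p * q)     # np = 24, sqrt(npq) is about 4.85

# Exact binomial: P(X >= 30) = 1 - P(X <= 29)
exact = 1.0 - sum(comb(n, k) * p**k * q**(n - k) for k in range(30))

plain = 1.0 - Phi((30 - mu) / sd)        # normal approximation without correction
corrected = 1.0 - Phi((29.5 - mu) / sd)  # normal approximation with continuity correction

print("exact     ", exact)
print("plain     ", plain)
print("corrected ", corrected)
```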
A safety engineer feels that 30% of all industrial accidents in her plant are caused by
failure of employees to follow instructions. Find approximately the prob that among 84
industrial accidents anywhere from 20 to 30 (inclusive) will be due to failure of
employees to follow instructions.
Solution
Let X = no. of accidents (among 84) due to failure of employees to follow instructions.
Thus X is a rv having Binomial distribution with parameters n = 84 and p = 0.3.
Thus $np = 25.2$ and $\sqrt{npq} = \sqrt{17.64} = 4.2$.
Required $P(20 \le X \le 30)$
$= P(19.5 \le X \le 30.5)$ (continuity correction)
$= P\left(\frac{19.5 - 25.2}{4.2} \le \frac{X - np}{\sqrt{npq}} \le \frac{30.5 - 25.2}{4.2}\right)$
$\approx P(-1.36 \le Z \le 1.26)$
$= F(1.26) - F(-1.36) = F(1.26) + F(1.36) - 1$
$= 0.8962 + 0.9131 - 1 = 0.8093$
OTHER PROBABILITY DENSITIES

The Uniform Distribution
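A r.v X is said to have the uniform distribution over the interval $(\alpha, \beta)$ if its density is given
by
$$f(x) = \begin{cases} \frac{1}{\beta - \alpha} & \alpha < x < \beta \\ 0 & \text{elsewhere} \end{cases}$$
Thus the graph of the density is a constant over the interval $(\alpha, \beta)$.
If $\alpha < c < d < \beta$,
$P(c < X < d) = \int_c^d \frac{1}{\beta - \alpha}\,dx = \frac{d - c}{\beta - \alpha}$
and thus is proportional to the length of the interval (c, d).
You may verify that
the mean of X $= E(X) = \mu = \frac{\alpha + \beta}{2}$ (mid point of the interval $(\alpha, \beta)$) and
the variance of X $= \sigma^2 = \frac{(\beta - \alpha)^2}{12}$. The cumulative distribution function is
$$F(x) = \begin{cases} 0 & x \le \alpha \\ \frac{x - \alpha}{\beta - \alpha} & \alpha < x \le \beta \\ 1 & x > \beta \end{cases}$$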
Example 6 (See page 165 exercise 5.46)
In certain experiments, the error X made in determining the solubility of a substance is a
rv having the uniform density with $\alpha = -0.025$ and $\beta = 0.025$. What is the prob such an
error will be
(a) between 0.010 and 0.015?
(b) between -0.012 and 0.012?
Solution
(a) $P(0.010 < X < 0.015) = \frac{0.015 - 0.010}{0.025 - (-0.025)} = \frac{0.005}{0.050} = 0.1$
(b) $P(-0.012 < X < 0.012) = \frac{0.012 - (-0.012)}{0.025 - (-0.025)} = \frac{12}{25} = 0.48$
Example 7 (See exercise 5.47 on page 165)
From experience, Mr. Harris has found that the low bid on a construction job can be
regarded as a rv X having uniform density
$$f(x) = \begin{cases} \frac{3}{4C} & \frac{2C}{3} < x < 2C \\ 0 & \text{elsewhere} \end{cases}$$
where C is his own estimate of the cost of the job. What percentage should Mr. Harris
add to his cost estimate when submitting bids to maximize his expected profit?
Solution
Suppose Mr. Harris adds k% of C when submitting his bid. Thus Mr. Harris gets a profit
$\frac{kC}{100}$ if he gets the contract, which happens if the lowest bid (by others) $\ge C + \frac{kC}{100}$, and
gets no profit if the lowest bid $< C + \frac{kC}{100}$. Thus the prob that he gets the bid
$= P\left(C + \frac{kC}{100} < X < 2C\right) = \frac{3}{4C} \times \left[2C - \left(C + \frac{kC}{100}\right)\right] = \frac{3}{4}\left(1 - \frac{k}{100}\right)$
Thus the expected profit of Mr. Harris is
$\frac{kC}{100} \times \frac{3}{4}\left(1 - \frac{k}{100}\right) + 0 \times (\ldots) = \frac{3C}{400}\left(k - \frac{k^2}{100}\right)$
which is maximum (by using calculus) when k = 50.
Thus Mr. Harris's expected profit is a maximum when he adds 50% of C to C when
submitting bids.
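The calculus step behind k = 50 can be written out as follows. Here E(k) is a label introduced only for this sketch; it denotes the expected profit as a function of the percentage k, with $0 \le k \le 100$.

```latex
E(k) = \frac{3C}{400}\left(k - \frac{k^{2}}{100}\right), \qquad
\frac{dE}{dk} = \frac{3C}{400}\left(1 - \frac{k}{50}\right) = 0
\;\Longrightarrow\; k = 50,
\qquad
\frac{d^{2}E}{dk^{2}} = -\frac{3C}{20000} < 0 .
```

Since the second derivative is negative, k = 50 gives a maximum, and the corresponding expected profit is $E(50) = \frac{3C}{400}(50 - 25) = \frac{3C}{16}$.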
Gamma Function
This is one of the most useful functions in Mathematics. If x > 0, it is shown that the
improper integral $\int_0^{\infty} e^{-t} t^{x-1}\,dt$ converges to a finite real number which we denote by $\Gamma(x)$
(Capital gamma of x). Thus for all real numbers x > 0, we define
$\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\,dt$.
Properties of Gamma Function
1. Γ( x + 1) = xΓ( x ) , x > 0
2. Γ(1) = 1
3. $\Gamma(2) = 1\,\Gamma(1) = 1$, $\Gamma(3) = 2\,\Gamma(2) = 2 \times 1 = 2!$
More generally $\Gamma(n + 1) = n!$ whenever n is a positive integer or zero.
4. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.
5. $\Gamma(x)$ decreases in the interval (0,1) and increases in the interval $(2, \infty)$ and has a
minimum somewhere between 1 and 2.
THE GAMMA DISTRIBUTION
Let $\alpha$, $\beta$ be 2 positive real numbers. A r.v X is said to have a Gamma Distribution with
parameters $\alpha$, $\beta$ if its density is
$$f(x) = \begin{cases} \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
It can be shown that
Mean of X $= E(X) = \mu = \alpha\beta$
(See the working on Page 159 of your text book)
Variance of X $= \sigma^2 = \alpha\beta^2$.
Exponential Distribution
If $\alpha = 1$, we say X has the exponential distribution. Thus X has an exponential distribution
(with parameter $\beta > 0$) if its density is
$$f(x) = \begin{cases} \frac{1}{\beta}\, e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
We also see easily that:
1. Mean of X $= E(X) = \beta$
2. Variance of X $= \sigma^2 = \beta^2$
3. The cumulative distribution function of X is
$$F(x) = \begin{cases} 1 - e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
4. X has the memoryless property:
$P(X > s + t \mid X > s) = P(X > t)$, $s, t > 0$
Proof of (4): $P(X > s) = 1 - P(X \le s) = 1 - F(s) = e^{-s/\beta}$ (by (3))
$P(X > s + t \mid X > s) = \frac{P((X > s + t) \cap (X > s))}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-(s + t)/\beta}}{e^{-s/\beta}} = e^{-t/\beta} = P(X > t)$. QED
In a certain city, the daily consumption of electric power (in millions of kw hours) can be
treated as a r.v. X having a Gamma distribution with α = 3, β = 2. If the power plant in
the city has a daily capacity of 12 million kw hrs, what is the prob. that the power supply
will be inadequate on any given day?
Solution
The power supply will be inadequate if demand exceeds the daily capacity.
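Hence the prob that the power supply is inadequate
$= P(X > 12) = \int_{12}^{\infty} f(x)\,dx$
Now as $\alpha = 3$, $\beta = 2$, $f(x) = \frac{1}{2^3\,\Gamma(3)}\, x^{3-1} e^{-x/2} = \frac{1}{16}\, x^2 e^{-x/2}$
Hence $P(X > 12) = \frac{1}{16} \int_{12}^{\infty} x^2 e^{-x/2}\,dx$
Integrating by parts, we get
$= \frac{1}{16}\left[x^2\left(-2e^{-x/2}\right) - 2x\left(4e^{-x/2}\right) + 2\left(-8e^{-x/2}\right)\right]_{12}^{\infty}$
$= \frac{1}{16}\left[2 \times 12^2 \times e^{-6} + 8 \times 12 \times e^{-6} + 16 e^{-6}\right]$
$= \frac{400}{16}\, e^{-6} = 25 e^{-6} = 0.062$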
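When $\alpha$ is a positive integer, repeating the integration by parts used above gives a closed form for the tail of the Gamma distribution: $P(X > x) = e^{-x/\beta} \sum_{k=0}^{\alpha - 1} \frac{(x/\beta)^k}{k!}$. The Python sketch below applies it to this example; the function name is chosen here for illustration and the formula is valid only for integer $\alpha$.

```python
from math import exp, factorial

def gamma_tail_integer_shape(x, alpha, beta):
    """P(X > x) for a Gamma(alpha, beta) rv; valid only when alpha is a positive integer."""
    u = x / beta
    return exp(-u) * sum(u**k / factorial(k) for k in range(alpha))

# Power consumption example: alpha = 3, beta = 2, capacity 12
p_inadequate = gamma_tail_integer_shape(12, alpha=3, beta=2)
print(p_inadequate, 25 * exp(-6))   # the two values should agree
```

With x/β = 6 the sum is 1 + 6 + 18 = 25, which reproduces the factor 25 in front of $e^{-6}$.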
10
Example 9 (see exercise 5.58 on Page 166)
The amount of time that a surveillance camera will run without having to be reset is a r.v.
X having exponential distribution with β = 50 days. Find the prob that such a camera
(a) will have to be reset in less than 20 days.

(b) will not have to be reset in at least 60 days.
Solution
The density of X is
$f(x) = \frac{1}{50}\, e^{-x/50}$, $x > 0$ (and 0 elsewhere)
(a) P (The camera has to be reset in < 20 days)
= P (the running time < 20)
$= P(X < 20) = \int_0^{20} \frac{1}{50}\, e^{-x/50}\,dx = \left[-e^{-x/50}\right]_0^{20}$
$= 1 - e^{-20/50} = 1 - e^{-2/5} = 0.3297$
(b) P (The camera will not have to be reset in at least 60 days.)
$= P(X > 60) = \int_{60}^{\infty} \frac{1}{50}\, e^{-x/50}\,dx = \left[-e^{-x/50}\right]_{60}^{\infty} = e^{-6/5} = 0.3012$
Given a Poisson process with the average α arrivals per unit time, find the prob density
of the inter arrival time (i.e the time between two consecutive arrivals).
Solution
Let T be the time between two consecutive arrivals. Thus clearly T is a continuous r.v.
with values > 0. Now $T > t \Leftrightarrow$ no arrival in a time period of length t.
Thus $P(T > t) = P(X_t = 0)$
($X_t$ = Number of arrivals in time period t)
$= e^{-\alpha t}$ (as $X_t$ has a Poisson distribution with parameter $\lambda = \alpha t$)
Hence the distribution function of T
$= F(t) = P(T \le t) = 1 - P(T > t) = 1 - e^{-\alpha t}$, $t > 0$
($F(t) = 0$ clearly for all $t \le 0$)
Hence the density of T, $f(t) = \frac{d}{dt} F(t)$
$$= \begin{cases} \alpha e^{-\alpha t} & t > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Hence we would say the inter arrival time (IAT) is a continuous rv with exponential density with parameter
$\beta = \frac{1}{\alpha}$.
The Beta Function
If x, y > 0 the beta function, B(x, y) (read capital Beta x, y), is defined by
$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\,dt$
It is well-known that $B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}$, $x, y > 0$.
BETA DISTRIBUTION
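A r.v. X is said to have a Beta distribution with parameters $\alpha, \beta > 0$ if its density is
$$f(x) = \begin{cases} \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1} & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
It is easily shown that
(1) $E(X) = \mu = \frac{\alpha}{\alpha + \beta}$
(2) $V(X) = \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$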
Example 11 (See Exercise 5.64)
If the annual proportion of erroneous income tax returns can be looked upon as a rv
having a Beta distribution with α = 2, β = 9, what is the prob that in any given year,
there will be fewer than 10% of erroneous returns?
Solution
Let X = annual proportion of erroneous income tax returns. Thus X has a Beta density
with $\alpha = 2$, $\beta = 9$.
$\therefore P(X < 0.1) = \int_0^{0.1} f(x)\,dx$ (Note the proportion cannot be < 0)
$= \frac{1}{B(2,9)} \int_0^{0.1} x^{2-1} (1 - x)^{9-1}\,dx$
$B(2,9) = \frac{\Gamma(2)\,\Gamma(9)}{\Gamma(11)} = \frac{1 \times 8!}{10!} = \frac{1}{9 \times 10} = \frac{1}{90}$
$\int_0^{0.1} x (1 - x)^8\,dx = \int_0^{0.1} \left[(1 - x)^8 - (1 - x)^9\right] dx$
$= \left[-\frac{(1 - x)^9}{9} + \frac{(1 - x)^{10}}{10}\right]_0^{0.1} = -\frac{(0.9)^9}{9} + \frac{(0.9)^{10}}{10} + \frac{1}{9} - \frac{1}{10}$
$= \frac{1}{90} - (0.9)^9 \times \frac{19}{900} = 0.002932$
Hence $P(X < 0.1) = 90 \times 0.002932 = 0.2639$
The Log –Normal Distribution
A r.v X is said to have a log normal distribution if its density is
$$f(x) = \begin{cases} \frac{1}{\sqrt{2\pi}\,\beta}\, x^{-1} e^{-(\ln x - \alpha)^2 / 2\beta^2} & x > 0,\ \beta > 0 \\ 0 & \text{elsewhere} \end{cases}$$
It can be shown that if X has a log-normal distribution, Y = ln X has a normal distribution
with mean $\mu = \alpha$ and s.d. $\sigma = \beta$.
Thus $P(a < X < b)$
$= P(\ln a < \ln X < \ln b)$
$= P\left(\frac{\ln a - \alpha}{\beta} < Z < \frac{\ln b - \alpha}{\beta}\right) = F\left(\frac{\ln b - \alpha}{\beta}\right) - F\left(\frac{\ln a - \alpha}{\beta}\right)$
where F(z) = cdf of the standard normal variable Z.
Lengthy calculations show that if X has a log-normal distribution, its mean is $E(X) = e^{\alpha + \beta^2/2}$
and its variance is $e^{2\alpha + \beta^2}\left(e^{\beta^2} - 1\right)$.
More problems on Normal Distribution
Example 12
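Let X be normal with mean $\mu$ and sd $\sigma$. Determine c as a function of $\mu$ and $\sigma$ such
that
$P(X \le c) = 2P(X \ge c)$
Solution
$P(X \le c) = 2P(X \ge c)$
implies $P(X \le c) = 2(1 - P(X < c))$
Let $P(X \le c) = p$. Then $p = 2(1 - p)$.
Thus $3p = 2$ or $p = \frac{2}{3}$
Now $P(X \le c) = P\left(\frac{X - \mu}{\sigma} \le \frac{c - \mu}{\sigma}\right) = F\left(\frac{c - \mu}{\sigma}\right) = \frac{2}{3} = 0.6667$
implies $\frac{c - \mu}{\sigma} = 0.43$ (approx from Table 3)
$\therefore c = \mu + 0.43\sigma$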
Example 13
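Suppose X is normal with mean 0 and sd 5. Find $P(1 < X^2 < 4)$.
Solution
$P(1 < X^2 < 4) = P(1 < |X| < 2)$
$= P\left(\frac{1}{5} < |Z| < \frac{2}{5}\right) = P\left(|Z| < \frac{2}{5}\right) - P\left(|Z| \le \frac{1}{5}\right)$
$= \left[2F\left(\frac{2}{5}\right) - 1\right] - \left[2F\left(\frac{1}{5}\right) - 1\right] = 2\left[F\left(\frac{2}{5}\right) - F\left(\frac{1}{5}\right)\right]$
= 2(0.6554 - 0.5793) from Table 3
= 2 × 0.0761 = 0.1522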
Example 14
The annual rain fall in a certain locality is a r.v. X having normal distribution with mean
29.5” and sd 2.5”. How many inches of rain (annually) is exceeded about 5% of the time?
Solution
That is, we have to find a number C such that
$P(X > C) = 0.05$
i.e. $P\left(\frac{X - \mu}{\sigma} > \frac{C - 29.5}{2.5}\right) = 0.05$
Hence $\frac{C - 29.5}{2.5} = z_{0.05} = 1.645$
$\therefore C = 29.5 + 2.5 \times 1.645 = 33.6125$
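Reading a table "backwards" to find $z_\alpha$ can be replaced by an inverse cdf. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) for Example 14; the variable names are illustrative. The result differs from 33.6125 only in the later decimals, because the table value 1.645 is itself rounded.

```python
from statistics import NormalDist

rain = NormalDist(mu=29.5, sigma=2.5)   # annual rainfall in inches

# C is exceeded 5% of the time, i.e. P(X <= C) = 0.95
C = rain.inv_cdf(1 - 0.05)
z_005 = NormalDist().inv_cdf(0.95)      # z_0.05 for the standard normal

print(round(z_005, 3), round(C, 2))
```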
Example 15
A rocket fuel is to contain a certain percent (say X) of a particular compound. The
specification calls for X to lie between 30 and 35. The manufacturer will make a net
profit on the fuel per gallon which is the following function of X.
$$T(X) = \begin{cases} \$0.10 \text{ per gallon} & \text{if } 30 < X < 35 \\ \$0.05 \text{ per gallon} & \text{if } 35 \le X < 40 \text{ or } 25 < X \le 30 \\ -\$0.10 \text{ per gallon} & \text{elsewhere} \end{cases}$$
If X has a normal distribution with mean 33 and s.d. 3, find the prob distribution of T and
hence the expected profit per gallon.
Solution
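T = 0.10 if 30 < X < 35
$\therefore P(T = 0.10) = P(30 < X < 35)$
$= P\left(\frac{30 - 33}{3} < \frac{X - \mu}{\sigma} < \frac{35 - 33}{3}\right) = P\left(-1 < Z < \frac{2}{3}\right)$
$= F\left(\frac{2}{3}\right) - F(-1) = F\left(\frac{2}{3}\right) + F(1) - 1$
$= 0.7486 + 0.8413 - 1 = 0.5899$
$P(T = 0.05) = P(35 \le X < 40) + P(25 < X \le 30)$
$= P\left(\frac{35 - 33}{3} \le \frac{X - \mu}{\sigma} < \frac{40 - 33}{3}\right) + P\left(\frac{25 - 33}{3} < \frac{X - \mu}{\sigma} \le \frac{30 - 33}{3}\right)$
$= P\left(\frac{2}{3} \le Z < \frac{7}{3}\right) + P\left(-\frac{8}{3} < Z \le -1\right)$
$= F\left(\frac{7}{3}\right) - F\left(\frac{2}{3}\right) + F(-1) - F\left(-\frac{8}{3}\right)$
$= F\left(\frac{7}{3}\right) - F\left(\frac{2}{3}\right) + F\left(\frac{8}{3}\right) - F(1)$
$= 0.9901 - 0.7486 + 0.9961 - 0.8413 = 0.3963$
Hence $P(T = -0.10) = 1 - 0.5899 - 0.3963 = 0.0138$
Hence expected profit = E(T)
$= 0.10 \times 0.5899 + 0.05 \times 0.3963 + (-0.10) \times 0.0138$
= $0.077425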
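The distribution of T and its expectation in Example 15 can be recomputed with an exact normal cdf in place of Table 3. This is a sketch using `statistics.NormalDist`; the names `p_high`, `p_mid` and `p_low` are introduced here only for illustration, and the probabilities differ from the table-based ones in the third decimal place because the table rounds 2/3, 7/3 and 8/3 to two decimals.

```python
from statistics import NormalDist

X = NormalDist(mu=33, sigma=3)   # percent of the compound in the fuel

def prob(a, b):
    """P(a < X < b); end points do not matter for a continuous rv."""
    return X.cdf(b) - X.cdf(a)

p_high = prob(30, 35)                 # P(T = 0.10)
p_mid = prob(35, 40) + prob(25, 30)   # P(T = 0.05)
p_low = 1.0 - p_high - p_mid          # P(T = -0.10)

expected_profit = 0.10 * p_high + 0.05 * p_mid - 0.10 * p_low
print(round(p_high, 4), round(p_mid, 4), round(p_low, 4))
print(round(expected_profit, 4))
```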
JOINT DISTRIBUTIONS – TWO AND HIGHER DIMENSIONAL RANDOM VARIABLES
Suppose X, Y are 2 discrete rvs and suppose X can take values $x_1, x_2, \ldots$ and Y can take
values $y_1, y_2, \ldots$ We refer to the function $f(x, y) = P(X = x, Y = y)$ as the joint prob
distribution of X and Y. The ordered pair (X,Y) is sometimes referred to as a two-dimensional
discrete r.v.
Example 16
Two cards are drawn at random from a pack of 52 cards. Let X be the number of aces
drawn and Y be the number of Queens drawn.
Find the joint prob distribution of X and Y.
Solution
Clearly X can take any one of the three values 0,1,2 and Y one of the three values, 0,1,2.
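The joint prob distribution of X and Y is depicted in the following 3 x 3 table
(columns: values of x, rows: values of y)
          x = 0                                      x = 1                                      x = 2
y = 0     $\binom{44}{2} / \binom{52}{2}$             $\binom{4}{1}\binom{44}{1} / \binom{52}{2}$   $\binom{4}{2} / \binom{52}{2}$
y = 1     $\binom{4}{1}\binom{44}{1} / \binom{52}{2}$   $\binom{4}{1}\binom{4}{1} / \binom{52}{2}$    0
y = 2     $\binom{4}{2} / \binom{52}{2}$              0                                          0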
Justification
$P(X = 0, Y = 0)$
= P (no aces and no queens in the 2 cards)
$= \binom{44}{2} / \binom{52}{2}$
$P(X = 1, Y = 0)$ (the entry in the 2nd col and 1st row)
= P (one ace and one other card which is neither an ace nor a queen)
$= \binom{4}{1}\binom{44}{1} / \binom{52}{2}$ etc.
Can we write down the distribution of X? X can take any one of the 3 values 0,1,2
What is P( X = 0) ?
X = 0 means no ace is drawn but we might draw 2 queens, or 1 queen and one non queen
or 2 cards which are neither aces nor queens.
Thus
$P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2)$
= Sum of the 3 prob in col. 1
$= \left[\binom{44}{2} + \binom{4}{1}\binom{44}{1} + \binom{4}{2}\right] / \binom{52}{2} = \binom{48}{2} / \binom{52}{2}$ (Verify!)
Similarly $P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2)$
= Sum of the 3 probabilities in 2nd col.
$= \left[\binom{4}{1}\binom{44}{1} + \binom{4}{1}\binom{4}{1}\right] / \binom{52}{2} + 0 = \binom{4}{1}\binom{48}{1} / \binom{52}{2}$ (Verify!)
$P(X = 2) = P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2)$
= Sum of the 3 probabilities in 3rd col
$= \binom{4}{2} / \binom{52}{2} + 0 + 0 = \binom{4}{2} / \binom{52}{2}$
The distribution of X derived from the joint distribution of X and Y is referred to as the
marginal distribution of X.
Similarly the marginal distribution of Y is given by the 3 row totals.
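The "Verify!" steps of Example 16 can be done mechanically with exact fractions. The sketch below is illustrative: the function `f` simply encodes the counting argument (choose x aces, y queens and 2 - x - y of the other 44 cards), and the dictionary names are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 2)     # number of ways to draw 2 cards from 52

def f(x, y):
    """Joint prob of x aces and y queens among the 2 cards drawn."""
    others = 2 - x - y              # cards that are neither aces nor queens
    if others < 0:
        return Fraction(0)
    return Fraction(comb(4, x) * comb(4, y) * comb(44, others), TOTAL)

joint = {(x, y): f(x, y) for x in range(3) for y in range(3)}

# Marginal distributions: column totals for X, row totals for Y
g = {x: sum(joint[x, y] for y in range(3)) for x in range(3)}
h = {y: sum(joint[x, y] for x in range(3)) for y in range(3)}

print(g[0] == Fraction(comb(48, 2), TOTAL))          # P(X = 0)
print(g[1] == Fraction(comb(4, 1) * comb(48, 1), TOTAL))   # P(X = 1)
print(sum(joint.values()))                           # total probability
```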
Example 17
The joint prob distribution of X and Y is given by
                               x = -1   x = 0   x = 1   Row total
y = -1                         1/8      1/8     1/8     3/8
y = 0                          1/8      0       1/8     2/8
y = 1                          1/8      1/8     1/8     3/8
Marginal Distribution of X     3/8      2/8     3/8
Write the marginal distribution of X and Y. To get the marginal distribution of X, we find
the column totals and write them in the (bottom) margin. Thus the (marginal) distribution
of X is
X      -1    0     1
Prob   3/8   2/8   3/8
(Do you see why we call it the marginal distribution?)
Similarly to get the marginal distribution of Y, we find the 3 row totals and write them in
the (right) margin.
Thus the marginal distribution of Y is
Y     Prob
-1    3/8
0     2/8
1     3/8
Notation: If $f(x, y) = P(X = x, Y = y)$ is the joint prob distribution of the 2-dimensional
discrete r.v (X,Y), we denote by g(x) the marginal distribution of X and by h(y) the
marginal distribution of Y.
Thus $g(x) = P(X = x) = \sum_{\text{all } y} P(X = x, Y = y) = \sum_{\text{all } y} f(x, y)$
and $h(y) = P(Y = y) = \sum_{\text{all } x} P(X = x, Y = y) = \sum_{\text{all } x} f(x, y)$
Conditional Distribution
The conditional prob distribution of Y for a given X = x is defined as
$h(y \mid x) = P(Y = y \mid X = x)$ (read prob of Y = y given X = x)
$= \frac{P(X = x, Y = y)}{P(X = x)} = \frac{f(x, y)}{g(x)}$
where g(x) is the marginal distribution of X.
Thus in the above example 17,
$h(0 \mid 1) = P(Y = 0 \mid X = 1) = \frac{P(X = 1, Y = 0)}{P(X = 1)} = \frac{1/8}{3/8} = \frac{1}{3}$
Similarly, the conditional prob distribution of X for a given Y = y is defined as
$g(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{h(y)}$
where h(y) is the marginal distribution of Y.
In the above example,
$g(0 \mid 0) = P(X = 0 \mid Y = 0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)} = \frac{0}{2/8} = 0$
Independence
We say X,Y are independent if
P( X = x, Y = y ) = P( X = x )P(Y = y ) for all x, y.
Thus X,Y are independent if and only if
f ( x, y ) = g ( x )h( y ) for all x and y
which is the same as saying $g(x \mid y) = g(x)$ for all x and y, which is the same as saying
$h(y \mid x) = h(y)$ for all x, y.
In the above example X, Y are not independent as $P(X = 0, Y = 0) = 0 \ne P(X = 0)P(Y = 0) = \frac{2}{8} \times \frac{2}{8}$
Example 18
The joint prob distribution of X and Y is given by
         X = 2   X = 0   X = 1
Y = 2    0.1     0.2     0.1
Y = 0    0.05    0.1     0.15
Y = 1    0.1     0.1     0.1
(a) Find the marginal distribution of X.
Ans
X      2      0     1
Prob   0.25   0.4   0.35
(b) Find the marginal distribution of Y
Ans
Y   Prob
2   0.4
0   0.3
1   0.3
(c) Find P( X + Y = 2)
Ans X + Y = 2 if ( X = 2, Y = 0) or ( X = 1, Y = 1) or ( X = 0, Y = 2)
Thus P( X + Y = 2 ) = 0.05 + 0.1 + 0.2 = 0.35
(d) Find P( X − Y = 0)
Ans : X − Y = 0 if ( X = 2, Y = 2) or ( X = 0, Y = 0) or ( X = 1, Y = 1)
∴ P( X − Y = 0) = 0.1 + 0.1 + 0.1 = 0.3
(e) Find $P(X \ge 0)$ Ans. 1
(f) Find $P(X - Y = 0 \mid X \ge 0)$ Ans. $\frac{0.3}{1} = 0.3$
(g) Find $P(X - Y = 0 \mid X \ge 1)$ Ans. $\frac{0.2}{0.6} = \frac{1}{3}$
(h) Are X,Y independent?
Ans No! P( X = 1, Y = 1) ≠ P( X = 1) P(Y = 1).
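The answers of Example 18 can be reproduced from the joint table with a few lines of Python. This is a sketch only: the dictionary layout (keys are (x, y) pairs) and the helper `prob` are choices made here for illustration.

```python
# Joint distribution of Example 18; keys are (x, y)
joint = {
    (2, 2): 0.10, (0, 2): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (0, 0): 0.10, (1, 0): 0.15,
    (2, 1): 0.10, (0, 1): 0.10, (1, 1): 0.10,
}

def prob(event):
    """Add up f(x, y) over all pairs (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

g = {v: prob(lambda x, y, v=v: x == v) for v in (2, 0, 1)}   # marginal of X
h = {v: prob(lambda x, y, v=v: y == v) for v in (2, 0, 1)}   # marginal of Y

p_sum_is_2 = prob(lambda x, y: x + y == 2)                   # part (c)
p_equal = prob(lambda x, y: x == y)                          # part (d)
p_equal_given_x_ge_1 = (prob(lambda x, y: x == y and x >= 1)
                        / prob(lambda x, y: x >= 1))         # part (g)

# Independence requires f(x, y) = g(x) h(y) for every pair
independent = all(abs(joint[x, y] - g[x] * h[y]) < 1e-12 for (x, y) in joint)

print(g, h)
print(p_sum_is_2, p_equal, p_equal_given_x_ge_1, independent)
```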
Two-Dimensional Continuous Random Variables
Let (X,Y) be a continuous 2-dimensional r.v. This means (X,Y) can take all values in a
certain region of the X,Y plane. For example, suppose a dart is thrown at a circular board
of radius 2. Then the position where the dart hits the board (X,Y) is a continuous two
dimensional r.v as it can take all values (x,y) such that x 2 + y 2 ≤ 4.
A function f(x, y) is said to be the joint prob density of (X,Y) if
(i) $f(x, y) \ge 0$ for all x, y
(ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$
(iii) $P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x, y)\,dy\,dx$.
Example 19(a)
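Let the joint prob density of (X,Y) be
$$f(x, y) = \begin{cases} \frac{1}{4} & 0 \le x \le 2,\ 0 \le y \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find $P(X + Y \le 1)$
Ans: The region $x + y \le 1$ is given by the shaded portion.
$\therefore P(X + Y \le 1) = \int_{x=0}^{1} \int_{y=0}^{1-x} \frac{1}{4}\,dy\,dx$
$= \int_0^1 \frac{1}{4}(1 - x)\,dx = \left[-\frac{1}{8}(1 - x)^2\right]_0^1 = \frac{1}{8}$.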
Example 19(b)
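The joint prob density of (X,Y) is
$f(x, y) = \frac{1}{8}(6 - x - y)$, $0 < x < 2$, $2 < y < 4$ (and 0 elsewhere)
Find $P(X < 1, Y < 3)$
Solution
$P(X < 1, Y < 3) = \int_{x=0}^{1} \int_{y=2}^{3} f(x, y)\,dy\,dx$
$= \int_{x=0}^{1} \int_{y=2}^{3} \frac{1}{8}(6 - x - y)\,dy\,dx$
$= \int_{x=0}^{1} \frac{1}{8}\left[(6 - x)y - \frac{y^2}{2}\right]_2^3 dx$
$= \int_{x=0}^{1} \frac{1}{8}\left[(6 - x) - \frac{5}{2}\right] dx$
$= \frac{1}{8}\left[-\frac{(6 - x)^2}{2} - \frac{5x}{2}\right]_0^1 = \frac{1}{8}\left[-\frac{25}{2} - \frac{5}{2} + 18\right] = \frac{3}{8}$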
Marginal and Conditional Densities
If f(x, y) is the joint prob density of the 2-dimensional continuous rv (X,Y), we define
the marginal prob density of X as
$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
That is, fix x and integrate f(x,y) w.r.t y.
Similarly the marginal prob density of Y is
$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
The conditional prob density of Y for a given x is
$h(y \mid x) = \frac{f(x, y)}{g(x)}$ (defined only for those x for which $g(x) \ne 0$)
The conditional prob density of X for a given y is
$g(x \mid y) = \frac{f(x, y)}{h(y)}$ (defined only for those y for which $h(y) \ne 0$)
Independence
We say X,Y are independent if and only if f ( x, y ) = g ( x )h( y )
which is the same as saying g ( x | y ) = g ( x ) or h( y | x ) = h( y ).
Example 20
Consider the density of (X,Y) as given in example 19(b).
The marginal density of X
$= g(x) = \int_{y=2}^{4} \frac{1}{8}(6 - x - y)\,dy = \frac{1}{8}\left[(6 - x)y - \frac{y^2}{2}\right]_2^4$
$= \frac{1}{8}[2(6 - x) - 6] = \frac{1}{8}(6 - 2x)$, $0 < x < 2$
and = 0 elsewhere
We verify this is a valid density.
$g(x) = \frac{1}{8}(6 - 2x) \ge 0$ for $0 < x < 2$
Secondly $\int_0^2 g(x)\,dx = \frac{1}{8}\int_0^2 (6 - 2x)\,dx = \frac{1}{8}\left[6x - x^2\right]_0^2 = \frac{1}{8}[12 - 4] = 1$
The marginal density of Y is
$h(y) = \int_{x=0}^{2} \frac{1}{8}(6 - x - y)\,dx = \frac{1}{8}\left[(6 - y)x - \frac{x^2}{2}\right]_{x=0}^{2} = \frac{1}{8}[2(6 - y) - 2]$
$= \frac{1}{8}(10 - 2y)$ for $2 < y < 4$, and 0 elsewhere
Again $h(y) \ge 0$ and $\int_2^4 h(y)\,dy = \frac{1}{8}\int_2^4 (10 - 2y)\,dy = \frac{1}{8}\left[10y - y^2\right]_2^4 = \frac{1}{8}[20 - 12] = 1$
The conditional density of Y for X = 1
is $h(y \mid 1) = \frac{f(1, y)}{g(1)} = \frac{\frac{1}{8}(6 - 1 - y)}{\frac{1}{8}(6 - 2)} = \frac{1}{4}(5 - y)$, $2 < y < 4$
and 0 elsewhere
Again this is a valid density as $h(y \mid 1) \ge 0$ and
$\int_2^4 h(y \mid 1)\,dy = \frac{1}{4}\int_2^4 (5 - y)\,dy = \frac{1}{4}\left[-\frac{(5 - y)^2}{2}\right]_2^4 = \frac{1}{4}\left[\frac{9}{2} - \frac{1}{2}\right] = 1$
$P(X < 1 \mid Y < 3) = \frac{P(X < 1, Y < 3)}{P(Y < 3)}$
Now Numerator $= \frac{3}{8}$
Denominator $= P(Y < 3) = \int_2^3 h(y)\,dy = \frac{1}{8}\int_2^3 (10 - 2y)\,dy$
$= \frac{1}{4}\int_2^3 (5 - y)\,dy = \frac{1}{4}\left[-\frac{(5 - y)^2}{2}\right]_2^3 = \frac{1}{4}\left[\frac{9}{2} - \frac{4}{2}\right] = \frac{5}{8}$
Hence $P(X < 1 \mid Y < 3) = \frac{3/8}{5/8} = \frac{3}{5}$
The Cumulative Distribution Function
Let f(x, y) be the joint density of (X,Y). We define the cumulative distribution function
as
$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$.
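The joint prob density of X and Y is given by
$$f(x, y) = \begin{cases} \frac{6}{5}\left(x + y^2\right) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the cumulative distribution function F(x,y)
Solution
Case (i) x < 0
$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du = 0$ as $f(u, v) = 0$ for any $u < 0$
Case (ii) y < 0.
Again $F(x, y) = 0$ whatever be x.
Case (iii) $0 < x < 1$, $0 < y < 1$
$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$
$= \int_{u=0}^{x} \int_{v=0}^{y} \frac{6}{5}\left(u + v^2\right) dv\,du$ (as $f(u, v) = 0$ for $u < 0$ or $v < 0$)
$= \frac{6}{5}\int_{u=0}^{x} \left[uv + \frac{v^3}{3}\right]_0^y du$
$= \frac{6}{5}\int_{u=0}^{x} \left(uy + \frac{y^3}{3}\right) du = \frac{6}{5}\left(\frac{x^2 y}{2} + \frac{x y^3}{3}\right)$.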
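Case (iv) $0 < x < 1$, $y \ge 1$
$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{u=0}^{x} \int_{v=0}^{1} \frac{6}{5}\left(u + v^2\right) dv\,du$
$= \frac{6}{5}\int_{u=0}^{x} \left(u + \frac{1}{3}\right) du = \frac{6}{5}\left(\frac{x^2}{2} + \frac{x}{3}\right)$
Case (v) $x \ge 1$, $0 < y < 1$
As in case (iii) we can show
$F(x, y) = \frac{6}{5}\left(\frac{y}{2} + \frac{y^3}{3}\right)$
Case (vi) $x \ge 1$, $y \ge 1$
$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{u=0}^{1} \int_{v=0}^{1} \frac{6}{5}\left(u + v^2\right) dv\,du$
$= \frac{6}{5}\int_{u=0}^{1} \left(u + \frac{1}{3}\right) du = \frac{6}{5}\left(\frac{1}{2} + \frac{1}{3}\right) = 1$
(Did you anticipate this?)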
Hence
$P(0.2 < X < 0.5,\ 0.4 < Y < 0.6)$
$= F(0.5, 0.6) - F(0.2, 0.6) - F(0.5, 0.4) + F(0.2, 0.4)$ (Why?)
$= \frac{6}{5}\left[\frac{(0.5)^2(0.6)}{2} + \frac{(0.5)(0.6)^3}{3} - \frac{(0.2)^2(0.6)}{2} - \frac{(0.2)(0.6)^3}{3} - \frac{(0.5)^2(0.4)}{2} - \frac{(0.5)(0.4)^3}{3} + \frac{(0.2)^2(0.4)}{2} + \frac{(0.2)(0.4)^3}{3}\right]$
$= \frac{6}{5}\left[\frac{(0.5)^2 - (0.2)^2}{2} \times 0.2 + \frac{0.5 - 0.2}{3}\left((0.6)^3 - (0.4)^3\right)\right]$
$= \frac{6}{5}\left[\left((0.5)^2 - (0.2)^2\right) \times 0.1 + 0.1 \times \left((0.6)^3 - (0.4)^3\right)\right]$
$= \frac{6}{5} \times 0.1 \times \left[(0.5)^2 - (0.2)^2 + (0.6)^3 - (0.4)^3\right]$
$= \frac{6}{5} \times 0.1 \times [0.362]$
$= 0.04344$
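The case analysis for F(x, y) and the rectangle formula can be checked with a short Python sketch. Clamping x and y to the interval [0, 1] is a compact way, chosen here for illustration, of covering all the cases at once; it is not how the text presents them.

```python
def F(x, y):
    """Joint cdf for f(x, y) = (6/5)(x + y^2) on the unit square."""
    x = min(max(x, 0.0), 1.0)   # x < 0 behaves like 0, x >= 1 behaves like 1
    y = min(max(y, 0.0), 1.0)   # same clamping for y
    return 1.2 * (x * x * y / 2 + x * y ** 3 / 3)

def rect_prob(a, b, c, d):
    """P(a < X < b, c < Y < d) from the cdf."""
    return F(b, d) - F(a, d) - F(b, c) + F(a, c)

p = rect_prob(0.2, 0.5, 0.4, 0.6)
print(round(p, 5))          # compare with 0.04344 obtained above
print(F(3.0, 7.0))          # the cdf equals 1 beyond the unit square
```

The four-term formula works because F(b, d) counts the whole quadrant below and to the left of (b, d); subtracting F(a, d) and F(b, c) removes the two unwanted strips, and F(a, c) is added back because it was removed twice.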
Example 22
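The joint density of X and Y is
$$f(x, y) = \begin{cases} \frac{6}{5}\left(x + y^2\right) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find the conditional prob density $g(x \mid y)$
(b) Find $g\left(x \mid \frac{1}{2}\right)$
(c) Find the mean of the conditional density of X given that $Y = \frac{1}{2}$
Solution
$g(x \mid y) = \frac{f(x, y)}{h(y)}$ where h(y) is the marginal density of Y.
Thus
$h(y) = \int_{x=0}^{1} f(x, y)\,dx = \int_{x=0}^{1} \frac{6}{5}\left(x + y^2\right) dx = \frac{6}{5}\left(\frac{1}{2} + y^2\right)$, $0 < y < 1$.
Hence
$g(x \mid y) = \frac{\frac{6}{5}\left(x + y^2\right)}{\frac{6}{5}\left(\frac{1}{2} + y^2\right)} = \frac{x + y^2}{\frac{1}{2} + y^2}$, $0 < x < 1$ (and 0 elsewhere)
$\therefore g\left(x \mid \frac{1}{2}\right) = \frac{x + \frac{1}{4}}{\frac{1}{2} + \frac{1}{4}} = \frac{4}{3}\left(x + \frac{1}{4}\right)$, $0 < x < 1$
Hence
$E\left(X \mid Y = \frac{1}{2}\right) = \int_0^1 x\, g\left(x \mid \frac{1}{2}\right) dx = \int_0^1 \frac{4}{3}\, x\left(x + \frac{1}{4}\right) dx$
$= \frac{4}{3}\left[\frac{x^3}{3} + \frac{x^2}{8}\right]_0^1 = \frac{4}{3}\left(\frac{1}{3} + \frac{1}{8}\right) = \frac{11}{18}$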
Example 23
(X,Y) has a joint density which is uniform on the rhombus with vertices (1,0), (0,1), (−1,0) and (0,−1), that is, the region |x| + |y| < 1. Find
(a) Marginal density of X
(b) Marginal density of Y
(c) The conditional density of Y given X = 1/2
Solution
(X,Y) has uniform density on the rhombus means
$$f(x,y) = \frac{1}{\text{Area of the rhombus}} = \frac{1}{2}\ \text{over the rhombus}$$
and 0 elsewhere.
(a) Marginal Density of X
Case (i) 0 < x < 1
$$g(x) = \int_{y=x-1}^{1-x}\frac{1}{2}\,dy = 1-x$$
Case (ii) −1 < x < 0
$$g(x) = \int_{y=-1-x}^{1+x}\frac{1}{2}\,dy = 1+x$$
Thus
$$g(x) = \begin{cases}1+x & -1<x<0\\ 1-x & 0<x<1\\ 0 & \text{elsewhere}\end{cases}$$
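(b) By symmetry the marginal density of Y is
$$h(y) = \begin{cases}1+y & -1<y<0\\ 1-y & 0<y<1\\ 0 & \text{elsewhere}\end{cases}$$
(c) For x = 1/2, y ranges from −1/2 to 1/2.
Thus the conditional density of Y for X = 1/2 is
$$h\left(y\,\Big|\,\tfrac{1}{2}\right) = \frac{f\left(\tfrac{1}{2},y\right)}{g\left(\tfrac{1}{2}\right)} = \frac{1/2}{1/2} = \begin{cases}1 & -\tfrac{1}{2}<y<\tfrac{1}{2}\\ 0 & \text{elsewhere}\end{cases}$$
Similarly, for x = 1/3, Y ranges from −2/3 to 2/3, so
$$h\left(y\,\Big|\,\tfrac{1}{3}\right) = \frac{1/2}{2/3} = \begin{cases}\tfrac{3}{4} & -\tfrac{2}{3}<y<\tfrac{2}{3}\\ 0 & \text{elsewhere}\end{cases}$$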
PROPERTIES OF EXPECTATIONS
Let X be a r.v. and let a, b be constants.
Then
(a) E(aX + b) = a E(X) + b
(b) Var(aX + b) = a² Var(X)
If $X_1, X_2, \ldots, X_n$ are any n rvs,
$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$
But if $X_1, \ldots, X_n$ are n indep rvs then
$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)$
In particular if X, Y are independent
Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)
Please note: whether we add X and Y or subtract Y from X, we must always add their variances.
If X, Y are two rvs, we define their covariance
$\mathrm{COV}(X,Y) = E[(X-\mu_1)(Y-\mu_2)]$
where $\mu_1 = E(X)$, $\mu_2 = E(Y)$.
Th. If X, Y are indep, E(XY) = E(X)E(Y) and COV(X, Y) = 0.
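These properties can be verified exactly on a small case. The sketch below uses two independent fair dice as an illustrative assumption (not an example from the text) and enumerates all 36 outcomes with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P = Fraction(1, 36)  # probability of each (x, y) pair for two independent fair dice


def expect(g):
    """E[g(X, Y)] by enumerating all 36 equally likely outcomes."""
    return sum(P * g(x, y) for x, y in product(FACES, FACES))


def variance(g):
    """Var[g(X, Y)] = E[(g - E g)^2]."""
    mean = expect(g)
    return expect(lambda x, y: (g(x, y) - mean) ** 2)


var_x = variance(lambda x, y: x)
var_sum = variance(lambda x, y: x + y)
var_diff = variance(lambda x, y: x - y)
var_affine = variance(lambda x, y: 3 * x + 2)
covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

print(var_x, var_sum, var_diff, var_affine, covariance)
```

The sum and the difference have the same variance, and the constant b = 2 in 3X + 2 has no effect on the variance.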
Sample Mean
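Let $X_1, X_2, \ldots, X_n$ be n indep rvs each having the same mean µ and same variance σ².
We define
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
$\bar{X}$ is called the mean of the rvs $X_1, \ldots, X_n$. Please note that $\bar{X}$ is also a rv.
Theorem
1. $E(\bar{X}) = \mu$
2. $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}$
Proof
(1) $E(\bar{X}) = \dfrac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] = \dfrac{1}{n}\,(\underbrace{\mu + \mu + \cdots + \mu}_{n\ \text{times}}) = \mu$
(2) $\mathrm{Var}(\bar{X}) = \dfrac{1}{n^{2}}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right]$ (as the variables are independent)
$= \dfrac{1}{n^{2}}\,(\underbrace{\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}}_{n\ \text{times}}) = \dfrac{n\sigma^{2}}{n^{2}} = \dfrac{\sigma^{2}}{n}$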
Sample variance
Let $X_1, \ldots, X_n$ be n indep rvs each having the same mean µ and same variance σ². Let
$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$ be their sample mean. We define the sample variance as
$$S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$
Note S² is also a r.v. and
$$E\left(S^{2}\right) = \sigma^{2}$$
Proof. Read it on page 179.
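The three results on the sample mean and sample variance can be checked exactly on a small case. The sketch below assumes, purely for illustration, that each observation is a fair die roll (µ = 7/2, σ² = 35/12) and enumerates all samples of size n = 3.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
N = 3  # sample size, chosen small so every sample can be enumerated


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def sample_variance(sample):
    """S^2 with the divisor n - 1."""
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


samples = list(product(FACES, repeat=N))
weight = Fraction(1, len(samples))  # all 6**N samples are equally likely

mu = Fraction(7, 2)        # mean of one fair die roll
sigma2 = Fraction(35, 12)  # variance of one fair die roll

mean_of_xbar = sum(weight * sample_mean(s) for s in samples)
var_of_xbar = sum(weight * (sample_mean(s) - mu) ** 2 for s in samples)
mean_of_s2 = sum(weight * sample_variance(s) for s in samples)

print(mean_of_xbar, var_of_xbar, mean_of_s2)
```

Dividing by n − 1 rather than n is what makes the average of S² come out exactly equal to σ².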
Simulation
To simulate the values taken by a continuous r.v. X, we use the following theorem.
Theorem
Let X be a continuous r.v. with density f(x) and cumulative distribution function F(x). Let
U = F(X). Then U is a r.v. having uniform distribution on (0,1).
In other words, U is a random number. Thus to simulate a value taken by X, we take a
random number U from Table 7 (you must put a decimal point before the number) and
solve for X the equation
F(X) = U
Example 24
Let X have uniform density on (α, β). Simulate the values of X using the 3-digit random
numbers
937, 133, 753, 503, ...
Solution
Since X has uniform density on (α, β), its density is
$$f(x) = \begin{cases}\dfrac{1}{\beta-\alpha} & \alpha<x<\beta\\[4pt] 0 & \text{elsewhere}\end{cases}$$
Thus the cumulative distribution function is
$$F(x) = \begin{cases}0 & x\le\alpha\\[2pt] \dfrac{x-\alpha}{\beta-\alpha} & \alpha<x\le\beta\\[4pt] 1 & x>\beta\end{cases}$$
F(X) = U means $\dfrac{X-\alpha}{\beta-\alpha} = U$
∴ X = α + (β − α)U
Hence if U = .937, X = α + (β − α)(.937)
if U = .133, X = α + (β − α)(.133)
etc.
Let X have exponential density (with parameter β)
$$f(x) = \begin{cases}\dfrac{1}{\beta}\,e^{-x/\beta} & x>0\\[4pt] 0 & \text{elsewhere}\end{cases}$$
Hence the cumulative distribution function is
$$F(x) = \begin{cases}0 & x\le 0\\ 1-e^{-x/\beta} & x>0\end{cases}$$
Thus solving F(X) = U, i.e. $1-e^{-X/\beta} = U$, for X, we get
$$X = \beta\ln\frac{1}{1-U}$$
Since U being a random number implies that 1 − U is also a random number, we can as well use the
formula
$$X = \beta\ln\frac{1}{U} = -\beta\ln U.$$
Example 25
X has exponential density with parameter 2. Simulate a few values of X.
Solution
The defining equation for X is
X = −2 ln U
Taking 3-digit random numbers from Table 7, page 595, row 21, col. 3, we get the random
numbers: 913, 516, 692, 007 etc.
The corresponding X values are:
−2 ln(.913), −2 ln(.516), −2 ln(.692), ...
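The inverse-transform rule X = −β ln U is easy to evaluate by machine. The sketch below applies it to the 3-digit random numbers quoted in Example 25; the function names are illustrative, and the table entries are typed in by hand rather than read from Table 7.

```python
import math


def to_unit_interval(entry, digits=3):
    """Put a decimal point before a table entry: 913 -> 0.913."""
    return entry / 10 ** digits


def simulate_exponential(beta, u):
    """Solve F(X) = U for the exponential density with parameter beta: X = -beta ln U."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return -beta * math.log(u)


for entry in (913, 516, 692, 7):  # 7 stands for the table entry 007
    u = to_unit_interval(entry)
    print(u, round(simulate_exponential(2.0, u), 3))
```

Small values of U (such as .007) give large simulated values of X, as expected from the long right tail of the exponential density.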
Example 26
The density of a rv X is given by
$$f(x) = \begin{cases}|x| & -1<x<1\\ 0 & \text{elsewhere}\end{cases}$$
Simulate a few values of X.
Solution
First let us find the cumulative distribution function F(x).
Case (i) x ≤ −1. In this case F(x) = 0.
Case (ii) −1 < x ≤ 0.
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-1}^{x}|t|\,dt = \int_{-1}^{x}(-t)\,dt = \frac{1-x^{2}}{2}$$
Case (iii) 0 < x ≤ 1. In this case
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{-1}0\,dt + \int_{-1}^{0}(-t)\,dt + \int_{0}^{x}t\,dt = 0+\frac{1}{2}+\frac{x^{2}}{2} = \frac{1+x^{2}}{2}$$
Case (iv) x > 1. In this case F(x) = 1.
Thus
$$F(x) = \begin{cases}0 & x\le -1\\[2pt] \dfrac{1-x^{2}}{2} & -1<x\le 0\\[6pt] \dfrac{1+x^{2}}{2} & 0<x\le 1\\[4pt] 1 & x>1\end{cases}$$
To simulate a value for X, we have to solve the equation F(X) = U for X.
Case (i) 0 ≤ U < 1/2
In this case we use the equation
$$F(X) = \frac{1-X^{2}}{2} = U \quad (\text{why?})$$
$$\therefore\ X = -\sqrt{1-2U} \quad (\text{why?})$$
Case (ii) 1/2 ≤ U < 1
In this case we solve for X the equation
$$F(X) = \frac{1+X^{2}}{2} = U$$
$$\therefore\ X = +\sqrt{2U-1}$$
Thus the defining conditions are:
If 0 ≤ U < 1/2, $X = -\sqrt{1-2U}$
and
if 1/2 ≤ U < 1, $X = +\sqrt{2U-1}$
Let us consider the 3-digit random numbers on page 594, Row 17, Col. 5
726, 282, 272, 022, ...
U = .726 ≥ 1/2. Thus $X = +\sqrt{2\times .726-1} = 0.672$
U = .281 < 1/2. Thus $X = -\sqrt{1-2\times .281} = -0.662$
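The two defining conditions for the density f(x) = |x| on (−1, 1) can be coded directly. In this sketch the names are hypothetical; `cdf` is included only so that the round trip F(X) = U can be checked.

```python
import math


def cdf(x):
    """F(x) for the density f(x) = |x| on (-1, 1)."""
    if x <= -1.0:
        return 0.0
    if x <= 0.0:
        return (1.0 - x * x) / 2.0
    if x <= 1.0:
        return (1.0 + x * x) / 2.0
    return 1.0


def simulate_abs_density(u):
    """Solve F(X) = U: the negative root for U < 1/2, the positive root otherwise."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    if u < 0.5:
        return -math.sqrt(1.0 - 2.0 * u)
    return math.sqrt(2.0 * u - 1.0)


for u in (0.726, 0.281):
    x = simulate_abs_density(u)
    print(u, round(x, 3), round(cdf(x), 3))
```

The sign choice in each branch follows from the range of x on which that piece of F applies: U < 1/2 corresponds to −1 < x ≤ 0.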
Note : Most of the computers have built in programs which generate random deviates
from important distributions. Especially, we can invoke the random deviates from a
standard normal distribution. You may also want to study how to simulate values from a
standard normal distribution by Box-Muller-Marsaglia method given on page 190 of the
text book.
Example 27
Suppose the no of hours it takes a person to learn how to operate a certain machine is a
random variable having normal distribution with µ = 5.8 and σ = 1.2. Suppose it takes
two person to operate the machine. Simulate the time it takes four pairs of persons to
learn how to operate the machine. That is, for each pair, calculate the maximum of the
two learning times.
Solution
We use the Box-Muller-Marsaglia method to generate pairs of values $z_1, z_2$ taken by a
standard normal distribution. Then we use the formula
$x_1 = \mu + \sigma z_1$
$x_2 = \mu + \sigma z_2$
to simulate the time taken by a pair of persons
(where µ = 5.8, σ = 1.2).
We start with the random numbers from Table 7,
Page 593, Row 19, Column 4
729, 016, 672, 823, 375, 556, 424, 854
Note
$z_1 = \sqrt{-2\ln(u_2)}\,\cos(2\pi u_1)$
$z_2 = \sqrt{-2\ln(u_2)}\,\sin(2\pi u_1)$
The angles are expressed in radians.
U1    U2    Z1      Z2      X1     X2
.729  .016  −0.378  −2.851  5.346  2.379
etc.
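The Box-Muller formulas quoted in the Note can be evaluated as follows. This is a sketch with illustrative function names; it uses the first pair of random numbers (.729, .016) and the values µ = 5.8, σ = 1.2 from Example 27.

```python
import math


def box_muller(u1, u2):
    """Return (z1, z2) with z1 = sqrt(-2 ln u2) cos(2 pi u1), z2 = sqrt(-2 ln u2) sin(2 pi u1)."""
    radius = math.sqrt(-2.0 * math.log(u2))
    angle = 2.0 * math.pi * u1  # radians
    return radius * math.cos(angle), radius * math.sin(angle)


def learning_time_for_pair(u1, u2, mu=5.8, sigma=1.2):
    """Simulated time for a pair: the maximum of the two individual learning times."""
    z1, z2 = box_muller(u1, u2)
    x1, x2 = mu + sigma * z1, mu + sigma * z2
    return max(x1, x2)


z1, z2 = box_muller(0.729, 0.016)
print(round(z1, 3), round(z2, 3))
print(round(learning_time_for_pair(0.729, 0.016), 3))
```

Each pair (u1, u2) yields two independent standard normal values, so four pairs of random numbers are enough for the four pairs of persons.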
Review Exercises
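5.108. If the probability density of a r.v. X is given by
$$f(x) = \begin{cases}k\left(1-x^{2}\right) & 0<x<1\\ 0 & \text{elsewhere}\end{cases}$$
Find the value of k and the probabilities
(a) P(0.1 < X < 0.2)
(b) P(X > 0.5)
Solution
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \ \text{gives}\ \int_{0}^{1} k\left(1-x^{2}\right)dx = 1$$
$$\text{or}\quad k\left(1-\frac{1}{3}\right) = 1 \qquad \therefore\ k = \frac{3}{2}$$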
The cumulative distribution function F(x) of X is:
Case (i) x ≤ 0. ∴ F(x) = 0
Case (ii) 0 < x ≤ 1.
$$F(x) = \int_{0}^{x} k\left(1-t^{2}\right)dt = \frac{3}{2}\left(x-\frac{x^{3}}{3}\right).$$
Case (iii) x > 1. F(x) = 1
$$\therefore\ P(0.1<X<0.2) = F(0.2)-F(0.1) = \frac{3}{2}\left(0.2-\frac{(0.2)^{3}}{3}\right)-\frac{3}{2}\left(0.1-\frac{(0.1)^{3}}{3}\right) = 0.1465$$
$$P(X>0.5) = 1-P(X\le 0.5) = 1-F(0.5) = 1-\frac{3}{2}\left(0.5-\frac{(0.5)^{3}}{3}\right) = 0.3125$$
5.113: The burning time X of an experimental rocket is a r.v. having the normal
distribution with µ = 4.76 sec and σ = 0.04 sec. What is the prob that this kind of rocket
will burn
(a) < 4.66 sec
(b) > 4.80 sec
(c) anywhere from 4.70 to 4.82 sec?
Solution
(a) $P(X<4.66) = P\left(\dfrac{X-\mu}{\sigma} < \dfrac{4.66-4.76}{0.04}\right) = P(Z<-2.5) = 1-P(Z<2.5) = 1-F(2.5) = 1-0.9938 = 0.0062$
(b) $P(X>4.80) = P\left(\dfrac{X-\mu}{\sigma} > \dfrac{4.80-4.76}{0.04}\right) = P(Z>1) = 1-F(1) = 1-0.8413 = 0.1587$
(c) $P(4.70<X<4.82) = P\left(\dfrac{4.70-4.76}{0.04} < \dfrac{X-\mu}{\sigma} < \dfrac{4.82-4.76}{0.04}\right) = P(-1.5<Z<1.5) = 2F(1.5)-1 = 2\times 0.9332-1 = 0.8664$
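The three normal probabilities can be recomputed without tables. The sketch below builds the standard normal CDF from `math.erf` (a standard-library function); the helper names are illustrative. Note that the standardised value in part (a) is (4.66 − 4.76)/0.04 = −2.5.

```python
import math


def std_normal_cdf(z):
    """F(z) = P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normal with mean mu and standard deviation sigma."""
    return std_normal_cdf((x - mu) / sigma)


MU, SIGMA = 4.76, 0.04

p_less = normal_cdf(4.66, MU, SIGMA)                                    # part (a)
p_more = 1.0 - normal_cdf(4.80, MU, SIGMA)                              # part (b)
p_between = normal_cdf(4.82, MU, SIGMA) - normal_cdf(4.70, MU, SIGMA)   # part (c)

print(round(p_less, 4), round(p_more, 4), round(p_between, 4))
```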
5.11 The time (in milliseconds) between the emission of beta particles
is a r.v. X having the exponential density
$$f(x) = \begin{cases}0.25\,e^{-0.25x} & x>0\\ 0 & \text{elsewhere}\end{cases}$$
Find the probability that
(a) The time to observe a particle is more than 200 microseconds (= 200 × 10⁻³ milliseconds)
(b) The time to observe a particle is < 10 microseconds
Solution
(a) $P(>200\ \text{microsec}) = P\left(X>200\times 10^{-3}\ \text{millisec}\right) = \displaystyle\int_{200\times 10^{-3}}^{\infty} 0.25\,e^{-0.25x}\,dx = \left[-e^{-0.25x}\right]_{200\times 10^{-3}}^{\infty} = e^{-50\times 10^{-3}}$
(b) $P(X<10\ \text{microseconds}) = P\left(X<10\times 10^{-3}\right) = \displaystyle\int_{0}^{10\times 10^{-3}} 0.25\,e^{-0.25x}\,dx = \left[-e^{-0.25x}\right]_{0}^{10\times 10^{-3}} = 1-e^{-2.5\times 10^{-3}}$
5.120: If n salespeople are employed in a door-to-door selling campaign, the gross sales
volume in thousands of dollars may be regarded as a r.v. having the Gamma distribution
with α = 100√n and β = 1/2. If the sales costs are $5,000 per salesperson, how many
salespersons should be employed to maximize the profit?
Solution
For a Gamma distribution µ = αβ = 50√n. Thus (in thousands of dollars) the "average"
profit when n persons are employed is
T = 50√n − 5n (5 × 1000 dollars is the cost per person)
This is a maximum (using calculus) when n = 25: setting dT/dn = 25/√n − 5 = 0 gives √n = 5.
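The calculus answer can be cross-checked by a direct search over whole numbers of salespeople. This sketch assumes the expected profit T(n) = 50√n − 5n (in thousands of dollars), that is, gamma parameters α = 100√n and β = 1/2; the search range 1 to 100 is an arbitrary choice.

```python
import math


def expected_profit(n):
    """Average profit in thousands of dollars: mean sales 50 sqrt(n) minus cost 5 n."""
    return 50.0 * math.sqrt(n) - 5.0 * n


# Search over integer staffing levels, since n must be a whole number.
best_n = max(range(1, 101), key=expected_profit)
print(best_n, expected_profit(best_n))
```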
5.122: Let the times to breakdown for the processors of a parallel processing machine
have joint density
$$f(x,y) = \begin{cases}0.04\,e^{-0.2x-0.2y} & x>0,\ y>0\\ 0 & \text{elsewhere}\end{cases}$$
where X is the time for the first processor and Y is the time for the 2nd processor. Find
(a) The marginal distributions and their means
(b) The expected value of the sum of X and Y.
(c) Verify that the mean of a sum is the sum of the means.
Solution
(a) Marginal density of X
$$g(x) = \int_{y=-\infty}^{\infty} f(x,y)\,dy = \int_{y=0}^{\infty} 0.04\,e^{-0.2x-0.2y}\,dy = 0.2\,e^{-0.2x}\int_{y=0}^{\infty} 0.2\,e^{-0.2y}\,dy = 0.2\,e^{-0.2x},\quad x>0$$
(and = 0 if x ≤ 0)
By symmetry, the marginal distribution of Y is
$$h(y) = \begin{cases}0.2\,e^{-0.2y} & y>0\\ 0 & \text{elsewhere}\end{cases}$$
Since X (and Y) have exponential distributions (with parameter 1/0.2 = 5), E(X) = E(Y) = 5.
Also, since f(x,y) = g(x) h(y), X and Y are independent.
(b), (c)
$$E(X+Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x+y)\,f(x,y)\,dy\,dx = \int_{x=0}^{\infty}\int_{y=0}^{\infty}(x+y)(0.04)\,e^{-0.2x-0.2y}\,dy\,dx$$
$$= \int_{x=0}^{\infty}\int_{y=0}^{\infty} x\,(0.04)\,e^{-0.2x-0.2y}\,dy\,dx + \int_{x=0}^{\infty}\int_{y=0}^{\infty} y\,(0.04)\,e^{-0.2x-0.2y}\,dy\,dx$$
= 5 + 5 = 10 (verify!)
= E(X) + E(Y)
5.123: Two random variables are independent and each has a binomial distribution with
success prob 0.7 and 2 trials.
(a) Find the joint prob distribution.
(b) Find the prob that the 2nd variable is greater than the first.
Solution
Let X, Y be independent and have Binomial distribution with parameters n = 2 and
p = 0.7. Thus
$$P(X=k) = \binom{2}{k}(0.7)^{k}(0.3)^{2-k},\quad k=0,1,2$$
$$P(Y=r) = \binom{2}{r}(0.7)^{r}(0.3)^{2-r},\quad r=0,1,2$$
(a) ∴ P(X = k, Y = r) = P(X = k) P(Y = r), as X, Y are independent,
$$= \binom{2}{k}\binom{2}{r}(0.7)^{k+r}(0.3)^{4-(k+r)},\quad 0\le k,r\le 2$$
(b) P(Y > X) = P(Y = 2, X = 0 or 1) + P(Y = 1, X = 0)
$$= \binom{2}{2}(0.7)^{2}(0.3)^{0}\left[\binom{2}{0}(0.7)^{0}(0.3)^{2}+\binom{2}{1}(0.7)^{1}(0.3)^{1}\right] + \binom{2}{1}(0.7)^{1}(0.3)^{1}\binom{2}{0}(0.7)^{0}(0.3)^{2}$$
= 0.49 × 0.51 + 0.42 × 0.09 = 0.2877
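The joint distribution is small enough to tabulate in full. The sketch below builds the 3 × 3 joint table from the product of the two binomial pmfs (n = 2, p = 0.7, independence assumed as in the problem) and sums the cells with Y > X; the names are illustrative.

```python
from math import comb


def binomial_pmf(k, n=2, p=0.7):
    """P(X = k) for a binomial rv with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Joint pmf P(X = k, Y = r) = P(X = k) P(Y = r), by independence.
joint = {(k, r): binomial_pmf(k) * binomial_pmf(r)
         for k in range(3) for r in range(3)}

p_y_greater = sum(prob for (k, r), prob in joint.items() if r > k)

for k in range(3):
    print([round(joint[(k, r)], 4) for r in range(3)])
print(round(p_y_greater, 4))
```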
5.124 If $X_1$ has mean −5 and variance 3 while $X_2$ has mean 1 and variance 4, and the two are
independent, find
(a) $E(3X_1 + 5X_2 + 2)$
(b) $\mathrm{Var}(3X_1 + 5X_2 + 2)$
Ans:
(a) 3(−5) + 5(1) + 2 = −8
(b) 9 × 3 + 25 × 4 = 127
Sampling Distribution
Statistical Inference
Suppose we want to know the average height of an Indian or the average life length of a
bulb manufactured by a company, etc. Obviously we cannot burn out every bulb and find
the mean life length. One chooses at random, say, n bulbs, finds their lifelengths
$X_1, X_2, \ldots, X_n$ and takes the mean life length $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$ as an 'approximation'
to the actual (unknown) mean life length. Thus we make a statement about the
"population" (of all life lengths) by looking at a sample of it. This is the basis behind
statistical inference. The whole theory of statistical inference tells us how close we are to
the true (unknown) characteristic of the population.
Random Sample of size n
In the above example, let X be the lifelength of a bulb manufactured by the company.
Thus X is a rv which can assume values > 0. It will have a certain distribution and a
certain mean µ etc. When we make n independent observations, we get n values
$x_1, x_2, \ldots, x_n$. Clearly if we again take n observations, we would get different values $y_1, y_2, \ldots, y_n$. Thus we
may say
Definition
Let X be a random variable. A random sample of size n from X is a finite ordered
sequence $\{X_1, X_2, \ldots, X_n\}$ of n independent rvs such that each $X_i$ has the same
distribution as X.
Sampling from a finite population
Suppose there is a universe having a finite number of elements only (like the number of
Indians, the number of females in USA who are blondes etc.). A sample of size n from
the above is a subset of n elements such that each subset of n elements has the same prob
of being selected.
Statistics
Whenever we sample, we use a characteristic of the sample to make a statement about the
population. For example suppose the true mean height of an Indian is µ (cms). To make a
statement about µ, we randomly select n Indians, find their heights $\{X_1, X_2, \ldots, X_n\}$ and
then their mean, namely
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
We then use $\bar{X}$ as an estimate of the unknown parameter µ. Remember µ is a
parameter, a constant that is unchanged. But the sample mean $\bar{X}$ is a r.v. It may assume
different values depending on the sample of n Indians chosen.
Definition: Let X be a r.v. Let $\{X_1, X_2, \ldots, X_n\}$ be a sample of size n from X. A statistic
is a function of the sample $\{X_1, X_2, \ldots, X_n\}$.
Some Important Statistics
1. The sample mean $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
2. The sample variance $S^{2} = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{2}$
3. The minimum of the sample $K = \min\{X_1, X_2, \ldots, X_n\}$
4. The maximum of the sample $M = \max\{X_1, X_2, \ldots, X_n\}$
5. The range of the sample R = M − K
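The five statistics listed above can be computed with the Python standard library. This is a minimal sketch; the function name is hypothetical and the six observations are used only as sample data. `statistics.variance` uses the divisor n − 1, matching the definition of S².

```python
import statistics


def sample_statistics(sample):
    """Return the sample mean, sample variance (divisor n - 1), minimum, maximum and range."""
    k, m = min(sample), max(sample)
    return {
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),
        "min": k,
        "max": m,
        "range": m - k,
    }


observations = [27, 15, 20, 32, 18, 26]  # illustrative sample values
summary = sample_statistics(observations)
print(summary)
```

Each entry is a function of the sample alone, so a different sample would in general give a different value: this is what makes a statistic a random variable.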
Definition
If $X_1, \ldots, X_n$ is a random sample of size n and if $\hat{X}$ is a statistic, then we remember $\hat{X}$ is
also a r.v. Its distribution is referred to as the sampling distribution of $\hat{X}$.
The Sampling Distribution of the Sample Mean X .
Suppose X is a r.v. with mean µ and variance σ². Let $X_1, X_2, \ldots, X_n$ be a random sample
of size n from X. Let $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$ be the sample mean. Then
(a) $E(\bar{X}) = \mu$.
(b) $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}$.
(c) If $X_1, \ldots, X_n$ is a random sample from a finite population with N elements, then
$\mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}\cdot\dfrac{N-n}{N-1}$.
(d) If X is normal, $\bar{X}$ is also normal.
(e) Whatever be the distribution of X, if n is "large", $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has approximately the
standard normal distribution. (This result is known as the central limit theorem.)
Explanation
(a) tells us that we can "expect" the sample mean $\bar{X}$ to be an approximation to
the population mean µ.
(b) tells us that the spread of $\bar{X}$ about µ is small when the sample size n is
large.
(d) says that if X has a normal distribution, $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has a standard normal
distribution.
(e) says that whatever be the distribution of X, discrete or continuous, $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$
has approximately the standard normal distribution if n is large.
Example 1 (See exercise 6.14, page 207)
The mean of a random sample of size n = 25 is used to estimate the mean of an infinite
population with standard deviation σ = 2.4. What can we assert about the prob that the
error will be less than 1.2 if we use
(a) Chebyshev's theorem
(b) The central limit theorem?
Solution
(a) We know the sample mean $\bar{X}$ is a rv with $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}$.
Chebyshev's theorem tells us that for any r.v. T,
$$P\left(|T-E(T)| < k\sqrt{\mathrm{Var}(T)}\right) \ge 1-\frac{1}{k^{2}}$$
Taking $T = \bar{X}$, and noting $E(T) = E(\bar{X}) = \mu$, $\mathrm{Var}(T) = \mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n} = \dfrac{(2.4)^{2}}{25}$, we find
$$P\left(|\bar{X}-\mu| < k\cdot\frac{2.4}{5}\right) \ge 1-\frac{1}{k^{2}}.$$
Desired: $P\left(|\bar{X}-\mu| < 1.2\right)$.
$k\cdot\dfrac{2.4}{5} = 1.2$ gives $k = \dfrac{5}{2}$.
Thus we can assert using Chebyshev's theorem that
$$P\left(|\bar{X}-\mu| < 1.2\right) \ge 1-\frac{1}{25/4} = \frac{21}{25} = 0.84$$
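(b) The central limit theorem says $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \dfrac{\bar{X}-\mu}{2.4/5}$ is approximately standard normal.
Thus
$$P\left(|\bar{X}-\mu| < 1.2\right) = P\left(\left|\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right| < \frac{1.2}{2.4/5}\right) \approx P\left(|Z| < \frac{5}{2}\right) = 2F\left(\frac{5}{2}\right)-1$$
= 2 × F(2.5) − 1 = 2 × 0.9938 − 1 = 0.9876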
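The two assertions of Example 1 can be compared side by side. The sketch below computes the Chebyshev lower bound and the central-limit approximation for the same error, σ and n; the function names are illustrative, and the normal CDF is built from `math.erf`.

```python
import math


def std_normal_cdf(z):
    """F(z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chebyshev_lower_bound(error, sigma, n):
    """Lower bound 1 - 1/k^2 on P(|Xbar - mu| < error), with k = error / (sigma / sqrt(n))."""
    k = error / (sigma / math.sqrt(n))
    return 1.0 - 1.0 / k ** 2


def clt_probability(error, sigma, n):
    """Normal approximation 2 F(k) - 1 to P(|Xbar - mu| < error)."""
    k = error / (sigma / math.sqrt(n))
    return 2.0 * std_normal_cdf(k) - 1.0


print(round(chebyshev_lower_bound(1.2, 2.4, 25), 4))
print(round(clt_probability(1.2, 2.4, 25), 4))
```

Chebyshev's theorem holds for any distribution and therefore gives the weaker statement; the central limit theorem gives a sharper figure at the price of being an approximation.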
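Example 2
A random sample of size 100 is taken from an infinite population having mean µ = 76
and variance σ² = 256. What is the prob that $\bar{X}$ will be between 75 and 78?
Solution
We use the central limit theorem, namely $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately standard normal.
Required
$$P\left(75 < \bar{X} < 78\right) = P\left(\frac{75-76}{16/10} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{78-76}{16/10}\right)$$
$$\approx P\left(-\frac{10}{16} < Z < \frac{20}{16}\right) = P\left(-\frac{5}{8} < Z < \frac{5}{4}\right)$$
$$= F\left(\frac{5}{4}\right)-F\left(-\frac{5}{8}\right) = F\left(\frac{5}{4}\right)+F\left(\frac{5}{8}\right)-1$$
= 0.8944 + 0.7340 − 1 = 0.6284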
Example 3
If the distribution of weights of all men travelling by air between Dallas and El Paso has
a mean of 163 pounds and a s.d. of 18 pounds, what is the prob. that the combined gross
weight of 36 men travelling on a plane between these two cities is more than 6000
pounds?
Solution
Let X be the weight of a man travelling by air between D and E. It is given that X is a rv
with mean E(X) = µ = 163 lbs and sd σ = 18 lbs.
Let $X_1, X_2, \ldots, X_{36}$ be the weights of 36 men travelling on a plane between these two cities.
Thus we can regard $\{X_1, X_2, \ldots, X_{36}\}$ as a random sample of size 36 from X.
Required
$$P(X_1 + X_2 + \cdots + X_{36} > 6000) = P\left(\bar{X} > \frac{6000}{36}\right) = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} > \frac{\frac{1000}{6}-163}{18/6}\right)$$
$$\approx P\left(Z > \frac{22}{18}\right) \quad \text{(by the central limit theorem)}$$
$$= 1-P\left(Z \le \frac{22}{18}\right) = 1-F(1.22) = 1-0.8888 = 0.1112$$
The sampling distribution of the sample mean $\bar{X}$ (when σ is unknown).
Theorem
Let X be a rv having normal distribution with mean E(X) = µ. Let $\bar{X}$ be the sample
mean and S² the sample variance of a random sample of size n from X.
Then the rv $t = \dfrac{\bar{X}-\mu}{S/\sqrt{n}}$ has (Student's) t-distribution with n − 1 degrees of freedom.
Remark
(1) The shape of the density curve of the t-distribution (with parameter ν, Greek nu)
is like that of the standard normal distribution and is symmetrical about the y-axis.
$t_{\nu,\alpha}$ is that unique number such that
$P(t > t_{\nu,\alpha}) = \alpha$
(ν → the parameter)
By symmetry $t_{\nu,1-\alpha} = -t_{\nu,\alpha}$
The values of $t_{\nu,\alpha}$ for various ν and α are tabulated in Table 4.
For ν large, $t_{\nu,\alpha} \approx Z_{\alpha}$.
Example 4
A random sample of size 25 from a normal population has the mean $\bar{x}$ = 47.5 and the s.d.
s = 8.4. Does this information tend to support or refute the claim that the mean of the
population is µ = 42.1?
Solution:
$t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}}$ has a t-distribution with parameter ν = n − 1.
Here µ = 42.1, s = 8.4, n = 25.
For α = 0.005, $t_{n-1,\alpha} = t_{24,0.005} = 2.797$
Thus P(t > 2.797) = 0.005
or $P\left(\dfrac{\bar{X}-\mu}{S/\sqrt{n}} > 2.797\right) = 0.005$
or $P\left(\bar{X} > 42.1 + 2.797\times\dfrac{8.4}{5}\right) = 0.005$
or $P\left(\bar{X} > 46.80\right) = 0.005$
This means that when µ = 42.1, only in about 0.5 percent of the cases would we get an
$\bar{X}$ > 46.80. Since the observed mean $\bar{x}$ = 47.5 exceeds this value, we will have to refute the claim µ = 42.1 (in favour of µ > 42.1).
Example 5
The following are the times between six calls for an ambulance (in a certain city) and the
patient's arrival at the hospital: 27, 15, 20, 32, 18 and 26 minutes. Use these figures to
judge the reasonableness of the ambulance service's claim that it takes on the average 20
minutes between the call for an ambulance and the patient's arrival at the hospital.
Solution
Let X = time (in minutes) between the call for an ambulance and the patient's arrival at
the hospital. We assume X has a normal distribution. (When nothing is given, we assume
normality.) We want to judge the reasonableness of the claim that E(X) = µ = 20 minutes.
For this we recorded the times for 6 calls. So we have a random sample of size 6 from X
with
$X_1 = 27, X_2 = 15, X_3 = 20, X_4 = 32, X_5 = 18, X_6 = 26$. Thus
$$\bar{X} = \frac{27+15+20+32+18+26}{6} = \frac{138}{6} = 23.$$
$$S^{2} = \frac{1}{6-1}\left[(27-23)^{2}+(15-23)^{2}+(20-23)^{2}+(32-23)^{2}+(18-23)^{2}+(26-23)^{2}\right]$$
$$= \frac{1}{5}\left[16+64+9+81+25+9\right] = \frac{204}{5}$$
Hence $S = \sqrt{\dfrac{204}{5}}$
We calculate
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}} = \frac{23-20}{\sqrt{204/5}\,/\sqrt{6}} = 1.150$$
Now $t_{n-1,\alpha} = t_{5,\alpha}$ = 2.015 for α = 0.05
= 1.476 for α = 0.10
Since our observed t = 1.150 < $t_{5,0.10}$,
we can say that it is reasonable to assume that the average time is µ = 20 minutes.
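The t statistic for the ambulance data can be recomputed from the raw observations. In this sketch the function name is illustrative, and the critical value 1.476 is copied from the t table quoted in the solution rather than computed.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s the sample s.d. (divisor n - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu0) / (s / math.sqrt(n))


times = [27, 15, 20, 32, 18, 26]
t_observed = t_statistic(times, 20)
T_5_010 = 1.476  # tabulated value of t with 5 degrees of freedom, alpha = 0.10

print(round(t_observed, 3), t_observed < T_5_010)
```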
Example 6
A process for making certain bearings is under control if the diameters of the bearings
have a mean of 0.5000 cm. What can we say about this process if a sample of 10 of these
bearings has a mean diameter of 0.5060 cm and sd 0.0040 cm?
Hint. Since $t_{9,0.005} = 3.250$,
$$P\left(-3.25 < \frac{\bar{X}-0.5}{0.004/\sqrt{10}} < 3.25\right) = 0.99$$
or $P\left(0.496 < \bar{X} < 0.504\right) = 0.99$
Since $\bar{x}$ = 0.506 > 0.504,
the process is not under control.
Sampling Distribution of S² (The sample variance)
Theorem
If S² is the sample variance of a random sample of size n taken from the normal
population with (population) variance σ², then
$$\chi^{2} = \frac{(n-1)S^{2}}{\sigma^{2}} = \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{2}$$
is a random variable having chi-square distribution with parameter ν = n − 1.
Remark
Since S² > 0, the rv has +ve density only to the right of the origin. $\chi^{2}_{\nu,\alpha}$ is that unique
number such that $P\left(\chi^{2} > \chi^{2}_{\nu,\alpha}\right) = \alpha$ and is tabulated for some αs and νs in table 5.
Example 7
A random sample of 10 observations is taken from a normal population having the
variance σ² = 42.5. Find approximately the prob of obtaining a sample standard
deviation S between 3.14 and 8.94.
Solution
Required
$$P(3.14 < S < 8.94) = P\left((3.14)^{2} < S^{2} < (8.94)^{2}\right)$$
$$= P\left(\frac{9}{42.5}\times(3.14)^{2} < \frac{(n-1)S^{2}}{\sigma^{2}} < \frac{9}{42.5}\times(8.94)^{2}\right)$$
$$= P\left(2.088 < \chi^{2} < 16.925\right)$$
(From Table 5, $\chi^{2}_{9,0.05} = 16.919$, $\chi^{2}_{9,0.99} = 2.088$)
$$= P\left(\chi^{2} > 2.088\right)-P\left(\chi^{2} > 16.919\right)\ \text{(approx)}$$
= 0.99 − 0.05 = 0.94 (approx)
Example 8
The claim that the variance of a normal population is σ² = 21.3 is rejected if the
variance of a random sample of size 15 exceeds 39.74. What is the prob that the claim
will be rejected even though σ² = 21.3?
Solution
The prob that the claim is rejected
$$= P\left(S^{2} > 39.74\right) = P\left(\frac{(n-1)S^{2}}{\sigma^{2}} > \frac{14}{21.3}\times 39.74\right) = P\left(\chi^{2} > 26.12\right)$$
= 0.025 (as from table 5, $\chi^{2}_{14,0.025} = 26.12$)
Theorem
If $S_1^{2}, S_2^{2}$ are the variances of two independent random samples of sizes $n_1, n_2$
respectively taken from two normal populations having the same variance, then
$$F = \frac{S_1^{2}}{S_2^{2}}$$
is a rv having the (Snedecor's) F distribution with parameters $\nu_1 = n_1-1$ and $\nu_2 = n_2-1$.
Remark
1. $n_1-1$ is called the numerator degrees of freedom and $n_2-1$ is called the
denominator degrees of freedom.
2. If F is a rv having $(\nu_1,\nu_2)$ degrees of freedom, then $F_{\nu_1,\nu_2,\alpha}$ is that unique number
such that
$P\left(F > F_{\nu_1,\nu_2,\alpha}\right) = \alpha$, and it is tabulated for α = 0.05 in table 6(a) and for α = 0.01 in table
6(b).
We also note the fact: $F_{\nu_1,\nu_2,1-\alpha} = \dfrac{1}{F_{\nu_2,\nu_1,\alpha}}$
Thus $F_{10,20,0.95} = \dfrac{1}{F_{20,10,0.05}} = \dfrac{1}{2.77} = 0.36$
Example 9
(a) $F_{12,15,0.95} = \dfrac{1}{F_{15,12,0.05}} = \dfrac{1}{2.62} = 0.38$
(b) $F_{6,20,0.99} = \dfrac{1}{F_{20,6,0.01}} = \dfrac{1}{7.40} = 0.135$
Example 10 (See Exercise on page 213)
If independent random samples of size $n_1 = n_2 = 8$ come from two normal populations
having the same variance, what is the prob that either sample variance will be at least
seven times as large as the other?
Solution
Let $S_1^{2}, S_2^{2}$ be the sample variances of the two samples.
Reqd
$$P\left(S_1^{2} > 7S_2^{2}\ \text{or}\ S_2^{2} > 7S_1^{2}\right) = P\left(\frac{S_1^{2}}{S_2^{2}} > 7\ \text{or}\ \frac{S_2^{2}}{S_1^{2}} > 7\right) = 2P(F > 7)$$
where F is a rv having F distribution with (7,7) degrees of freedom
= 2 × 0.01 = 0.02 (from table 6(b)).
Example 11 (see exercise 6.38 on page 215)
If two independent random samples of size $n_1 = 9$ and $n_2 = 16$ are taken from a normal
population, what is the prob that the variance of the first sample will be at least four times
as large as the variance of the second sample?
Hint: Reqd prob
$$= P\left(S_1^{2} > 4S_2^{2}\right) = P\left(\frac{S_1^{2}}{S_2^{2}} > 4\right) = P(F > 4)$$
= 0.01 (as $F_{8,15,0.01} = 4$)
Example 12
The F distribution with (4,4) degrees of freedom is given by
$$f(F) = \begin{cases}6F(1+F)^{-4} & F>0\\ 0 & F\le 0\end{cases}$$
If random samples of size 5 are taken from two normal populations having the same
variance, find the prob that the ratio of the larger to the smaller sample variance will
exceed 3.
Solution
Let $S_1^{2}, S_2^{2}$ be the sample variances of the two random samples.
Reqd
$$P\left(S_1^{2} > 3S_2^{2}\ \text{or}\ S_2^{2} > 3S_1^{2}\right) = 2P\left(\frac{S_1^{2}}{S_2^{2}} > 3\right) = 2P(F > 3)$$
where F is a rv having (4,4) degrees of freedom
$$= 2\int_{3}^{\infty}\frac{6F}{(1+F)^{4}}\,dF = 12\int_{3}^{\infty}\left[\frac{1}{(1+F)^{3}}-\frac{1}{(1+F)^{4}}\right]dF$$
$$= 12\left[-\frac{1}{2(1+F)^{2}}+\frac{1}{3(1+F)^{3}}\right]_{3}^{\infty} = 12\left[\frac{1}{32}-\frac{1}{192}\right] = \frac{5\times 12}{192} = \frac{5}{16}$$
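The integral can be verified with exact fractions. The sketch below assumes the F(4,4) density 6F(1 + F)⁻⁴ given in the example and uses its antiderivative −1/(2(1+F)²) + 1/(3(1+F)³); the function name is illustrative.

```python
from fractions import Fraction


def tail_probability(c):
    """P(F > c) for the density 6 F (1 + F)^-4, F > 0.

    The antiderivative of F (1 + F)^-4 is -1/(2 (1+F)^2) + 1/(3 (1+F)^3),
    which vanishes as F tends to infinity.
    """
    c = Fraction(c)
    return 6 * (Fraction(1) / (2 * (1 + c) ** 2) - Fraction(1) / (3 * (1 + c) ** 3))


print(tail_probability(0))      # total probability, should be 1
print(2 * tail_probability(3))  # P(larger variance / smaller variance > 3)
```

The factor 2 appears because either sample may supply the larger variance, and the two events are mutually exclusive with equal probability.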
Inferences Concerning Means
We shall discuss how we can make statement about the mean of a population from the
knowledge about the mean of a random sample. That is we ‘estimate’ the mean of a
population based on a random sample.
Point Estimation
Here we use a statistic to estimate the parameter of a distribution representing a

population. For example if we can assume that the lifelength of a transistor is a r.v.
having exponential distribution with (unknown) parameter β , β can be estimated by
some statistic, say X the mean of a random sample. Or we may say the sample mean is
an estimate of the parameter β .
Definition
Let θ be a parameter associated with the distribution of a r.v. A statistic $\hat{\theta}$ (based on a
random sample of size n) is said to be an unbiased estimate (≡ estimator) of θ if
$E(\hat{\theta}) = \theta$. That is, $\hat{\theta}$ will be on the average close to θ.
Example
Let X be a rv and µ the mean of X. If $\bar{X}$ is the sample mean then we know $E(\bar{X}) = \mu$. Thus
we may say the sample mean $\bar{X}$ is an unbiased estimate of µ. (Note $\bar{X}$ is a rv, a
statistic: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$ is a function of the random sample
$(X_1, X_2, \ldots, X_n)$.) If $\omega_1, \omega_2, \ldots, \omega_n$ are any n non-negative numbers ≤ 1 such that
$\omega_1 + \omega_2 + \cdots + \omega_n = 1$, then we can easily see that $\omega_1 X_1 + \omega_2 X_2 + \cdots + \omega_n X_n$ is also an
unbiased estimate of µ. (Prove this.) $\bar{X}$ is got as a special case by taking
$\omega_1 = \omega_2 = \cdots = \omega_n = \dfrac{1}{n}$. Thus we have a large number of unbiased estimates for µ.
Hence the question arises: if $\hat{\theta}_1, \hat{\theta}_2$ are both unbiased estimates of θ, which one do we
prefer? The answer is given by the following definition.
Definition
Let $\hat{\theta}_1, \hat{\theta}_2$ be both unbiased estimates of the parameter θ. We say $\hat{\theta}_1$ is more efficient than
$\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) \le \mathrm{Var}(\hat{\theta}_2)$.
Remark
That is, the above definition says: prefer that unbiased estimate which is "closer" to
θ. Remember the variance is a measure of the "closeness" of $\hat{\theta}$ to θ.
Maximum Error in estimating µ by X
Let $\bar{X}$ be the sample mean of a random sample of size n from a population with
(unknown) mean µ. Suppose we use $\bar{X}$ to estimate µ. $\bar{X}-\mu$ is called the error in
estimating µ by $\bar{X}$. Can we find an upper bound on this error? We know if X is normal
(or if n is large) then by the Central Limit Theorem
$\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is a r.v. having (approximately) the standard normal distribution. And we can say
$$P\left(-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1-\alpha$$
Thus we can say with prob (1 − α) that the max absolute error $|\bar{X}-\mu|$ in estimating µ by
$\bar{X}$ is at most $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. (Here obviously we assume σ, the population s.d., is known. And
$Z_{\alpha/2}$ is that unique no. such that $P\left(Z > Z_{\alpha/2}\right) = \dfrac{\alpha}{2}$.)
We also say that we can assert with 100(1 − α) percent confidence that the max. abs. error is
at most $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. The book denotes this by E.
Estimation of n
Thus to find the size n of the sample so that we may say with 100(1 − α) percent
confidence that the max. abs. error is a given quantity E, we solve for n the equation
$$Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = E \qquad\text{or}\qquad n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^{2}$$
Example 1
What is the maximum error one can expect to make with prob 0.90 when using the mean
of a random sample of size n = 64 to estimate the mean of a population with σ² = 2.56?
Solution
Substituting n = 64, σ = 1.6 and $Z_{\alpha/2} = Z_{0.05} = 1.645$ (note 1 − α = 0.90 implies α/2 = 0.05)
in the formula for the maximum error $E = Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, we get
$$E = 1.645\times\frac{1.6}{\sqrt{64}} = 1.645\times\frac{1.6}{8} = 1.645\times 0.2 = 0.3290$$
Thus the maximum error one can expect to make with prob 0.90 is 0.3290.
Example 2
If we want to determine the average mechanical aptitude of a large group of workers,
how large a random sample will we need to be able to assert with prob 0.95 that the
sample mean will not differ from the population mean by more than 3.0 points? Assume
that it is known from past experience that σ = 20.
Solution
Here 1 − α = 0.95 so that α/2 = 0.025, hence $Z_{\alpha/2} = Z_{0.025} = 1.96$
Thus we want n so that we can assert with prob 0.95 that the max error E = 3.0
$$\therefore\ n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^{2} = \left(\frac{1.96\times 20}{3}\right)^{2} = 170.74$$
Since n must be an integer, we take it as 171.
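Both formulas, the maximum error E for a given n and the sample size n for a given E, are one-liners. The sketch below uses illustrative function names and takes the Z value from the normal table as an input; the figures are those of Examples 1 and 2, with σ = 20 as used in the computation of Example 2.

```python
import math


def max_error(z, sigma, n):
    """E = z * sigma / sqrt(n), with z = Z_{alpha/2} read from the normal table."""
    return z * sigma / math.sqrt(n)


def required_sample_size(z, sigma, error):
    """Smallest integer n with z * sigma / sqrt(n) <= error."""
    return math.ceil((z * sigma / error) ** 2)


print(round(max_error(1.645, 1.6, 64), 4))
print(required_sample_size(1.96, 20, 3.0))
```

Rounding up rather than to the nearest integer guarantees that the stated error bound is actually met.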
Small Samples
If the population is normal and we take a random sample of size n (n small) from it, we
note
$$t = \frac{\bar{X}-\mu}{S/\sqrt{n}} \qquad (\bar{X} = \text{sample mean},\ S = \text{sample s.d.})$$
is a rv having t-distribution with (n − 1) degrees of freedom.
Thus we can assert with prob 1 − α that $|t| \le t_{n-1,\alpha/2}$, where $t_{n-1,\alpha/2}$ is that unique no. such that
$P\left(t > t_{n-1,\alpha/2}\right) = \dfrac{\alpha}{2}$. Thus if we use $\bar{X}$ to estimate µ, we can assert with prob (1 − α) that
the max error will be
$$E = t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}}$$
(Note: If n is large, then t is approx standard normal. Thus for n large, the above
formula will become $E = Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$.)
Example 3
20 fuses were subjected to a 20% overload, and the times it took them to blow had a
mean $\bar{x}$ = 10.63 minutes and a s.d. S = 2.48 minutes. If we use $\bar{x}$ = 10.63 minutes as a
point estimate of the true average time it takes for such fuses to blow with a 20% overload,
what can we assert with 95% confidence about the maximum error?
Solution
Here n = 20 (fuses), $\bar{x}$ = 10.63, S = 2.48
$1-\alpha = \dfrac{95}{100} = 0.95$ so that $\dfrac{\alpha}{2} = 0.025$
Hence $t_{n-1,\alpha/2} = t_{19,0.025} = 2.093$
Hence we can assert with 95% confidence (i.e. with prob 0.95) that the max error will be
$$E = t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}} = 2.093\times\frac{2.48}{\sqrt{20}} = 1.16$$
Interval Estimation
If $\bar{X}$ is the mean of a random sample of size n from a population with known sd σ, then
we know by the central limit theorem,
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$
is (approximately) standard normal. So we can say with prob (1 − α) that
$$-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2},$$
which can be rewritten as
$$\bar{X}-\frac{\sigma}{\sqrt{n}}Z_{\alpha/2} < \mu < \bar{X}+\frac{\sigma}{\sqrt{n}}Z_{\alpha/2}$$
Thus we can assert with prob (1 − α) (i.e. with (1 − α) × 100% confidence) that µ lies in
the interval $\left(\bar{X}-\dfrac{\sigma}{\sqrt{n}}Z_{\alpha/2},\ \bar{X}+\dfrac{\sigma}{\sqrt{n}}Z_{\alpha/2}\right)$.
We refer to the above interval as a (1 − α)100% confidence interval for µ. The end
points $\bar{X}\pm\dfrac{\sigma}{\sqrt{n}}Z_{\alpha/2}$ are known as (1 − α)100% confidence limits for µ.
Example 4
Suppose the mean of a random sample of size 25 from a normal population (with σ = 2)
is $\bar{x}$ = 78.3. Obtain a 99% confidence interval for µ, the population mean.
Solution
Here n = 25, σ = 2, $(1-\alpha) = \dfrac{99}{100} = 0.99$
$\therefore\ \dfrac{\alpha}{2} = 0.005 \quad\therefore\ Z_{\alpha/2} = Z_{0.005} = 2.575$
$\bar{x}$ = 78.3
Hence a 99% confidence interval for µ is
$$\left(\bar{x}-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x}+Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(78.3-2.575\times\frac{2}{\sqrt{25}},\ 78.3+2.575\times\frac{2}{\sqrt{25}}\right)$$
= (78.3 − 1.0300, 78.3 + 1.0300)
= (77.27, 79.33)
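The interval of Example 4 follows directly from the confidence-limit formula. In this sketch the function name is illustrative and the value 2.575 for $Z_{0.005}$ is taken from the normal table, as in the solution.

```python
import math


def z_confidence_interval(xbar, sigma, n, z):
    """Confidence interval xbar -/+ z * sigma / sqrt(n) for known sigma."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_confidence_interval(78.3, 2.0, 25, 2.575)
print(round(low, 2), round(high, 2))
```

The half-width is the same quantity E that appeared earlier as the maximum error, so a larger n or a lower confidence level gives a narrower interval.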
σ unknown
Suppose $\bar{X}$ is the sample mean and S is the sample sd of a random sample of size n taken
from a normal population with (unknown) mean µ. Then we know the r.v.
$$t = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$
has a t-distribution with (n − 1) degrees of freedom. Thus we can say with prob 1 − α that
$$-t_{n-1,\alpha/2} < t < t_{n-1,\alpha/2}$$
$$\text{or}\quad -t_{n-1,\alpha/2} < \frac{\bar{X}-\mu}{S/\sqrt{n}} < t_{n-1,\alpha/2}$$
$$\text{or}\quad \bar{X}-t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X}+t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$
Thus a (1 − α)100% confidence interval for µ is
$$\left(\bar{X}-t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X}+t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right)$$
Note:
(1) If n is large, t has approx the standard normal distribution. In which case the
(1 − α)100% confidence interval for µ will be
$$\left(\bar{x}-Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{x}+Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
(2) If nothing is mentioned, we assume that the sample is taken from a normal
population so that the above is valid.
Example 5
Material manufactured continuously before being cut and wound into large rolls must be
monitored for thickness (caliper). A sample of ten measurements on paper, in mm,
yielded
32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7
Obtain a 95% confidence interval for the mean thickness.
Solution
Here n = 10
$\bar{x}$ = 30.91, S = 0.7880
1 − α = 0.95 or α/2 = 0.025
$\therefore\ t_{n-1,\alpha/2} = t_{9,0.025} = 2.262$
Hence a 95% confidence interval for µ is
$$\left(30.91-2.262\times\frac{0.7880}{\sqrt{10}},\ 30.91+2.262\times\frac{0.7880}{\sqrt{10}}\right)$$
= (30.35, 31.47)
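When the raw data are available, the sample mean, the sample s.d. and the interval can all be computed in one step. This sketch uses an illustrative function name and takes $t_{9,0.025}$ = 2.262 from the t table; for the ten thickness readings it gives a sample mean of 30.91 and a sample s.d. of about 0.788.

```python
import math
import statistics


def t_confidence_interval(sample, t_critical):
    """Confidence interval xbar -/+ t * s / sqrt(n), with s the sample s.d."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    half_width = t_critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


thickness = [32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7]
low, high = t_confidence_interval(thickness, 2.262)

print(round(statistics.mean(thickness), 2), round(statistics.stdev(thickness), 4))
print(round(low, 2), round(high, 2))
```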
Example 6:
Ten bearings made by a certain process have a mean diameter of 0.5060 cm with a sd of
0.0040 cm. Assuming that the data may be looked upon as a random sample from a
normal population, construct a 99% confidence interval for the actual average diameter of
bearings made by this process.
Solution
Here n = 10, $\bar{x}$ = 0.5060, S = 0.0040
$(1-\alpha) = \dfrac{99}{100} = 0.99$. Hence $\dfrac{\alpha}{2} = 0.005$
$\therefore\ t_{n-1,\alpha/2} = t_{9,0.005} = 3.250$
Thus a 99% confidence interval for the mean
$$= \left(\bar{x}-t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{x}+t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = \left(0.5060-3.250\times\frac{0.0040}{\sqrt{10}},\ 0.5060+3.250\times\frac{0.0040}{\sqrt{10}}\right)$$
= (0.5019, 0.5101)
Example 7
In a random sample of 100 batteries the lifetimes have a mean of 148.2 hours with a s.d.
of 24.9 hours. Construct a 76.60% confidence interval for the mean life of the batteries.
Solution
Here n = 100, $\bar{x}$ = 148.2, S = 24.9
$1-\alpha = \dfrac{76.60}{100} = 0.7660$ so that $\dfrac{\alpha}{2} = 0.1170$
Thus $t_{n-1,\alpha/2} = t_{99,0.1170} \approx Z_{0.1170} = 1.19$
Hence a 76.60% confidence interval is
$$\left(148.2-1.19\times\frac{24.9}{\sqrt{100}},\ 148.2+1.19\times\frac{24.9}{\sqrt{100}}\right) = (145.2,\ 151.2).$$
Example 8
A random sample of 100 teachers in a large metropolitan area revealed a mean weekly
salary of $487 with a sd of $48. With what degree of confidence can we assert that the
average weekly salary of all teachers in the metropolitan area is between $472 and $502?
Solution
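Suppose the degree of confidence is \( (1 - \alpha) \times 100\% \).
Thus \( \bar{x} + t_{n-1,\,\alpha/2}\,\frac{S}{\sqrt{n}} = 502 \).
Here \( \bar{x} = 487 \), S = 48, n = 100, and since n is large, \( t_{99,\,\alpha/2} \approx Z_{\alpha/2} \).
Thus we get \( 487 + Z_{\alpha/2} \times \frac{48}{10} = 502 \),
or \( Z_{\alpha/2} = \frac{15}{4.8} = 3.125 \).
\( \therefore\ \alpha/2 = 0.0009 \), or \( 1 - \alpha = 0.9982 \).
∴ We can assert with 99.82% confidence that the true mean salary is between
$472 and $502.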
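The reverse calculation of Example 8 (from a stated interval back to a degree of confidence) can be sketched as follows. This is an illustration under the same large-sample assumption as the text, so the normal distribution replaces the t-distribution; the function name `confidence_for_half_width` is hypothetical, while `statistics.NormalDist` is part of the Python standard library.

```python
import math
from statistics import NormalDist


def confidence_for_half_width(half_width, s, n):
    """Return (z, confidence) for the interval mean +/- half_width (large n)."""
    z = half_width / (s / math.sqrt(n))       # the Z_{alpha/2} that gives this half width
    confidence = 2 * NormalDist().cdf(z) - 1  # 1 - alpha = P(-z < Z < z)
    return z, confidence


# Example 8: mean 487, s = 48, n = 100, interval 472 to 502 (half width 15).
z, confidence = confidence_for_half_width(half_width=15, s=48, n=100)
print(f"z = {z:.3f}, confidence = {confidence:.4f}")
```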
Maximum Likelihood Estimates (See exercise 7.23, 7.24)
Definition
Let X be a rv. Let \( f(x, \theta) = P(X = x) \) be the point prob function if X is discrete and let
\( f(x, \theta) \) be the pdf of X if X is continuous (here θ is a parameter). Let \( X_1, X_2, \ldots, X_n \) be a
random sample of size n from X. Then the likelihood function based on the random
sample is defined as
\[ L(\theta) = L(x_1, x_2, \ldots, x_n; \theta) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta). \]
Thus the likelihood function is \( L(\theta) = P(X_1 = x_1)\, P(X_2 = x_2) \cdots P(X_n = x_n) \) if X is discrete, and
is the joint pdf of \( X_1, \ldots, X_n \) when X is continuous. The maximum likelihood estimate
(MLE) of θ is that value \( \hat{\theta} \) which maximizes \( L(\theta) \).
Example 8
Let X be a rv having Poisson distribution with parameter λ .
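Thus \( f(x, \lambda) = P(X = x) = e^{-\lambda}\,\frac{\lambda^{x}}{x!};\ x = 0, 1, 2, \ldots \)
Hence the likelihood function is
\[ L(\lambda) = e^{-\lambda}\frac{\lambda^{x_1}}{x_1!}\; e^{-\lambda}\frac{\lambda^{x_2}}{x_2!} \cdots e^{-\lambda}\frac{\lambda^{x_n}}{x_n!} = \frac{e^{-n\lambda}\,\lambda^{x_1 + x_2 + \cdots + x_n}}{x_1!\, x_2! \cdots x_n!};\quad x_i = 0, 1, 2, \ldots \]
To find \( \hat{\lambda} \), the value of λ which maximizes L(λ), we use calculus.
First we take ln (the natural logarithm, i.e. log to base e):
\[ \ln L(\lambda) = -n\lambda + (x_1 + \cdots + x_n) \ln \lambda - \ln (x_1! \cdots x_n!) \]
Differentiating w.r.t. λ (noting that \( x_1, \ldots, x_n \) are not to be varied), we get
\[ \frac{1}{L(\lambda)} \frac{\partial L}{\partial \lambda} = -n + \frac{x_1 + \cdots + x_n}{\lambda} \]
Setting this equal to 0 gives \( \lambda = \frac{x_1 + \cdots + x_n}{n} \).
We can ‘easily’ verify that \( \frac{\partial^2 L}{\partial \lambda^2} < 0 \) for this λ.
Hence the MLE of λ is \( \hat{\lambda} = \frac{x_1 + \cdots + x_n}{n} = \bar{x} \) (the sample mean).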
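As a numerical sanity check on the Poisson result, the log-likelihood can be maximized by brute force and compared with the sample mean. The sketch below is an illustration only: the data in `sample` are hypothetical counts invented for the check, and a grid search stands in for the calculus argument of the text.

```python
import math


def poisson_log_likelihood(lam, xs):
    """ln L(lambda) = -n*lambda + (sum of x_i) * ln(lambda) - sum of ln(x_i!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)


# Hypothetical counts (not from the text), used only to illustrate the check.
sample = [2, 0, 3, 1, 4, 2]

# Search lambda over 0.001, 0.002, ..., 10.000 and keep the best value.
grid = [k / 1000 for k in range(1, 10001)]
best_lambda = max(grid, key=lambda lam: poisson_log_likelihood(lam, sample))

sample_mean = sum(sample) / len(sample)
print(best_lambda, sample_mean)
```

The grid maximizer and the sample mean should agree up to the grid spacing, which is what the derivation predicts.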
Example 9 MLE of Proportion
Suppose p is the proportion of defective bolts produced by a factory. To estimate p, we
proceed as follows. We take n bolts at random and calculate
\[ f_D = \text{sample proportion of defectives} = \frac{\text{No. of defectives found among the } n \text{ chosen ones}}{n}. \]
We show that \( f_D \) is the MLE of p.
We define a rv X as follows:
\[ X = \begin{cases} 0 & \text{if the bolt chosen is not defective} \\ 1 & \text{if the bolt chosen is defective} \end{cases} \]
Thus X has the prob distribution
x       0       1
Prob    1-p     p
It is clear that the point prob function f(x; p) of X is given by
\[ f(x; p) = p^{x} (1 - p)^{1 - x};\quad x = 0, 1 \]
(Note \( f(0; p) = P(X = 0) = 1 - p \) and \( f(1; p) = P(X = 1) = p \).)
Choosing n bolts at random amounts to choosing a random sample \( \{X_1, X_2, \ldots, X_n\} \) from X,
where \( X_i = 0 \) if the ith bolt chosen is not defective and \( X_i = 1 \) if it is defective (i = 1, 2, …, n).
Hence \( X_1 + X_2 + \cdots + X_n \) (can you guess?)
= no. of defective bolts among the n chosen.
The likelihood function of the sample is
\[ L(p) = f(x_1; p)\, f(x_2; p) \cdots f(x_n; p) = p^{x_1 + \cdots + x_n} (1 - p)^{n - (x_1 + x_2 + \cdots + x_n)} = p^{s} (1 - p)^{n - s}, \]
where \( x_i = 0 \) or 1 for all i = 1, …, n and \( s = x_1 + \cdots + x_n \).
Taking ln and differentiating (partially) wrt p, we get
\[ \frac{1}{L} \frac{\partial L}{\partial p} = \frac{s}{p} - \frac{n - s}{1 - p} \]
For a maximum, \( \frac{\partial L}{\partial p} = 0 \), or \( \frac{s}{p} = \frac{n - s}{1 - p} \),
\[ \text{(i.e.)}\quad p = \frac{s}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\text{No. of defectives among the } n \text{ chosen}}{n} \]
= Sample proportion of defectives.
(One can easily see that this p makes \( \frac{\partial^2 L}{\partial p^2} < 0 \), so that L is maximum for this p.)
Example 10
Let X be a rv having exponential distribution with parameter β (unknown). Hence the
density of X is \( f(x; \beta) = \frac{1}{\beta}\, e^{-x/\beta} \) (x > 0).
Let \( \{X_1, X_2, \ldots, X_n\} \) be a random sample of size n. Hence the likelihood function is
\[ L(\beta) = f(x_1; \beta)\, f(x_2; \beta) \cdots f(x_n; \beta) = \frac{1}{\beta^{n}}\, e^{-(x_1 + x_2 + \cdots + x_n)/\beta} \quad (x_i > 0) \]
Taking ln and differentiating (partially) w.r.t. β, we get
\[ \frac{1}{L} \frac{\partial L}{\partial \beta} = -\frac{n}{\beta} + \frac{x_1 + \cdots + x_n}{\beta^{2}} = 0 \quad (\text{for maximum}) \]
which gives \( \beta = \frac{x_1 + x_2 + \cdots + x_n}{n} = \bar{x} \).
Thus the sample mean \( \bar{x} \) is the MLE of β.
Example 11
A r.v. X has density
\[ f(x; \beta) = (\beta + 1)\, x^{\beta};\quad 0 < x < 1 \]
Obtain the ML estimate of β based on a random sample \( \{X_1, X_2, \ldots, X_n\} \) of size n from X.
Solution
The likelihood function is
\[ L(\beta) = (\beta + 1)^{n} (x_1 x_2 \cdots x_n)^{\beta};\quad 0 < x_i < 1 \]
Taking ln and differentiating (partially) wrt β, we get
\[ \frac{1}{L} \frac{\partial L}{\partial \beta} = \frac{n}{\beta + 1} + \ln (x_1 \cdots x_n) = 0 \quad (\text{for } L \text{ to be maximum}) \]
which gives
\[ \beta = -1 - \frac{n}{\ln (x_1 \cdots x_n)} \]
which is the ML estimate for β.
So far we have considered situations where the ML estimate is got by differentiating L
and equating the derivative to zero. The following example is one where
differentiation will not work.
Example 12
A rv X has uniform density over [0, β],
(i.e.) the density of X is \( f(x; \beta) = \frac{1}{\beta};\ 0 \le x \le \beta \) (and 0 elsewhere).
The likelihood function based on a random sample of size n from X is
\[ L(\beta) = f(x_1; \beta)\, f(x_2; \beta) \cdots f(x_n; \beta) = \frac{1}{\beta^{n}};\quad 0 \le x_1 \le \beta,\ 0 \le x_2 \le \beta,\ \ldots,\ 0 \le x_n \le \beta \]
This is a maximum when the denominator is least,
(i.e.) when β is least. But \( \beta \ge x_i \) for all i = 1, 2, …, n.
Hence the least possible β is \( \max \{x_1, \ldots, x_n\} \), which is the MLE of β.
Estimation of Sample proportion
We have just seen above that if p = population proportion (i.e. the proportion of persons,
things etc. having a given characteristic), then the ML estimate of p is the sample proportion. Now
we would like to find a (1 − α)100% confidence interval for p.
(This is treated in chapter 9 of your text book)
Large Samples
Suppose we have a ‘dichotomous’ universe; that is, a population whose members are
either “haves” or “have-nots”; that is, a member either has a property or does not.
For example, we can think of a population of all bulbs produced by a factory. Any bulb is
either a “have” (i.e. it is defective) or a “have-not” (i.e. it is good), and p = proportion of haves
= prob that a randomly chosen member is a “have”.
As another example, we can think of a population of all females in USA. A member is a
“have” (= is a blonde) or is a “have-not” (= is not a blonde). As a last example, consider the
population of all voters in India. A member is a “have” if he follows BJP and is a “have-not” otherwise.
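To estimate p, we choose n members at random and count the number X of “haves”. Thus
X is a rv having binomial distribution with parameters n and p:
\[ P(X = x) = f(x; p) = \binom{n}{x} p^{x} (1 - p)^{n - x};\quad x = 0, 1, 2, \ldots, n \]
and if n is large, we know that “standardized binomial → standard normal”,
(i.e.) for large n, \( \frac{X - np}{\sqrt{np(1 - p)}} \) has approx. standard normal distribution. So we can say with
prob (1 − α) that
\[ -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1 - p)}} < z_{\alpha/2} \]
or
\[ -z_{\alpha/2} < \frac{\frac{x}{n} - p}{\sqrt{\frac{p(1 - p)}{n}}} < z_{\alpha/2} \]
or
\[ \frac{x}{n} - z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} < p < \frac{x}{n} + z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \]
In the end points, we replace ‘p’ by the MLE \( \frac{X}{n} \) (= sample proportion).
Thus we can say with prob (1 − α) that
\[ \frac{x}{n} - z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} \]
Hence a (1 − α)100% confidence interval for p is
\[ \left( \frac{x}{n} - z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}},\ \ \frac{x}{n} + z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} \right) \]
Remark: We can say with prob (1 − α) that the max error \( \left| \frac{X}{n} - p \right| \) in approximating p by \( \frac{X}{n} \) is
\[ E = Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \]
We can replace p by \( \frac{X}{n} \) and say that
\[ \text{Max error} = Z_{\alpha/2} \sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}} \]
Or we note that p(1 − p), for 0 ≤ p ≤ 1, has maximum value \( \frac{1}{4} \) (which is attained when \( p = \frac{1}{2} \)).
Thus we can also say with prob (1 − α) that the max error is at most
\[ Z_{\alpha/2} \sqrt{\frac{1}{4n}} \]
This last equation tells us that to assert with prob (1 − α) that the max error is at most E, n must be
\[ n = \frac{1}{4} \left( \frac{Z_{\alpha/2}}{E} \right)^{2} \]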
Example 13
In a random sample of 400 industrial accidents, it was found that 231 were due at least
partially to unsafe working conditions. Construct a 99% confidence interval for the
corresponding true proportion p.
Solution
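Here n = 400, x = 231, (1 − α) = 0.99,
so that \( \alpha/2 = 0.005 \); hence \( Z_{\alpha/2} = 2.575 \).
Thus a 99% confidence interval for p will be
\[ \left( \frac{x}{n} - Z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}},\ \ \frac{x}{n} + Z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} \right) \]
\[ = \left( \frac{231}{400} - 2.575 \sqrt{\frac{\frac{231}{400}\left(1 - \frac{231}{400}\right)}{400}},\ \ \frac{231}{400} + 2.575 \sqrt{\frac{\frac{231}{400}\left(1 - \frac{231}{400}\right)}{400}} \right) \]
= (0.5139, 0.6411)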
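The large-sample interval for a proportion lends itself to a short computation. The following sketch is illustrative only: `proportion_confidence_interval` is a hypothetical helper name, and the normal quantile is computed by `statistics.NormalDist` rather than read from a table, so the last digit can differ slightly from hand calculations that use 2.575.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = x / n                                        # sample proportion (the MLE of p)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Example 13: 231 of 400 accidents, 99% confidence.
low, high = proportion_confidence_interval(x=231, n=400, confidence=0.99)
print(f"({low:.4f}, {high:.4f})")
```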
Example 14
In a sample survey of the ‘safety explosives’ used in certain mining operations,
explosives containing potassium nitrate were found to be used in 95 out of 250 cases. If
\( \frac{95}{250} = 0.38 \) is used as an estimate of the corresponding true proportion, what can we say
with 95% confidence about the maximum error?
Solution
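Here n = 250, X = 95, 1 − α = 0.95,
so that \( \alpha/2 = 0.025 \); hence \( Z_{\alpha/2} = 1.96 \).
Hence we can say with 95% confidence that the max. error is
\[ E = Z_{\alpha/2} \sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 1.96 \times \sqrt{\frac{0.38 \times 0.62}{250}} = 0.0602 \]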
Example 15:
Among 100 fish caught in a large lake, 18 were inedible due to the pollution of the
environment. If we use \( \frac{18}{100} = 0.18 \) as an estimate of the corresponding true proportion,
with what confidence can we assert that the error of this estimate is at most 0.065?
Solution
Here n = 100, X = 18, max error = E = 0.065.
We note
\[ E = Z_{\alpha/2} \sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}} = Z_{\alpha/2} \sqrt{\frac{0.18 \times 0.82}{100}} = Z_{\alpha/2} \times 0.03842 \]
\[ \therefore\ Z_{\alpha/2} = \frac{0.065}{0.03842} = 1.69 \]
Hence \( \alpha/2 = 1 - 0.9545 = 0.0455 \),
∴ α = 0.0910 or 1 − α = 0.9090.
So we can assert with (1 − α) × 100% = 90.9% confidence that the error is at most 0.065.
Example 16
What is the size of the smallest sample required to estimate an unknown proportion to
within a max. error of 0.06 with at least 95% confidence?
Solution
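Here E = 0.06; 1 − α = 0.95, or \( \alpha/2 = 0.025 \).
\( \therefore\ Z_{\alpha/2} = Z_{0.025} = 1.96 \)
Hence the smallest sample size n is
\[ n = \frac{1}{4} \left( \frac{Z_{\alpha/2}}{E} \right)^{2} = \frac{1}{4} \left( \frac{1.96}{0.06} \right)^{2} = 266.78 \]
Since n must be an integer, we take the size to be 267.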
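The sample-size rule used in Example 16 can be written as a small function. This is a sketch under the worst-case assumption p(1 − p) ≤ 1/4 made in the text; the name `sample_size_for_proportion` is hypothetical, and the quantile comes from `statistics.NormalDist` instead of the table value 1.96.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(max_error, confidence):
    """Smallest n with Z_{alpha/2} * sqrt(1 / (4n)) <= max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    n = 0.25 * (z / max_error) ** 2
    return math.ceil(n)                                  # round up to an integer


# Example 16: max error 0.06 with at least 95% confidence.
n_required = sample_size_for_proportion(max_error=0.06, confidence=0.95)
print(n_required)
```

Rounding up (rather than to the nearest integer) is what guarantees the error bound is met.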
Remark
Read the relevant material in your text on pages 279-281 of finding the confidence
interval for the proportion in case of small samples.
Tests of Statistical Hypothesis
In many problems, instead of estimating the parameter, we must decide whether a
statement concerning a parameter is true or false. For instance, one may like to test the
truth of the statement: The mean life length of a bulb is 500 hours.
In fact we may even have to decide whether the mean life is 500 hours or more (!)
In such situations, we have a statement whose truth or falsity we want to test. We then
say we want to test the null hypothesis H0: the mean life length is 500 hours. (From here
onwards, when we say we want to test a statement, it shall mean we want to test whether
the statement is true.) We then have another (usually called alternative) hypothesis. We make
some ‘experiment’ and on the basis of that we ‘decide’ whether to accept the null
hypothesis or reject it. (When we reject the null hypothesis we automatically accept the
alternative hypothesis.)
Example
Suppose we wish to test the null hypothesis H0: the mean life length of a bulb is 500
hours against the alternative H1: the mean life length is > 500 hours. Suppose we take a
random sample of 50 bulbs and find that the sample mean is 520 hours. Should we
accept H0 or reject H0? We have to note that even though the population mean is 500
hours, the sample mean could be more or less. Similarly, even though the population mean
is > 500 hours, say 550 hours, even then the sample mean could be less than 550 hours.
Thus whatever decision we may make, there is a possibility of making an error: that is,
falsely rejecting H0 (when it should have been accepted) or falsely accepting H0 (when
it should have been rejected). We put this in a tabular form as follows:
                 Accept H0            Reject H0
H0 is true       Correct decision     Type I error
H0 is false      Type II error        Correct decision
Thus the type I error is the error of falsely rejecting H0 and the type II error is the error of
falsely accepting H0. A good decision (≡ test) is one where the prob of making the errors
is small.
Notation
The prob of committing a type I error is denoted by α . It is also referred to as the size of
the test or the level of significance of the test. The prob of committing Type II error is
denoted by β .
Example 1
Suppose we want to test the null hypothesis µ = 80 against the alternative hyp µ = 83 on
the basis of a random sample of size n = 100 (assume that the population s.d. σ = 8.4).
The null hyp. is rejected if the sample mean \( \bar{x} > 82 \); otherwise it is accepted. What is the
prob of type I error; the prob of type II error?
Solution
We know that when µ = 80 (and σ = 8.4) the r.v. \( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \) has a standard normal
distribution. Thus,
P (Type I error)
= P (Rejecting the null hyp when it is true)
\[ = P\left( \bar{X} > 82 \text{ given } \mu = 80 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{82 - 80}{8.4/10} \right) \]
= P(Z > 2.38)
= 1 − P(Z ≤ 2.38) = 1 − 0.9913 = 0.0087
Thus in roughly about 1% of the cases we will be (falsely) rejecting H0. Recall this is also
called the size of the test or level of significance of the test.
P (Type II error) = P (Falsely accepting H0)
= P (Accepting H0 when it is false)
\[ = P\left( \bar{X} \le 82 \text{ given } \mu = 83 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{82 - 83}{8.4/10} \right) \]
= P(Z ≤ −1.19)
= 1 − P(Z ≤ 1.19) = 1 − 0.8830 = 0.1170
Thus roughly in 12% of the cases we will be falsely accepting H0.
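Both error probabilities of Example 1 follow from the sampling distribution of the sample mean, which is normal with s.d. σ/√n. The sketch below is an illustration with a hypothetical function name (`error_probabilities`); it assumes the rule "reject H0 when the sample mean exceeds the cutoff", as in the example.

```python
import math
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, cutoff):
    """Return (alpha, beta) for the rule 'reject H0 if the sample mean > cutoff'."""
    se = sigma / math.sqrt(n)                      # s.d. of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)    # P(mean > cutoff | mu = mu0)
    beta = NormalDist(mu1, se).cdf(cutoff)         # P(mean <= cutoff | mu = mu1)
    return alpha, beta


# Example 1: H0: mu = 80, H1: mu = 83, sigma = 8.4, n = 100, reject if mean > 82.
alpha, beta = error_probabilities(mu0=80, mu1=83, sigma=8.4, n=100, cutoff=82)
print(f"P(Type I) = {alpha:.4f}, P(Type II) = {beta:.4f}")
```

Moving the cutoff to the right lowers the Type I error and raises the Type II error, which is the trade-off the text describes.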
Definition (Critical Region)
In the previous example we rejected the null hypothesis when \( \bar{x} > 82 \), (i.e.) when \( \bar{x} \) lies in
the ‘region’ \( \bar{x} > 82 \) (of the \( \bar{x} \) axis). This portion of the horizontal axis is then called the
critical region and is denoted by C. Thus the critical region for the above situation is
\( C = \{ \bar{x} > 82 \} \), and remember we reject H0 when the (test) statistic \( \bar{X} \) lies in the critical
region (i.e. takes a value > 82). So the size of the critical region (≡ prob that \( \bar{X} \) lies in C
when H0 is true) is the size of the test or level of significance.
(Figure omitted: the shaded portion is the critical region; a second marked portion is the
region of false acceptance of H0.)
Critical regions for Hypothesis Concerning the means
Let X be a rv having a normal distribution with (unknown) mean µ and (known) s.d. σ .
Suppose we wish to test the null hypothesis µ = µ 0 .
The following table gives the critical regions (criteria for rejecting H0) for various
alternative hypotheses.
Null hypothesis: µ = µ0 (Normal population, σ known)
Test statistic: \( Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \)

Alternative hypothesis H1    Reject H0 if                                              Prob of Type I error    Prob of Type II error
µ = µ1 (< µ0)                \( Z < -Z_{\alpha} \)                                     α                       \( 1 - F\left( \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - Z_{\alpha} \right) \)
µ < µ0                       \( Z < -Z_{\alpha} \)                                     α
µ = µ1 (> µ0)                \( Z > Z_{\alpha} \)                                      α                       \( F\left( \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} + Z_{\alpha} \right) \)
µ > µ0                       \( Z > Z_{\alpha} \)                                      α
µ ≠ µ0                       \( Z < -Z_{\alpha/2} \) or \( Z > Z_{\alpha/2} \)         α

F(x) = cdf of the standard normal distribution.
Remark:
The prob of Type II error is left blank when H1 (the alternative hypothesis) is one of the
following three: µ < µ0, µ > µ0, µ ≠ µ0. This is because H1 then does not fix a single value
of µ; the Type II error can happen in various ways (its prob depends on the true µ), and so
we cannot give a single prob for its occurrence.
Example 2:
According to norms established for a mechanical aptitude test, persons who are 18 years
old should average 73.2 with a standard deviation of 8.6. If 45 randomly selected persons
averaged 76.7 test the null hypothesis µ = 73.2 against the alternative µ > 73.2 at the
0.01 level of significance.
Solution
Step I Null hypothesis H0: µ = 73.2
Alternative hypothesis H1: µ > 73.2
(Thus here µ0 = 73.2)
Step II The level of significance = α = 0.01
Step III Reject the null hypothesis if \( Z > Z_{\alpha} = Z_{0.01} = 2.33 \)
Step IV Calculations
\[ Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{76.7 - 73.2}{8.6/\sqrt{45}} = 2.73 \]
Step V Decision: since \( Z = 2.73 > Z_{\alpha} = 2.33 \),
we reject H0 (at 0.01 level of significance),
(i.e.) we would say µ > 73.2 (and the prob of falsely saying this is ≤ 0.01).
Example 3
It is desired to test the null hypothesis µ = 100 against the alternative hypothesis
µ < 100 on the basis of a random sample of size n = 40 from a population with σ = 12.
For what values of x must the null hypothesis be rejected if the prob of Type I error is to
be α = 0.01?
Solution
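\( Z_{\alpha} = Z_{0.01} = 2.33 \). Hence from the table we reject H0 if \( Z < -Z_{\alpha} = -2.33 \), where
\[ Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{x} - 100}{12/\sqrt{40}} < -2.33 \]
gives
\[ \bar{x} < 100 - 2.33 \times \frac{12}{\sqrt{40}} = 95.58 \]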
Example 4
To test a paint manufacturer’s claim that the average drying time of his new “fast-drying”
paint is 20 minutes, a ‘random sample’ of 36 boards is painted with his new paint and his
claim is rejected if the mean drying time x is > 20.50 minutes. Find
(a) The prob of type I error

(b) The prob of type II error when µ = 21 minutes.
(Assume that σ = 2.4 minutes)
Solution
Here null hypothesis H0: µ = 20
Alt hypothesis H1: µ > 20
(a) P (Type I error) = P (Rejecting H0 when it is true)
Now when H0 is true, µ = 20 and hence
\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 20}{2.4/\sqrt{36}} = \frac{6}{2.4} \left( \bar{X} - 20 \right) \]
is standard normal.
Thus P (Type I error)
\[ = P\left( \bar{X} > 20.50 \text{ given that } \mu = 20 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{20.50 - 20}{2.4/\sqrt{36}} \right) \]
= P(Z > 1.25) = 1 − P(Z ≤ 1.25) = 1 − F(1.25)
= 1 − 0.8944 = 0.1056
(b) P (Type II error when µ = 21)
= P (Accepting H0 when µ = 21)
\[ = P\left( \bar{X} \le 20.50 \text{ when } \mu = 21 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{20.50 - 21}{2.4/\sqrt{36}} \right) = P(Z \le -1.25) = P(Z \ge 1.25) \]
= 0.1056
Example 5
It is desired to test the null hypothesis µ = 100 pounds against the alternative hypothesis
µ < 100 pounds on the basis of a random sample of size n = 50 from a population with
σ = 12. For what values of x must the null hypothesis be rejected if the prob of type I
error is to be α = 0.01?
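Solution
We want to test the null hypothesis H0: µ = 100 against the alt hypothesis H1: µ < 100,
given σ = 12, n = 50.
Suppose we reject H0 when \( \bar{x} < C \).
Thus P (Type I error)
= P (Rejecting H0 when it is true)
\[ = P\left( \bar{X} < C \text{ given } \mu = 100 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{C - 100}{12/\sqrt{50}} \right) = P\left( Z < \frac{C - 100}{12/\sqrt{50}} \right) \]
\[ = F\left( \frac{C - 100}{12/\sqrt{50}} \right) = 0.01 \]
implies
\[ \frac{C - 100}{12/\sqrt{50}} = -2.33 \]
or
\[ C = 100 - \frac{12}{\sqrt{50}} \times 2.33 = 96.05 \]
Thus reject H0 if \( \bar{X} < 96.05 \).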
Example 6
Suppose that for a given population with σ = 8.4 in², we want to test the null hypothesis
µ = 80.0 in² against the alternative hypothesis µ < 80.0 in² on the basis of a random
sample of size n = 100.
(a) If the null hypothesis is rejected for \( \bar{x} < 78.0 \) in² and otherwise it is accepted,
what is the probability of type I error?
(b) What is the answer to part (a) if the null hypothesis is µ ≥ 80.0 in² instead of
µ = 80.0 in²?
Solution
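(a) Null hypothesis H0: µ = 80
Alt hypothesis H1: µ < 80
Given σ = 8.4, n = 100
P (Type I error) = P (Rejecting H0 when it is true)
\[ = P\left( \bar{X} < 78.0 \text{ given } \mu = 80 \right) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{78.0 - 80.0}{8.4/\sqrt{100}} \right) = P\left( Z < -\frac{10}{4.2} \right) \]
\[ = 1 - P\left( Z < \frac{10}{4.2} \right) = 1 - F(2.38) \]
= 1 − 0.9913 = 0.0087
(b) In this case we define the prob of type I error as the max prob of rejecting H0 when it is
true = max \( P\left( \bar{x} < 78.0 \text{ given } \mu \text{ is a number} \ge 80.0 \right) \).
Now \( P\left( \bar{x} < 78.0 \text{ when the population mean is } \mu \right) \)
\[ = P\left( \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{78.0 - \mu}{8.4/\sqrt{100}} \right) = P\left( Z < \frac{10}{8.4} (78 - \mu) \right) = F\left( 1.19 (78 - \mu) \right) \]
We note that the cdf of Z, viz F(z), is an increasing function of z. Thus when
µ ≥ 80, F(1.19(78 − µ)) is largest when µ is smallest, i.e. µ = 80. Hence P (Type I
error)
\[ = \max_{\mu \ge 80} F\left( 1.19 (78 - \mu) \right) = F\left( 1.19 \times (78 - 80) \right) = F(-2.38) \]
= 0.0087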
Example 7
If the null hypothesis µ = µ0 is to be tested against the one-sided alternative hypothesis
µ < µ0 (or µ > µ0) and if the prob of Type I error is to be α and the prob of Type II
error is to be β when µ = µ1, it can be shown that this is possible when the required
sample size is
\[ n = \frac{\sigma^{2} \left( Z_{\alpha} + Z_{\beta} \right)^{2}}{(\mu_1 - \mu_0)^{2}} \]
where σ² is the population variance.
(a) It is desired to test the null hypothesis µ = 40 against the alternative hypothesis
µ < 40 on the basis of a large random sample from a population with σ = 4.
If the prob of type I error is to be 0.05 and the prob of Type II error is to be 0.12
for µ = 38, find the required size of the sample.
(b) Suppose we want to test the null hypothesis µ = 64 against the alternative
hypothesis µ < 64 for a population with standard deviation σ = 7.2. How large a
sample must we take if α is to be 0.05 and β is to be 0.01 for µ = 61? Also for
what values of \( \bar{x} \) will the null hypothesis have to be rejected?
Solution
(a) Here α = 0.05, β = 0.12, µ0 = 40, µ1 = 38, σ = 4
\( Z_{\alpha} = Z_{0.05} = 1.645 \), \( Z_{\beta} = Z_{0.12} = 1.175 \)
Thus the required sample size
\[ = \frac{16\, (1.645 + 1.175)^{2}}{(38 - 40)^{2}} = 31.81 \quad \therefore\ n \ge 32. \]
(b) Here α = 0.05, β = 0.01, µ0 = 64, µ1 = 61, σ = 7.2
\[ \therefore\ n \ge \frac{(7.2)^{2}\, (1.645 + 2.33)^{2}}{(61 - 64)^{2}} = 91.01 \quad \therefore\ n \ge 92 \]
We reject H0 if \( Z < -Z_{\alpha} \), i.e. \( \frac{\bar{X} - 64}{7.2/\sqrt{92}} < -1.645 \), or \( \bar{X} < 62.76 \).
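The sample-size formula of Example 7 is easy to evaluate by machine. The sketch below is illustrative: the function name `required_sample_size` is hypothetical, and the critical values are passed in as read from the normal table so that the results match the hand calculation (exact quantiles, e.g. 2.326 instead of 2.33, can change the rounded answer by one).

```python
import math


def required_sample_size(mu0, mu1, sigma, z_alpha, z_beta):
    """n = sigma^2 (Z_alpha + Z_beta)^2 / (mu1 - mu0)^2, rounded up."""
    n = sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu0) ** 2
    return math.ceil(n)


# Part (a): alpha = 0.05, beta = 0.12, table values Z = 1.645 and 1.175.
n_a = required_sample_size(mu0=40, mu1=38, sigma=4, z_alpha=1.645, z_beta=1.175)

# Part (b): alpha = 0.05, beta = 0.01, table values Z = 1.645 and 2.33.
n_b = required_sample_size(mu0=64, mu1=61, sigma=7.2, z_alpha=1.645, z_beta=2.33)

# Rejection cutoff for part (b): reject H0 if the sample mean falls below this.
cutoff_b = 64 - 1.645 * 7.2 / math.sqrt(n_b)
print(n_a, n_b, round(cutoff_b, 2))
```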
Tests concerning mean when the sample is small
If \( \bar{X} \) is the sample mean and S the sample s.d. of a (small) random sample of size n from
a normal population (with mean µ0), we know that the statistic \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \) has a
t-distribution with (n − 1) degrees of freedom. Thus to test the null hypothesis H0: µ = µ0
against the alternative hypothesis H1: µ > µ0, we note that when H0 is true, (i.e.) when
µ = µ0, \( P\left( t > t_{n-1,\,\alpha} \right) = \alpha \).
Thus if we reject the null hypothesis when \( t > t_{n-1,\,\alpha} \), (i.e.) when \( \bar{X} > \mu_0 + t_{n-1,\,\alpha} \frac{S}{\sqrt{n}} \), we
shall be committing a type I error with prob α.
The corresponding tests when the alternative hypothesis is µ < µ0 (or µ ≠ µ0) are
described below.
Note: If n is large, we can approximate \( t_{n-1,\,\alpha} \) by \( Z_{\alpha} \) in these tests.
Critical Regions for Testing H0: µ = µ0 (Normal population, σ unknown)
Alt Hypothesis      Reject Null hypothesis if
µ < µ0              \( t < -t_{n-1,\,\alpha} \)
µ > µ0              \( t > t_{n-1,\,\alpha} \)
µ ≠ µ0              \( t < -t_{n-1,\,\alpha/2} \) or \( t > t_{n-1,\,\alpha/2} \)
where \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \) (n = sample size).
In each case P(Type I error) = α
Example 8
A random sample of six steel beams has a mean compressive strength of 58,392 psi
(pounds per square inch) with a s.d. of 648 psi. Use this information and the level of
significance α = 0.05 to test whether the true average compressive strength of the steel
from which this sample came is 58,000 psi. Assume normality.
Solution
1. Null Hypothesis µ = µ0 = 58,000
Alt hypothesis µ > 58,000 (why!)
2. Level of significance α = 0.05
3. Criterion: Reject the null hypothesis if \( t > t_{n-1,\,\alpha} = t_{5,\,0.05} = 2.015 \)
4. Calculations
\[ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{58{,}392 - 58{,}000}{648/\sqrt{6}} = 1.48 \]
5. Decision
Since \( t_{\text{observed}} = 1.48 \le 2.015 \),
we cannot reject the null hypothesis. That is, the data are consistent with the true average
compressive strength being 58,000 psi.
Example 9
Test runs with six models of an experimental engine showed that they operated for
24,28,21,23,32 and 22 minutes with a gallon of a certain kind of fuel. If the prob of type I
error is to be at most 0.01, is this evidence against a hypothesis that on the average this
kind of engine will operate for at least 29 minutes per gallon with this kind of fuel?
Assume normality.
Solution
1. Null hypothesis H0: µ ≥ µ0 = 29
Alt hypothesis H1: µ < µ0
2. Level of significance ≤ α = 0.01
3. Criterion: Reject the null hypothesis if \( t < -t_{n-1,\,\alpha} = -t_{5,\,0.01} = -3.365 \) (Note n = 6),
where \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \)
4. Calculations
\[ \bar{X} = \frac{24 + 28 + 21 + 23 + 32 + 22}{6} = 25 \]
\[ S^{2} = \frac{1}{6 - 1} \left[ (24 - 25)^{2} + (28 - 25)^{2} + (21 - 25)^{2} + (23 - 25)^{2} + (32 - 25)^{2} + (22 - 25)^{2} \right] = 17.6 \]
\[ \therefore\ t = \frac{25 - 29}{\sqrt{17.6/6}} = -2.34 \]
5. Decision
Since \( t_{\text{obs}} = -2.34 \ge -3.365 \), we cannot reject the null hypothesis. That is, we can
say that this kind of engine will operate for at least 29 minutes per gallon with this
kind of fuel.
Example 10
A random sample from a company’s very extensive files shows that orders for a certain
piece of machinery were filled, respectively, in 10, 12, 19, 14, 15, 18, 11 and 13 days. Use the
level of significance α = 0.01 to test the claim that on the average such orders are filled
in 10.5 days. Choose the alternative hypothesis so that rejection of the null hypothesis
µ = 10.5 indicates that it takes longer than indicated. Assume normality.
Solution
1. Null hypothesis H0: µ = µ0 = 10.5
Alt hypothesis H1: µ > 10.5
2. Level of significance α = 0.01
3. Criterion: Reject the null hypothesis if \( t > t_{n-1,\,\alpha} = t_{8-1,\,0.01} = t_{7,\,0.01} = 2.998 \),
where \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \) (where µ0 = 10.5, n = 8)
4. Calculations
\[ \bar{X} = \frac{10 + 12 + 19 + 14 + 15 + 18 + 11 + 13}{8} = 14 \]
\[ S^{2} = \frac{1}{8 - 1} \left[ (10 - 14)^{2} + (12 - 14)^{2} + (19 - 14)^{2} + (14 - 14)^{2} + (15 - 14)^{2} + (18 - 14)^{2} + (11 - 14)^{2} + (13 - 14)^{2} \right] = 10.29 \]
\[ \therefore\ t = \frac{14 - 10.5}{\sqrt{10.29/8}} = 3.09 \]
5. Decision
Since \( t_{\text{observed}} = 3.09 > 2.998 \), we have to reject the null hypothesis. That is, we can
say on the average, such orders are filled in more than 10.5 days.
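The t statistic in these small-sample examples can be computed directly from the raw data. The sketch below is an illustration with a hypothetical helper name (`t_statistic`); the critical value 2.998 is still taken from the t-table, since the Python standard library does not provide t-distribution quantiles.

```python
import math
import statistics


def t_statistic(data, mu0):
    """One-sample t statistic: (mean - mu0) / (S / sqrt(n))."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)      # sample s.d. (divides by n - 1)
    return (mean - mu0) / (s / math.sqrt(n))


# Example 10: days needed to fill eight orders, claimed mean 10.5 days.
days = [10, 12, 19, 14, 15, 18, 11, 13]
t_obs = t_statistic(days, mu0=10.5)

t_crit = 2.998                      # t(7, 0.01) read from the t-table
print(f"t = {t_obs:.2f}, reject H0: {t_obs > t_crit}")
```

The same function applied to the engine data of Example 9 with µ0 = 29 gives about −2.34.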
Example 11
Tests performed with a random sample of 40 diesel engines produced by a large

manufacturer show that they have a mean thermal efficiency of 31.4% with a sd of 1.6%.
At the 0.01 level of significance, test the null hypothesis µ = 32.3% against the
alternative hypothesis µ ≠ 32.3%
Solution
1. Null hypothesis µ = µ0 = 32.3
Alt hypothesis µ ≠ 32.3
2. Level of significance α = 0.01
3. Criterion: Reject H0 if \( t < -t_{n-1,\,\alpha/2} \) or \( t > t_{n-1,\,\alpha/2} \), (i.e.) if \( t < -t_{39,\,0.005} \) or \( t > t_{39,\,0.005} \).
Now \( t_{39,\,0.005} \approx Z_{0.005} = 2.575 \)
Thus we reject H0 if t < −2.575 or t > 2.575,
where \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \)
4. Calculations
\[ t = \frac{31.4 - 32.3}{1.6/\sqrt{40}} = -3.558 \]
5. Decision
Since \( t_{\text{observed}} = -3.558 < -2.575 \), we
reject H0. That is, we can say the mean thermal efficiency ≠ 32.3.
Example 12
In 64 randomly selected hours of production, the mean and the s.d. of the number of
acceptable pieces produced by an automatic stamping machine are
X = 1,038 and S = 146. At the 0.05 level of significance, does this enable us to reject the
null hypothesis µ = 1000 against the alt hypothesis µ > 1000 ?
Solution
1. The null hypothesis H0: µ = µ0 = 1000
Alt hypothesis H1: µ > 1000
2. Level of significance α = 0.05
3. Criterion: Reject H0 if \( t > t_{n-1,\,\alpha} = t_{64-1,\,0.05} \)
Now \( t_{63,\,0.05} \approx Z_{0.05} = 1.645 \)
Thus we reject H0 if t > 1.645
4. Calculations: \( t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{1{,}038 - 1{,}000}{146/\sqrt{64}} = 2.082 \)
5. Decision: Since \( t_{\text{obs}} = 2.082 > 1.645 \),
we reject H0 at 0.05 level of significance.
REGRESSION AND CORRELATION
Regression
A major objective of many statistical investigations is to establish relationships that make
it possible to predict one or more variables in terms of others. Thus studies
are made to predict the potential sales of a new product in terms of the money spent on
advertising, the patient’s weight in terms of the number of weeks he/she has been on a
diet, the marks obtained by a student in terms of the number of classes he attended, etc.
Although it is desirable to predict the quantity exactly in terms of the others, this is
seldom possible and in most cases, we have to be satisfied with predicting average or
expected values. Thus we would like to predict the average sales in terms of the money
spent on advertising, the average income of a college student in terms of the number of
years he/she has been out of the college.
Thus given two random variables X, Y and given that X takes the value x, the basic
problem of bivariate regression is to determine the conditional expected value E(Y|x) as a
function of x. In most cases, we may find that E(Y|x) is a linear function of x:
E(Y|x) = α + βx, where the constants α, β are called the regression coefficients.
Denoting E(X) = µ1, E(Y) = µ2, Var(X) = σ1², Var(Y) = σ2², cov(X, Y) = σ12 and
\( \rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2} \), we can show:
Theorem: (a) If the regression of Y on X is linear, then
\[ E(Y|x) = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (x - \mu_1) \]
(b) If the regression of X on Y is linear, then
\[ E(X|y) = \mu_1 + \rho\, \frac{\sigma_1}{\sigma_2} (y - \mu_2) \]
Note: ρ is called the correlation coefficient between X and Y.
In actual situations, we have to “estimate” the regression coefficients α, β from a random
sample {(x1, y1), (x2, y2), …, (xn, yn)} of size n from the 2-dimensional random variable
(X, Y). We now “fit” a straight line y = a + bx to the above data by the method of “least
squares”. The method of least squares says: choose the constants a and b for which the
sum of the squares of the “vertical deviations” of the sample points (xi, yi) from the line
y = a + bx is a minimum. I.e. find a, b so that
\[ T = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^{2} \]
is a minimum. Using 2-variable calculus, we should determine a, b so that \( \frac{\partial T}{\partial a} = 0 \) and \( \frac{\partial T}{\partial b} = 0 \). Thus we get
the following two equations:
\[ \sum_{i=1}^{n} (-2) \left[ y_i - (a + b x_i) \right] = 0 \quad \text{and} \quad \sum_{i=1}^{n} (-2 x_i) \left[ y_i - (a + b x_i) \right] = 0. \]
Simplifying, we get the so called “normal equations”:
\[ n a + \left( \sum_{i=1}^{n} x_i \right) b = \sum_{i=1}^{n} y_i \]
\[ \left( \sum_{i=1}^{n} x_i \right) a + \left( \sum_{i=1}^{n} x_i^{2} \right) b = \sum_{i=1}^{n} x_i y_i \]
Solving, we get
\[ b = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^{2} - \left( \sum_{i=1}^{n} x_i \right)^{2}}; \qquad a = \frac{\sum_{i=1}^{n} y_i - \left( \sum_{i=1}^{n} x_i \right) b}{n}. \]
These constants a and b are used to estimate the unknown regression coefficients α, β.
Now if x = x_g, we predict y as y_g = a + b x_g.
Problem 1.
Various doses of a poisonous substance were given to groups of 25 mice and the
following results were observed:
Dose (mg) Number of deaths

x y
4 1
6 3
8 6
10 8
12 14
14 16
16 20
(a) Find the equation of the least squares line fit to these data
(b) Estimate the number of deaths in a group of 25 mice who receive a 7 mg dose of
this poison.
Solution:
(a) n = number of sample pairs (xi, yi) = 7
\( \sum x_i = 70,\ \sum y_i = 68,\ \sum x_i^{2} = 812,\ \sum x_i y_i = 862 \)
Hence b = {7 × 862 − 70 × 68} / {7 × 812 − (70)²} = 1274/784 = 1.625
a = {68 − 70 × 1.625}/7 = −6.536
Thus the least squares line that fits the given data is: y = −6.536 + 1.625 x
(b) If x = 7, y = −6.536 + 1.625 × 7 = 4.839.
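The closed-form solution of the normal equations translates directly into code. The following is a minimal sketch with a hypothetical function name (`least_squares_line`), applied to the dose and death counts of Problem 1.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x from the normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Problem 1: dose (mg) and number of deaths in groups of 25 mice.
dose = [4, 6, 8, 10, 12, 14, 16]
deaths = [1, 3, 6, 8, 14, 16, 20]

a, b = least_squares_line(dose, deaths)
predicted_at_7 = a + b * 7          # estimated deaths for a 7 mg dose
print(f"y = {a:.3f} + {b:.3f} x; prediction at x = 7: {predicted_at_7:.3f}")
```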
Problem 2:
The following are the scores that 12 students obtained in the midterm and final
examinations in a course in Statistics:
Mid Term Examination Final Examination

x y
71 83
49 62
80 76
73 77
93 89
85 74
58 48
82 78
64 76
32 51
87 73
80 89
(a) Fit a straight line to the above data
(b) Hence predict the final exam score of a student who received a score of 84 in the
midterm examination.
Solution:
(a) n = number of sample pairs (xi, yi) = 12
\( \sum x_i = 854,\ \sum y_i = 876,\ \sum x_i^{2} = 64222,\ \sum x_i y_i = 64346 \)
Hence b = {12 × 64346 − 854 × 876} / {12 × 64222 − (854)²} = 24048/41348 = 0.5816
a = {876 − 854 × 0.5816}/12 = 31.609
Thus the least squares line that fits the given data is: y = 31.609 + 0.5816 x
(b) If x = 84, y = 31.609 + 0.5816 × 84 = 80.46
Correlation
If X, Y are two random variables, the correlation coefficient, ρ, between X and Y is
defined as
\[ \rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\, \operatorname{Var}(Y)}}. \]
It can be shown that
(a) −1 ≤ ρ ≤ 1
(b) If Y is a linear function of X, ρ = ±1
(c) If X and Y are independent, then ρ = 0
(d) If X, Y have bivariate normal distribution and if ρ = 0, then X and Y are
independent.
Sample Correlation Coefficient
If {(x1, y1), (x2, y2), …, (xn, yn)} is a random sample of size n from the 2-dimensional
random variable (X, Y), then the sample correlation coefficient, r, is defined by
\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}\ \sum_{i=1}^{n} (y_i - \bar{y})^{2}}}. \]
We shall use r to estimate the (unknown) population correlation coefficient ρ. If (X, Y)
has a bivariate normal distribution, we can show that the random variable
\( Z = \frac{1}{2} \ln \frac{1 + r}{1 - r} \) is approximately normal with mean \( \frac{1}{2} \ln \frac{1 + \rho}{1 - \rho} \) and variance \( \frac{1}{n - 3} \).
Note: A computational formula for r is given by \( r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \),
where
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^{2} = \sum_{i=1}^{n} x_i^{2} - \frac{\left( \sum_{i=1}^{n} x_i \right)^{2}}{n}, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^{2} = \sum_{i=1}^{n} y_i^{2} - \frac{\left( \sum_{i=1}^{n} y_i \right)^{2}}{n}, \]
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n}. \]
Problem 3.
Calculate r for the data { (8, 3), (1, 4), (5, 0), (4, 2), (7, 1) }.
Solution
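\( \bar{x} = 25/5 = 5 \), \( \bar{y} = 10/5 = 2 \).
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 3 \times 1 + (-4) \times 2 + 0 \times (-2) + (-1) \times 0 + 2 \times (-1) = -7 \]
\[ \sum_{i=1}^{n} (x_i - \bar{x})^{2} = 9 + 16 + 0 + 1 + 4 = 30 \]
\[ \sum_{i=1}^{n} (y_i - \bar{y})^{2} = 1 + 4 + 4 + 0 + 1 = 10 \]
Hence \( r = \frac{-7}{\sqrt{(30)(10)}} = -0.404 \).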
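The sample correlation coefficient can be computed from the deviations about the means exactly as in Problem 3. The sketch below is illustrative and uses a hypothetical function name (`sample_correlation`); it follows the definition of r given above rather than any library routine.

```python
import math


def sample_correlation(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Problem 3: the five data pairs (8, 3), (1, 4), (5, 0), (4, 2), (7, 1).
xs = [8, 1, 5, 4, 7]
ys = [3, 4, 0, 2, 1]
r = sample_correlation(xs, ys)
print(f"r = {r:.3f}")
```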
Problem 4.
The following are the measurements of the air velocity and evaporation coefficient of
burning fuel droplets in an impulse engine:
Air velocity Evaporation Coefficient

x y
20 0.18
60 0.37
100 0.35
140 0.78
180 0.56
220 0.75
260 1.18
300 1.30
340 1.17
380 1.65
Find the sample correlation coefficient, r.
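Solution.
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^{2} = \sum_{i=1}^{n} x_i^{2} - \frac{\left( \sum_{i=1}^{n} x_i \right)^{2}}{n} = 532000 - \frac{(2000)^{2}}{10} = 132000 \]
\[ S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^{2} = \sum_{i=1}^{n} y_i^{2} - \frac{\left( \sum_{i=1}^{n} y_i \right)^{2}}{n} = 9.1097 - \frac{(8.35)^{2}}{10} = 2.13745 \]
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n} = 2175.4 - \frac{(2000)(8.35)}{10} = 505.4 \]
Hence \( r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{505.4}{\sqrt{(132000)(2.13745)}} = 0.9515 \).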
**************
A Review of
STATISTICS
and
PROBABILITY
M. GANESH
(Professor, Mathematics Group)
Distance Learning Programmes Division

Birla Institute of Technology & Science
2007
PREFACE
In these notes, you are going to study and learn Statistics and
Probability, presented at the basic level.
Probability and Statistics are like cousin sisters. They go hand in hand
and they enable and synergize each other. We will see more of this as we
proceed into the actual study of these topics.
Probability tells you the ‘chances’ or ‘likelihood’ of the occurrence of
an event. (Note that, once an event has occurred, probability has no role
to play.) So, in a way, probability is concerned with the ‘prediction of the
occurrence of an event’.
Statistics deals with the methods of extracting meaningful
information from raw data. These methods are quite useful in the day to
day dealings of any organization or scientific study.
In these notes, written especially for YOU, you will study and learn
the basics or the fundamentals of Statistics and Probability. These ideas will
be illustrated with appropriately chosen examples.
As in the case of Calculus notes, I will take you through a short and
smooth journey of Statistics and Probability, which will enable you to learn
the basic ideas painlessly and with clarity, so that you can embark on a
happy journey of your course through the prescribed Text Book.
So here are my BEST WISHES for a happy journey into the realm of
Statistics and Probability.
M. Ganesh
Table of Contents
1. Prologue
2. Chapter 1: Basic Statistics
3. Chapter 2: Standard Distributions
4. Chapter 3: Population Vs Sampling
5. Chapter 4: Estimation
6. Chapter 5: Correlation and Regression
7. Epilogue
PROLOGUE
1. Statistics, as was pointed out, deals with raw data; that is,
with numbers. It gives you methods for arranging the raw data into
manageable collections and for extracting the required information
about the data set.
Initially, Statistics was looked upon with a lot of doubt
and suspicion. People even talked about “lies, damned lies and
statistics” (that is, statistics was considered to be the biggest
fraud!); or, “you can prove anything with statistics”, meaning that
its methods are suspect.
There is even a humorous but derisive story about statistics.
This is the story of a statistician who found the average depth of a
pond to be 4 feet, jumped into it and drowned! The moral is
that a statistical average does not reflect the true nature of the whole
data set.
But, slowly, the methods became popular and people realized
the importance of statistical methods in real life applications. Then,
Statistics gained acceptance, respect and credibility. However, one
needs to be cautious when applying these methods; they are like a knife,
which can be used for killing a person as well as for carving a
piece of wood to create beautiful patterns!
2. Probability had its origin in gambling houses and casinos.
So, it was derided and ridiculed in its early developmental phase.
It has been noted that a wealthy French businessman, who was a
frequent participant in and enthusiast of gambling, wanted to know
the chances of his winning! (Most probably, always.) He
approached Pascal, a well-known French mathematician of his
time, regarding this. Pascal is considered to be among the first to
have used mathematical ideas to compute the ‘chances’ of
occurrence of an event (and, as a corollary, the businessman’s
chances of winning). Nowadays, we call it by the name
‘probability’.
You might have heard about ‘time series’, which is used to
predict the future value(s) of a certain entity (like the profit of an
organization, the GNP of a country, the market share of a product, the
financial growth of a company or a country, etc.) based on the
present trend. But time series and probability are NOT the same,
even though both aim at prediction: one aims at predicting the
future value of an entity and the other at predicting (or assessing)
the chances of occurrence of an event.
While probability theory attracted the attention of many
people (from the mathematical and nonmathematical communities), the
development of its cousin, Statistics, was comparatively
quiet. Its importance was realized later through the development of
Sampling Theory.
$$$ ### $$$
Chapter 1
Basic Statistics
1.1 : Introduction
Statistics (or Statistical methods) is a systematic way of dealing with
raw data (a data set) to extract meaningful information. (What is
‘meaningful’ depends on the context). The study of Statistics started with the
methods for finding the mean and mode from a given data set. Later on,
concepts like median, variance and standard deviation (SD) were added and
methods for determining them were developed. Still later, when more
information was required by the business communities, other concepts like
quartiles, deciles, and percentiles and skewness and kurtosis were added to
the repertoire of Statistical methods. All these concepts put together give a
set of values, which is supposed to provide complete and essential
information about the given data set.
In this Chapter, we will study and learn certain of the above concepts
which are quite popular and often used.
1.2: Data Set
In Statistics, by a data set, we mean a set or collection of numerical
values as given below:
02, 10, 46, 62, 47, 20, 04, 08, 28, 35,
62, 44, 18, 04, 01, 09, 01, 04, 86, 91,
00, 129, 116, 03, 21, 02, 48, 106, 100, 09,
99, 05, 08, 08, 06, 25, 47, 60, 34, 00,
16, 35, 25, 29, 41, 07, 10, 02, 01, 05.
Such a set of values is also called raw data. (Each value is known as a
datum and the collection of values as data.) They are called raw because they
do not have any ‘meaning’ or ‘significance’ on their own, other than their
numeric values. Their ‘meaning’ depends on the context to which they are applied.
We illustrate this by two examples.
Example (1): Suppose you are told that this set of values represents the
number of runs scored by a batsman in 50 consecutive innings. Then, the
data set acquires a meaning and we feel it is meaningful.
Example (2): Again, suppose you are told that this set represents the
quantity of rainfall (in mm) in a certain place for a period of 50 months (the
rainfall is noted down on the same date of 50 consecutive months). Then the
same data set acquires a meaning, even though a different one.
So, raw data by themselves convey no ‘meaning’, and their
‘meaning’ varies from context to context.
1.3: The Mean of a Data Set
Given the data set of Section 1.2, we compute the mean or average
(denoted by µ or $\bar{x}$) as follows: add all the 50 values and divide by 50. This
gives
Sum = 1579
Mean = Sum / 50 = 31.58
Example (3): Let us go back to Example (1). In this interpretation the mean
value of 31.58 represents the ‘average number of runs scored by the batsman
per innings’.
Example (4): Let us go back to Example (2). In this interpretation the mean
value of 31.58 represents the ‘average quantity of rainfall (in mm) in that
place per day’.
The numeric value of the mean gives a good measure of excess or
deficiency of rainfall, from which we can answer
questions of the following type: ‘On how many days was there rainfall in
excess of the average?’ etc. We may answer this by counting; but an easier
way is through visual aids like the diagram shown below:
[Figure 1: plot of the data values in sequence, with a horizontal line marking the mean value.]
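The mean of Section 1.3, and the counting question raised above, can be sketched in Python as follows. This is a minimal illustration on the data set of Section 1.2; the variable names are ours.

```python
# Data set of Section 1.2 (50 values), in the order given there.
data = [
    2, 10, 46, 62, 47, 20, 4, 8, 28, 35,
    62, 44, 18, 4, 1, 9, 1, 4, 86, 91,
    0, 129, 116, 3, 21, 2, 48, 106, 100, 9,
    99, 5, 8, 8, 6, 25, 47, 60, 34, 0,
    16, 35, 25, 29, 41, 7, 10, 2, 1, 5,
]

total = sum(data)            # add all the values
mean = total / len(data)     # divide by the number of values

# Count how many values exceed the mean (the "excess" question above).
above = sum(1 for value in data if value > mean)

print("Sum =", total)
print("Mean =", mean)
print("Values above the mean:", above)
```

Counting in code gives the same answer as reading the points above the horizontal line in the diagram.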
We give below one more example.
Example (5): Consider the following data which represent the length (in cm)
of the index finger (right hand) of 30 men:
6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3.
Let us determine the mean.
Sum = 196.1 and N = 30;
Therefore, mean = Sum / N = 196.1 / 30 = 6.54 (to two decimal places).
You should draw a diagram like the one shown above. Take the 30 numbers
1 to 30 along the X – axis and the data values along the Y – axis. Plot the
points and join them in sequence. Draw the horizontal line indicating the
mean. Now, answer the questions: What can you infer from this Figure?
Does it convey any meaningful information to you? Write down your
answers and compare with those of your friends.
1.4: The Mode of the Data Set
Consider the same data set given in Section 1.2. Pick out the value (or
values) which occurs most often. This most frequently occurring value is the
mode or modal value of the data set. For this data set, the values 01, 02, 04
and 08 occur three times each and no other value occurs as often, so each of
them is a modal value. The mode should not be confused with the maximum
value, which here is 129: the maximum is the peak of a plot of the data
values, beyond which the data cannot go, whereas the mode is the peak of a
plot of the frequencies. See Figure 2.
[Figure 2: plot with the modal value and the mean value marked.]
Example (6): Let us go back to Example (1). The modal values indicate the
scores made most often by the batsman during the period: he scored 1, 2, 4
and 8 runs three times each. His highest score, 129, came in his 22nd innings.
Example (7): Let us go back to Example (2). Here the modal values indicate
the quantities of rainfall recorded most often in that place during the period.
The maximum rainfall, 129 mm, occurred in the 22nd month.
Example (8): Let us go back to Example (5). The value 7.0 occurs three times,
more often than any other value. So, the modal value of this data set is 7.0.
(The highest value is 7.9.)
In Example (8) there is only one modal value. Such a
data set or its plot is said to be unimodal. Figure 2 shows a unimodal plot.
Generally, several values may share the highest frequency, as in Examples
(6) and (7). In such cases, we say that the data set or its plot is
multi-modal. Its graph or plot will look like the one shown in Figure 3.
[Figure 3: a multi-modal plot with the modal values marked.]
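A minimal sketch of finding the mode in Python, taking the mode to be the value (or values) occurring most often, which is the usual definition. The helper name `modes` is ours; it returns all tied values so that multi-modal data sets are handled too.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Data set of Section 1.2.
section_1_2 = [
    2, 10, 46, 62, 47, 20, 4, 8, 28, 35,
    62, 44, 18, 4, 1, 9, 1, 4, 86, 91,
    0, 129, 116, 3, 21, 2, 48, 106, 100, 9,
    99, 5, 8, 8, 6, 25, 47, 60, 34, 0,
    16, 35, 25, 29, 41, 7, 10, 2, 1, 5,
]

# Finger lengths (in cm) of Example (5).
fingers = [
    6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
    5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
    7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3,
]

print("Modal values, Section 1.2:", modes(section_1_2))
print("Modal values, Example (5):", modes(fingers))
print("Maximum, Section 1.2:", max(section_1_2))
```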
1.5: Median of the Data Set
The value (which may or may not belong to the data set) which
bifurcates the data set (that is, divides it into two equal halves) is
called the median or median value. To be precise, arrange the values of the
data set in increasing order, and look at the middle value (or values). If
there is a single middle value, it is the median; if there are two middle
values, the median is their average.
Example (9): For a given data set, we have the sorted values in ascending
order as:
03, 10, 16, 18, 20, 21, 23, 25, 25, 29,
34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
70, 72, 75, 77, 78, 80, 82, 82, 83, 96.
Since there are 50 values in the data set, we look at the 25th and 26th values.
Since the 25th and 26th values are 48 and 50 respectively, the median is given
by the average of these two values; that is by [48 + 50] / 2 = 49. Note
that this value 49 of the median does not belong to the data set.
Now, let us plot the given data set in increasing order
and draw the horizontal line at 49. (The student should do this.) You will get
a plot like Figure 4. You will find that exactly 25 values (the first 25 values)
lie below this line and exactly 25 (the last 25 values) lie above this line.
Example (10): Let us go back to Example (5). Arranging the data in
increasing order, we get
5.4, 5.6, 5.8, 5.8, 5.9, 6.0, 6.0, 6.1, 6.1, 6.2, 6.2, 6.3,
6.3, 6.4, 6.5, 6.5, 6.6, 6.8, 6.8, 6.9, 6.9, 7.0, 7.0, 7.0,
7.1, 7.1, 7.2, 7.2, 7.5, 7.9.
Since there are 30 values in the data set, we look at the 15th and 16th values.
The 15th and 16th values are 6.5 and 6.5 respectively. They are the same. Hence,
the median value is given by the average of these two values; that is by [6.5
+ 6.5] / 2 = 6.5. Note that in this case, the median value belongs to the data
set. You should draw a figure for this Example.
[Figure 4: plot of the sorted data values with a horizontal line marking the median.]
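The procedure of Section 1.5 (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name `median` is ours; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two


# Sorted data of Example (9).
example_9 = [
    3, 10, 16, 18, 20, 21, 23, 25, 25, 29,
    34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
    46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
    56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
    70, 72, 75, 77, 78, 80, 82, 82, 83, 96,
]

# Finger lengths of Example (5), unsorted; the function sorts them itself.
example_10 = [
    6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
    5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
    7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3,
]

print(median(example_9))
print(median(example_10))
```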
Let us digress briefly from our mainstream ideas to discuss another
important concept associated with any Statistical Analysis of a data set. It is
called the frequency table. This idea is an important one and will reappear
in the later Chapters.
1.6: Frequency Table
Let us reconsider our data set of 50 values of Example (9). We
have already arranged these values in increasing order. This arrangement
indicates the frequency of occurrence of each datum in the set. This is
recorded as follows. See Table 1.
Table 1
Datum  00  01  02  03  04  05  06  07  08  09
Freq.   0   0   0   1   0   0   0   0   0   0
Datum  10  16  18  20  21  23  25  29  34  35
Freq.   1   1   1   1   1   1   2   1   4   2
Datum  41  44  46  47  48  50  52  56  57  59
Freq.   2   2   2   2   1   3   2   1   2   1
Datum  62  63  65  66  69  70  72  75  77  78
Freq.   1   1   1   2   1   1   1   1   1   1
Datum  80  82  83  96
Freq.   1   2   1   1
The total of these frequencies is 50, as it should be. This frequency table can
be further compressed and expressed in a compact manner as shown below.
See Table 2.
Table 2
Class       Frequency
00 -- 09    01
10 -- 19    03
20 -- 29    06
30 -- 39    06
40 -- 49    09
50 -- 59    09
60 -- 69    06
70 -- 79    05
80 -- 89    04
90 -- 99    01
Total       50
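Both frequency tables can be produced mechanically. The following is a minimal Python sketch on the data of Example (9); the names `freq` and `class_freq` are ours, and the classes are assumed to be the ten-wide intervals 00 -- 09, 10 -- 19, and so on, as in Table 2.

```python
from collections import Counter

# Sorted data of Example (9).
example_9 = [
    3, 10, 16, 18, 20, 21, 23, 25, 25, 29,
    34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
    46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
    56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
    70, 72, 75, 77, 78, 80, 82, 82, 83, 96,
]

# Table 1: frequency of each individual datum.
freq = Counter(example_9)

# Table 2: frequency of each class; integer division by 10 gives the class index.
class_freq = Counter(value // 10 for value in example_9)

for index in range(10):
    low = 10 * index
    print(f"{low:02d} -- {low + 9:02d}    {class_freq[index]:02d}")
print("Total      ", sum(class_freq.values()))
```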
1.7: Histogram
Closely associated with a frequency table is its visual counterpart,
called a histogram. The histogram of a frequency table immediately displays the
necessary information in a visually understandable form and is widely used in
business discussions and in the assessment of national growth in industrial and
financial sectors. Histograms are also used for many other purposes and
applications.
Just as a histogram is a visual counterpart of a frequency table, the
frequency table is a tabular counterpart of a histogram. Given one, we can
generate the other.
We now illustrate the concept of a histogram by means of an example:
Example (11): Let us go back to the Example of the previous Section. The
frequency table given there can be converted into a histogram as shown
below. Here we plot the class values (given in the first column of Table 2)
along the X – axis and plot the corresponding frequencies along the Y – axis.
The histogram is shown below. In the histogram given below, the middle
values have been taken as 10, 20, 30, 40, 50, 60, 70, 80, and 90.
[Figure: Histogram for Example (11).]
1.8: Stem Plots or Node Charts
This is another visual aid based on the frequency table, which helps in
taking decisions and in assessing profit / loss, above average / below
average, healthy / sick, etc.
1.9: Variance & Standard Deviation
Once we are given a data set and we have found the mean, then the
variance is determined based on the view point taken by us or the
information provided or available to us.
If we have no information, then we treat the data as a population and
compute the variance of the data by the formula
$\mathrm{Var} = \dfrac{\sum_k (x_k - m)^2}{N}$
where N = the number of data values and m = the mean of the data set.
If we have information that the given data set represents a sample
from a population, then we compute the variance from the formula
$\mathrm{Var} = \dfrac{\sum_k (x_k - m)^2}{N - 1}$
where N and m are as above.
The standard deviation of the data set is given by the positive square
root of the variance. In the first case, where the data is considered as a population,
it is denoted by σ and in the second case it is denoted by s. That is, we have
$\sigma = +\sqrt{\dfrac{\sum_k (x_k - m)^2}{N}}$
and
$s = +\sqrt{\dfrac{\sum_k (x_k - m)^2}{N - 1}}$
Example: Let us go back to Example (5). We have the length (in cm) of the
index finger (right hand) of 30 men:
6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3.
The mean m of this data is 196.1 / 30 = 6.54 (to two decimal places). So we
compute the sum of the squared deviations of the data values from the mean:
(0.34)^2 + (0.04)^2 + (0.74)^2 + (0.66)^2 + (0.46)^2 + (0.64)^2 + (0.26)^2 + (0.14)^2 +
(0.54)^2 + (0.44)^2 + (0.94)^2 + (0.74)^2 + (0.66)^2 + (0.96)^2 + (0.46)^2 + (0.36)^2 +
(0.24)^2 + (0.44)^2 + (0.34)^2 + (0.54)^2 + (0.56)^2 + (0.56)^2 + (0.36)^2 + (0.04)^2 +
(0.26)^2 + (0.06)^2 + (0.46)^2 + (1.36)^2 + (1.14)^2 + (0.24)^2 = 10.330
Now, if we consider this as our population, then
Var = 10.330 / 30 = 0.3443 and hence SD = Sqrt(0.3443) = 0.5868.
But, if we consider the data set as our sample from some population, then
Var = 10.330 / 29 = 0.3562 and hence SD = Sqrt(0.3562) = 0.5968.
This completes our example.
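The two variance formulas differ only in the divisor, which the following minimal Python sketch makes explicit for the finger-length data of Example (5). The variable names are ours, and the mean is computed from the data rather than typed in, so the printed values carry no rounding of the mean.

```python
import math

# Finger lengths (in cm) of Example (5).
fingers = [
    6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
    5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
    7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3,
]

n = len(fingers)
m = sum(fingers) / n                               # mean of the data set
sum_sq_dev = sum((x - m) ** 2 for x in fingers)    # sum of squared deviations

pop_var = sum_sq_dev / n           # data treated as a population: divide by N
sample_var = sum_sq_dev / (n - 1)  # data treated as a sample: divide by N - 1

print("mean =", round(m, 4))
print("population: Var =", round(pop_var, 4), " SD =", round(math.sqrt(pop_var), 4))
print("sample:     Var =", round(sample_var, 4), " SD =", round(math.sqrt(sample_var), 4))
```

The sample variance is always slightly larger than the population variance of the same data, because the same sum is divided by a smaller number.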
We close with the formulas for computing the variance and the
standard deviation from a frequency table. So, we start with a frequency table
whose columns are x_k and f_k. Here x_k is the mid point of the class interval
and f_k is the corresponding frequency. Let the mean be m and let N be the
total of the frequencies f_k. Then we have the formulas:
(1) $\mathrm{Var} = \dfrac{\sum_k f_k (x_k - m)^2}{N}$ (data treated as a population)
(2) $\mathrm{Var} = \dfrac{\sum_k f_k (x_k - m)^2}{N - 1}$ (data treated as a sample)
(3) SD corresponding to (1) is Sqrt(Var), and
(4) SD corresponding to (2) is Sqrt(Var).
Exercises:
1. Consider the data set of Section 1.2. Compute the variance and the SD.
2. For the data set of Example 9, compute the variance and the SD.
3. For the frequency Table (2) of Section 1.6, compute the variance and the
SD.
Chapter 2
STANDARD DISTRIBUTIONS
2.1: Introduction
So far, we saw how a frequency distribution gave a convenient way of
presenting the given set of data. For example, we consider the frequency
distribution of the ages of 40 employees of an organization as given in
Table – 1.
Table – 1
Class Frequency
20-25 10
25-30 6
30-35 4
35-40 4
40-45 6
45-50 7
50-55 2
55-60 1
n = 40
Now, for each class, we can compute the relative frequency; for
example, for the class 20-25, the relative frequency is 10/40 = 1/4. Similarly,
for the class 25-30, the relative frequency is 6/40 = 3/20, and so on. We can
now tabulate this as follows. See Table – 2 given below.
Table – 2
Class    Frequency    Relative Frequency
20-25    10           1/4
25-30    6            3/20
30-35    4            1/10
35-40    4            1/10
40-45    6            3/20
45-50    7            7/40
50-55    2            1/20
55-60    1            1/40
Note that the sum of all relative frequencies is 1. Also, observe that,
if we take another set of 40 employees of the same company and form the
relative frequency distribution, then we will, in general, get a different table.
Thus, tables like Table – 2 depend on the set of data. But, still, the sum of
all relative frequencies will be 1.
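The relative frequencies can be computed exactly with fractions. The following is a minimal Python sketch for the age data above; the variable names are ours.

```python
from fractions import Fraction

# Classes and frequencies of the age distribution (Table – 1).
classes = ["20-25", "25-30", "30-35", "35-40", "40-45", "45-50", "50-55", "55-60"]
freqs = [10, 6, 4, 4, 6, 7, 2, 1]

n = sum(freqs)                                   # total number of employees
rel_freqs = [Fraction(f, n) for f in freqs]      # Fraction reduces to lowest terms

for cls, f, rel in zip(classes, freqs, rel_freqs):
    print(f"{cls}    {f:2d}    {rel}")
print("Sum of relative frequencies:", sum(rel_freqs))
```

Whatever the frequencies are, dividing each by their total forces the relative frequencies to add up to 1.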
This gives rise to the question: Are there distributions which do not
depend on the data? Can we define them independent of any data? If so,
how?
Such questions are very natural to ask, but not so easy to answer,
though answers are available to all of them. The answer is yes, and
that is precisely what we are going to study in this Chapter.
2.2: Discrete versus Continuous
The data arising from a real – life situation can be of either one of the
following two categories: discrete and continuous.
Generally, observations which can be measured only in whole
numbers are said to be of discrete type. Examples are the number of people
residing in a town at an instant of time, the number of defective bulbs
produced in a factory in a one-week run, the number of runs made by a
cricketer in a year, the number of shafts manufactured by a company in a
week, etc.
There are also observations which can be measured only in a
continuous way. Such observations or the corresponding data are said to be
of continuous type. Examples are the heights of all the students of a college,
the weights of all new-born babies in a maternity hospital, the lengths of
cloth produced by a textile mill over a month, etc.
Distributions associated with discrete-type situations are called
discrete distributions and those associated with continuous-type situations
are called continuous distributions.
Under discrete distributions, we will be studying the binomial and
Poisson distributions.
And under continuous distributions we will be studying the normal and
exponential distributions.
2.3 Discrete Distributions
We have already seen that certain situations are of discrete type.
Such situations can be studied by means of discrete distributions. For such a
study to become feasible, we need to know about and become familiar with
certain important and well – known distributions. In this Section, we will
study and learn two of the important and simple discrete distributions,
namely, the Binomial and the Poisson Distributions, which have many
applications in real life problems.
2.3.1: Binomial Distribution
The binomial distribution is applied to situations with exactly two
outcomes: for example, life-or-death, head-or-tail, success-or-failure,
etc.
One of these two outcomes is called ‘success’ and the other will
automatically be called ‘failure’. This choice will depend on the situation
under study and the decision called for.
As an illustration, imagine that two players, A and B, are playing a
game: a coin is tossed. If head appears, A wins. If tail appears, B wins. Now,
for A, head is a success and tail is a failure, whereas, for B, tail is a success
and head is a failure!
So, naming an outcome as success depends on the objective of the
study undertaken. If you are a supporter of the player A, then ‘head’ will be
rated as a ‘success’ by you.
In the above game, ‘a toss of the coin’ is usually called a trial.
Similarly, appearing in an exam is a trial. Observing a person is a trial, as he
could be alive or dead!
A series of trials constitutes an experiment. For example, the above
game may consist of 10 tosses of the coin, in which case each toss is a trial
and all the 10 tosses constitute an experiment. Similarly, a pensioner has to
report to the pension office every month to prove he is alive. In this case, the
observation made every month is a trial and all the observations, over a
period of time, will constitute an experiment.
Binomial distributions are appropriate for situations which involve
experiments such that (i) only two outcomes are possible for each trial; (ii)
these two outcomes are the same for each trial; (iii) they cannot occur together
(for example, a person cannot be living as well as dead!); (iv) one of the
outcomes always has to occur; and (v) the chance of getting any single
outcome remains constant throughout the experiment.
For such situations, the chances of getting k successes out of n trials
can be calculated from the formula:
(1) $P(x = k) = C(n, k)\, p^k (1 - p)^{n - k}$
where p is the chance or probability of success in a single trial, and
C(n, k) = n! / [k! (n – k)!] are the binomial coefficients. P(x = k) stands
for the probability of getting k successes.
Example:1: The ADC company manufactures bulbs. It wants to regulate the
quality of bulbs produced. To keep a control on the quality, it requires
knowing the chances of having 20 defective bulbs in a pack of 100 bulbs.
So, let us help them by determining this probability.
Solution: One way of doing this is as follows: take this pack and test each
one of the 100 bulbs! If we come across exactly 20 defective bulbs, then the
probability is 1; otherwise, it is zero. Of course, this is a straightforward
method.
But if the company wants to know this probability for 1000 packs of
100 bulbs each, then this method would be cumbersome and problematic.
So, we need methods to deal with such situations. Let us look at this
situation as a situation with two outcomes – either a bulb is good or
defective (note that it cannot be both or neither!). All the above given 5
conditions for a Binomial distribution are satisfied. (Verify this). So, we are
in a binomial situation. Thus, we can compute the probability, if we can find
n, k and p. We have here n = 100 and k = 20. But what is p? It is not given.
So, assume for the moment that p = 0.31. Then the required probability is
P(x = 20), whose value is given by
$\dfrac{100!}{20!\,80!}\,(0.31)^{20} (0.69)^{80}$.
This is a complicated expression and difficult to evaluate by hand.
Fortunately, tables are available. From the table, we get
P(x = 20) = 0.0046.
Note that this probability is the same for every pack of 100 bulbs of this
company. This also means that the chance of finding exactly 20 defective
bulbs in any pack of 100 bulbs is only 0.0046, which is very small. So, the
ADC company can be happy that the quality control is up to their
expectations.
But, at the same time, the chance of finding 33 defective bulbs in
any pack of 100 bulbs turns out to be 0.0771, which is much more than that
for 20 defective bulbs! This is no contradiction, but is a direct consequence
of our assumptions about binomial distributions.
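Where tables are not at hand, the binomial probabilities of Example 1 can be evaluated directly. The following is a minimal Python sketch of formula (1); the function name `binomial_pmf` is ours, and p = 0.31 is the assumed value used in the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 100, 0.31                    # pack size and assumed chance of a defective bulb
p20 = binomial_pmf(20, n, p)        # exactly 20 defective bulbs
p33 = binomial_pmf(33, n, p)        # exactly 33 defective bulbs

print("P(x = 20) =", round(p20, 4))
print("P(x = 33) =", round(p33, 4))
```

The value for 33 is much larger than that for 20 because 33 lies close to the expected number of defectives, n p = 31, while 20 lies far below it.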
-------------------------------------------------------------------------------------------
A DIGRESSION: (How to determine the value of the probability of success
in a given situation)
Now, going back to the above example, we see that we assumed a
value for p, that is, p = 0.31. Usually, in text book problems, this value is
given or at least can be calculated easily from the given data. But, in real-life
situations, nobody gives you this value; or, to put it another way, it is not
known. Then, how do we determine it?
One method is the following: this is generally done through scanning
the past records of the company, or by taking samples, as explained below.
Suppose that our company keeps records meticulously. This means that,
over a period, the company has been noting down the daily production of
bulbs and the number of defectives, say, per 100 bulbs. Then, we can take a
recent period of time (preferably 1 year) and find out the total
number of defective bulbs (say, d) and the total number of bulbs produced
(say, N) for this period. Then p can be taken to be the ratio d / N.
Of course, this estimate requires a lot of caution on the part of the
investigator (the period chosen should be sufficiently long, the data should
be realistic and so on); but this can be done, provided the past records are
accurate and available. Otherwise, one can take a sample of M bulbs
produced on a day and count the number d of defective ones. Then, the ratio
d / M can be taken as the value of p. This is one way of determining p.
Another method is to simply assume a value for p (note that the value
assumed should be between 0 and 1) which would be “realistic”. For
example, when the USA sent a man into space for the first time, the
probability that the rocket would function normally, or the probability that he
would survive in the unknown conditions of space, or the probability that
he would land safely, were not known, as there were no past records available
(naturally). Also, you cannot send 100 people into space to collect the
required data. Similarly, what will happen after a nuclear holocaust is
anybody’s guess. So, we cannot know the probability of survival of a human
being or a nation or a species or oxygen in the atmosphere. And it would be the
height of stupidity to conduct experiments regarding this. Under such
circumstances, either we assume a value for the probability which is
“realistic” or we admit that probability theory is NOT applicable to such
situations.
It might appear to you that these are very extreme cases, which are
rare. If you think so, consider the following examples from real life:
(i) A production company is thinking of launching a new product. In this
case, the probability that this new product will beat its rivals or the
probability that it will garner 40% of the market cannot be determined (that
is, it is not known a priori);
(ii) The government of a country is thinking of launching a new program for
agriculture. The probability of its success cannot be determined (that is, not
known a priori);
(iii) An educational institution is considering the introduction of a new
method for teaching. The probability of its success cannot be determined
(that is, not known a priori);
(iv) You are preparing to appear in an examination. The probability of your
success cannot be determined (that is, not known a priori). Even your past
records regarding the other examinations you have taken, are of no use here.
All these and many other examples are encountered quite often in real
life. So, even in such familiar situations, determining the probability is either
difficult or might not be possible.
This aspect should be kept firmly in the mind, when dealing with
applications of probability to real life situations.
End of Digression
--------------------------------------------------------------------------------------------
Example:2: In a certain experiment, 6 rabbits are given a drug. It is known
that one-fifth of all rabbits which are given the drug develop certain
symptoms. Let us determine the chances that 4 out of these 6 rabbits develop
the symptoms.
Solution: This is a binomial situation with outcomes
Success = developing symptoms
Failure = not developing any symptoms.
It is given that one-fifth of all rabbits which are given the drug
develop symptoms. Thus
p = 1/5 = 0.2
Also, n = 6 and k = 4. Thus the required probability is
$C(6, 4)\,(0.2)^4 (0.8)^2$.
From the tables, we get this value as 0.015. This means that, if we
repeat the above experiment many times, we will observe exactly 4 rabbits
developing the symptoms about 1.5% of the time.
Example: 3: Go back to the above example 2. Let us now ask: what are the
chances that at most 2 rabbits develop symptoms?
Solution: Now, it may happen that (i) none of the rabbits develops
symptoms, that is, k = 0; or (ii) one of the rabbits develops symptoms, that
is, k = 1; or (iii) two of the rabbits develop symptoms, that is, k = 2. This is
the meaning of saying that “at most 2 rabbits develop symptoms”.
From the tables, for n = 6, k = 0, and p = 0.2,
we have p0 = P(x = 0) = 0.2621;
For n = 6, k = 1, p = 0.2, p1 = P(x = 1) = 0.3932;
For n = 6, k = 2, p = 0.2, p2 = P(x = 2) = 0.2458.
Thus the required probability is = p0 + p1 + p2
= 0.9011.
In other words, the chances are very high (about 90%) that at most 2
rabbits develop symptoms.
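The probabilities of Examples 2 and 3 can be checked without tables. The following is a minimal Python sketch; the function name `binomial_pmf` is ours. "At most 2" is obtained by adding the probabilities for k = 0, 1 and 2.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 6, 0.2   # 6 rabbits; one-fifth of rabbits develop symptoms

# Example 2: exactly 4 of the 6 rabbits develop symptoms.
p_four = binomial_pmf(4, n, p)

# Example 3: at most 2 rabbits develop symptoms, i.e. k = 0, 1 or 2.
terms = [binomial_pmf(k, n, p) for k in range(3)]
p_at_most_two = sum(terms)

print("P(x = 4)  =", round(p_four, 4))
print("P(x = 0), P(x = 1), P(x = 2) =", [round(t, 4) for t in terms])
print("P(x <= 2) =", round(p_at_most_two, 4))
```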
2.3.1.1 The Mean and the Variance of a Binomial Distribution
Just as we can compute the mean and the variance for every data set,
in a similar manner we can compute the mean and the variance
for every distribution. Note that a binomial distribution involves two
parameters, n and p.
The mean of such a distribution is denoted by µ and is given by the
formula
µ = n p.
Similarly, the variance is given by
$\sigma^2 = n p (1 - p)$
Example:4: Go back to our first example. We had n = 100 and p = 0.31.
This gives us
µ = 31
$\sigma^2 = 21.39$
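As a numerical check (not a proof), the formulas µ = n p and σ² = n p (1 – p) can be compared with the mean and variance computed directly from the probabilities P(x = k). The following Python sketch does this for Example 4; the function name `binomial_pmf` is ours.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 100, 0.31
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))

print("mean from distribution     =", round(mean, 4), " formula n p         =", n * p)
print("variance from distribution =", round(variance, 4), " formula n p (1 - p) =", n * p * (1 - p))
```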
EXERCISES
Note: Keep your Text Book by your side (or any book will do)
Caution: In certain books, tables are given only for cumulative
distributions. Use it with caution, as we are dealing with the probability
directly.
1. An urn (or a box) contains 4 red and 6 blue balls. A ball is drawn at
random, its color noted and replaced before the next drawing. Drawing a
red ball is considered as success. What are the chances of getting a red
ball in n number of drawings?
2. Two teams play a five game series. The chance of the home team
winning a particular game is 0.55. What are the chances that the home
team wins at least 3 games?
3. Of every 1000 parts produced by a machine, 10 are defective, on the
average. What are the chances that some, but not all, of a sample of
three of these parts turn out to be defective?
4. A shopkeeper has been getting over Rs. 200 a day, on the average, for
eight days out of every ten days over the past several months. What
are his chances of getting the same turn-over on at least five of the
next six days?
2.3.2: The Poisson distribution
The Poisson distribution is another example of discrete type
distributions. The Poisson distribution plays a very important role in its own
right as an appropriate probability model for a large number of random
phenomena. The Poisson model is often used for random variables
distributed over time or space. Study the following statements carefully;
each one describes a real life situation.
1. The number of automobile deaths per month in a large city;
2. The number of telephone calls a person receives per day;
3. The number of defectives in an article produced by a manufacturing
company in a day;
4. The pulse rate of a critically ill patient admitted in a hospital;
5. The number of words a typist can type in 15 minutes;
6. The number of bacteria in a given culture;
7. The number of red blood cells in a specimen of blood;
8. The number of typographical errors per page of a book or a magazine;
9. The number of acres of land (in a locality) suitable for irrigation;
10. The length of defective road per 10 miles of transportable roads.
Note that the examples (1) to (5) involve a time period and the examples
(6) to (10) involve space (length or area or volume).
Each of the above situations has the following characteristics in
common:
1. Events which occur in one time interval do not depend on the
happening or non - happening of those occurring in any other non
over-lapping time interval.
2. The probability that an event occurs is proportional to the length of
the time or space units.
3. The probability that two or more events occur in a very small time or
space unit is supposed to be small enough that it can be neglected.
You must be wondering what these observations and these factors
have got to do with Poisson distributions. First of all, these are the
characteristic features of any situation following a Poisson model.
Secondly, by using these factors, it is possible to derive the Poisson model.
The Poisson model or distribution gives the probability that exactly k
successes occur in a given time (or space) interval. This probability is
denoted by P(x = k) and is given by
(2) $P(x = k) = \dfrac{e^{-m} m^k}{k!}$
where e is a real number whose approximate value is 2.71828182; m is the
expected number of occurrences in the given time interval and k = 0, 1, 2, ....
Sometimes, it is obvious that the independence condition (i.e.
assumption (1) above) is not satisfied. For example, we might be tempted to
use the Poisson distribution to compute the probability distribution of the
number of insects found in a hill of corn. A little reflection reveals that, in
this case, events are not independent, since the present number of insects is
dependent on the previous population. That is, the more insects there are
at any instant, the more are produced. In spite of this, the Poisson model
sometimes gives fairly accurate probabilities, even though not all the
assumptions are satisfied.
2.3.2.1: The Mean and Variance of Poisson distribution
The mean (µ) and the variance (σ²) can be easily derived. They are
given by
Mean of the Poisson distribution = µ = m
Variance of the Poisson distribution = σ² = m.
Note: (1) The sum of all probabilities is 1. That is, if P(x = k) is as in
equation (2), then Σ P(x = k) = 1, where the sum is taken over k = 0, 1, 2, ....
(2) The probability of having n or fewer occurrences (or successes) is
given by P(x ≤ n) = Σ P(x = k), where the sum is taken from k = 0 to k = n.
(3) The probability of having n or more occurrences is given by
P(x ≥ n) = 1 - Σ P(x = k), where the sum is taken from k = 0 to k = n – 1.
Example: 5: If a person receives 5 calls on the average during a day, what is
the probability that he will receive fewer than 5 calls tomorrow?
Solution: According to the previous discussion, experience has shown that
the Poisson probability model is appropriate for this situation. The average
value m = 5 is given. Thus we need to compute P(x ≤ 4). This is given by
$P(x \le 4) = \sum_{k=0}^{4} P(x = k) = 0.44049$
(The value can be found from the table in any statistics book.)
The probability of receiving exactly five calls is
P(x = 5) = p(5) = 0.17547
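When a table is not at hand, the Poisson probabilities can be computed directly from formula (2). The following is a minimal sketch in Python; the function names poisson_pmf and poisson_cdf are illustrative choices, not part of the course text.

```python
import math

def poisson_pmf(k, m):
    """P(x = k) for a Poisson distribution with mean m, as in formula (2)."""
    return math.exp(-m) * m**k / math.factorial(k)

def poisson_cdf(n, m):
    """P(x <= n): the sum of P(x = k) for k = 0, 1, ..., n."""
    return sum(poisson_pmf(k, m) for k in range(n + 1))

m = 5  # average number of calls per day
print(round(poisson_cdf(4, m), 5))  # probability of fewer than 5 calls
print(round(poisson_pmf(5, m), 5))  # probability of exactly 5 calls
```

The probability of n or more occurrences follows from the complement rule, 1 - poisson_cdf(n - 1, m).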
Example: 6: A secretary claims that she averages one error per page. A
sample page is selected at random from some of her work, and five errors
are found. What is the probability of her making five or more errors on a
page if her claim is correct?
Solution: Assuming that the Poisson process is appropriate, we take m = 1
per page. The required probability is given by P(x ≥ 5). Thus we have
P(x ≥ 5) = 1 – P(x ≤ 4) = 1 – 0.9963 = 0.0037.
In general, such problems do not end with the computation of the
required probability. It is also required to interpret the value according to the
given context. This is done as follows: Note that the value of the probability
is very small. If the secretary's claim is correct, a page with five or more
errors should almost never occur. Since such a page has in fact been found,
in view of the small value of the probability, we may conclude
one of the following:
1. The Poisson model is correct and a near miracle has occurred;
2. The model is correct but the wrong average value m has been claimed;
3. The model is incorrect.
Probably (2) is the most plausible in this case.
EXERCISES
(1) A city has on the average, five traffic deaths per month. What is the
probability that this average is exceeded in any given month?
(2) A taxicab company has, on the average, 10 flat tyres per week. During
the past week they had 20. Assuming the Poisson model is
appropriate, what is the probability of having 20 or more flats during a
week? Would you suspect foul play?
2.4 Continuous Distributions
We already know from section 7.2 that distributions can be of two
types, and until now we have been learning about discrete distributions. In this
Section, we will learn about continuous distributions, namely the Normal and
Exponential distributions.
2.4.1 Normal Distributions
One of the most important and useful sets of continuous
distributions in statistics is the set which comes under the name of ‘Normal’
distributions. For the graphs of typical normal distributions, refer to the
Text Book.
The curve is determined entirely by the mean and standard
deviation. As a result, the graphs of normal distributions with the same
mean but different standard deviations differ only in the amount of
dispersion.
Normal distributions with the same standard deviation but
different means look identical in shape and differ only in their placement on
the X – axis.
Theoretically, although it may not be apparent from figures, the
curve never touches the X – axis. However, it approaches it so closely that
for practical purposes the area lying farther than ± 3σ from the mean µ can
be ignored without any loss.
One of the special features of normal distributions is that the
total area under the curve is one. A normal distribution with µ = 0, σ = 1 is
called the standard normal distribution.
A general Normal distribution with mean µ and standard deviation σ is
given by the formula
(3) $P(x) = K \exp\left(-\dfrac{(x - \mu)^{2}}{2\sigma^{2}}\right)$
where K is a constant and exp( ), also written Exp( ), is the exponential
function. The standard normal distribution is given by
(4) $P(z) = K_{1} \exp\left(-\dfrac{z^{2}}{2}\right)$
where K1 is a (different) constant.
Now, any value of x can be located in its relationship to µ in the form
of the number of standard deviations distant from the mean. For example, if
µ = 32 and σ = 9, a score of 45.5 is exactly 13.5 units above the mean, or
13.5/9, i.e. 1.5 standard deviation units above the mean. A score of 18.5 is
13.5 units below the mean, or 1.5 standard deviation units below the mean.
This can be symbolized as 1.5 for the score 45.5 and as - 1.5 for the score
18.5. These scores are called standard scores and referred to as z – values
or z scores.
Note that the values of x of any normal distribution can be related to
standard scores by the following formula
(5) $z = \dfrac{x - \mu}{\sigma}$
Example:7: Determine standard scores for x = 18.3, 27.9, 34.4, 39.3 in the
normal distribution for which µ = 30.1, and σ = 2.4.
Solution:
For x = 18.3, $z = \dfrac{18.3 - 30.1}{2.4} = -4.92$
For x = 27.9, $z = \dfrac{27.9 - 30.1}{2.4} = -0.92$
For x = 34.4, $z = \dfrac{34.4 - 30.1}{2.4} = 1.79$
For x = 39.3, $z = \dfrac{39.3 - 30.1}{2.4} = 3.83$
Example:8: Determine the values of x in the distribution of Example 7 for
which the standard scores are z = - 3.07, - 1.04, 0.73, and 2.44.
Solution: This is the reverse of Example 7. Here standard scores are given
and we want to determine the actual values of x. We use the formula
(6) x = µ + z σ.
For z = - 3.07, x = 30.1 – 3.07 × 2.4 = 22.732
For z = 2.44, x = 30.1 + 2.44 × 2.4 = 35.956
Similarly, compute the other two values.
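Formulas (5) and (6) are inverses of each other, which can be checked with a short sketch. The Python below is illustrative only; the names z_score and x_from_z are assumptions made for this example.

```python
def z_score(x, mu, sigma):
    """Formula (5): the standard score of x."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Formula (6): the value of x with standard score z."""
    return mu + z * sigma

mu, sigma = 30.1, 2.4

# Standard scores for the four values of x
for x in (18.3, 27.9, 34.4, 39.3):
    print(x, round(z_score(x, mu, sigma), 2))

# Values of x for the four standard scores
for z in (-3.07, -1.04, 0.73, 2.44):
    print(z, round(x_from_z(z, mu, sigma), 3))
```

A negative z always gives a value of x below the mean and a positive z a value above it, which is a quick check on the sign used in formula (6).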
Example:9: In a normal distribution, a value of 42.1 is 1.3 standard
deviations above the mean, which has the value 31.7. What is the standard
deviation of the distribution?
Solution: In this case, we know x, z, and µ, but not σ. Since
$z = \dfrac{x - \mu}{\sigma}$, we have $\sigma = \dfrac{x - \mu}{z}$.
Since x = 42.1, z = 1.3 and µ = 31.7,
$\sigma = \dfrac{42.1 - 31.7}{1.3} = 8.0$
Example:10: A normal distribution has a standard deviation of 1.7. If a value
of 11.3 lies 2.1 standard deviations below the mean, determine the mean of
the distribution.
Solution: Do it yourself. (Ans: 14.9)
The fact that any normal distribution can be related to the standard
normal distribution is of central importance. Because of this, the standard
normal distribution can be studied in detail and the results transferred to any
normal distribution. A table of cumulative values of the standard normal
distribution is usually given at the end of every statistics book.
In this table, the entries on the left and top correspond to
the values of z, usually given to two decimal places.
The integer part and the first decimal value are given in the column at
the left, and the second decimal value in the top row. The entries in the body
of the table are the areas under the normal curve, between the mean (0) and
the given value z, correct to four decimal places. (Caution: In some books
the area is given from - ∞ to the given value of z).
For instance, if z = 1.62, to find the corresponding area, look down the
left column to find 1.6 and look along the top row to find 0.02. Then, the
entry which is in both the row of 1.6 and the column of 0.02 is 0.4474. This
means that the area under the standard normal curve between 0 and 1.62 is
0.4474. This gives the value of the probability P(0 ≤ z ≤ 1.62).
Several other things follow immediately from this observation. Since

the normal curve is symmetric about the Y - axis, each side from the mean
contains exactly half of the area of the curve. Thus the area under the curve
to the right of 1.62 is 0.5000 – 0.4474 or 0.0526. This is the value of the
probability P(1.62 ≤ z).
In addition, since the curve is symmetric, the area between – z and 0 is

equal to the area between 0 and z. Therefore, in this case, the area between
–1.62 and 0 is also 0.4474 and the area to the left of – 1.62 is again 0.0526.
You should see the graphs of these from your Text Book or from
some good book.
Finally, the area between –1.62 and 1.62 is twice the area between 0
and 1.62, or 0.8948. This is interpreted as: the probability that the variable of
the standard normal distribution has a value between – 1.62 and 1.62 is equal
to 0.8948. The use of the table is illustrated by the following example.
Example:11: Find the area under the standard normal curve between 0 and
z if z = 0.07, 0.83, 1.70, 2.56, – 0.24, - 1.12, - 3.01.
Solution:
z        Area between 0 and z
0.07     0.0279
0.83     0.2967
1.70     0.4554
2.56     0.4948
-0.24    0.0948
-1.12    0.3686
-3.01    0.4987
Points to Remember:
1. The area under the curve is always positive.
2. The entries on the edge of the table (left and top) represent standard
deviations distance from the mean (standard scores).
3. The entries in the body of the table represent areas under the standard
normal curve between the mean and the given standard score (z –
value).
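The table entries described above (the area between the mean and z) can also be computed from the error function, using the identity: area between 0 and z = ½ erf(|z| / √2). The sketch below is a hedged illustration in Python; area_mean_to_z and area_right_of are names chosen for this example only.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z.

    The area is never negative, so the sign of z is ignored.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_right_of(z):
    """Area to the right of z, built from the half-area 0.5 by symmetry."""
    if z >= 0:
        return 0.5 - area_mean_to_z(z)
    return 0.5 + area_mean_to_z(z)

for z in (1.62, 0.07, -1.12):
    print(z, round(area_mean_to_z(z), 4))

print(round(area_right_of(1.62), 4))  # tail area beyond z = 1.62
```

Values computed this way may differ from a printed table in the last decimal place, because the table entries are rounded.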
Example: 12: Find the standard scores for which the area under the standard
normal curve between it and the mean is 0.2019, 0.4908, 0.3621.
Solution: From the table we can obtain the values of z:
Area      z
0.2019    0.53
0.4908    2.36
0.3621    1.09
We notice that no entry in the table is precisely 0.4908.
The two entries closest to it are 0.4906 and 0.4909. Since 0.4908 is closer to
0.4909 than to 0.4906, we use the value for 0.4909.
Example:13: Find the area under the normal curve between z = -1.34 and z =
0.57, and between z = 0.59 and z = 1.27.
Solution: For z = -1.34 and z = 0.57, the values of the areas from the table
are respectively 0.4099 and 0.2157. Since these are on opposite sides of
the mean, they should be added together. Thus the area between z = - 1.34
and z = 0.57 under the normal curve is 0.6256. For z = 0.59 and z = 1.27 the
corresponding areas are 0.2224 and 0.3980. Since they are on the same side
of the mean, their difference is the desired area. Thus the area under the normal
curve between z = 0.59 and z = 1.27 is 0.3980 – 0.2224 = 0.1756.
Example:14: For a normal distribution with mean 38.7 and standard
deviation 10.2, estimate the probability that a value will fall between 29.6
and 44.8.
Solution: Each score should be converted to a standard score in the first
place. For the first value, $z = \dfrac{29.6 - 38.7}{10.2} = -0.89$, and for the second
value, $z = \dfrac{44.8 - 38.7}{10.2} = 0.60$. The corresponding areas under the standard Normal
curve are 0.3133 and 0.2257. Since the z – scores are on opposite sides of
the mean, the areas must be added. Adding, we obtain 0.5390. This is the
probability that a value will fall between 29.6 and 44.8. We express this
probability by P(29.6 ≤ x ≤ 44.8).
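The two-step procedure above (convert to standard scores, then combine table areas) is equivalent to taking a difference of cumulative probabilities. As a rough cross-check, the sketch below uses NormalDist from Python's standard statistics module; the helper name prob_between is an assumption made for this illustration.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a <= x <= b) for a normal distribution with mean mu and s.d. sigma."""
    dist = NormalDist(mu, sigma)
    # cdf(v) is the area from minus infinity up to v
    return dist.cdf(b) - dist.cdf(a)

p = prob_between(29.6, 44.8, mu=38.7, sigma=10.2)
print(round(p, 4))  # close to the table-based answer of about 0.539
```

The result differs slightly from the table-based value because the table method rounds each z score to two decimal places before looking up the area.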
Example:15: A normal distribution has a mean of 133 with a standard
deviation of 21. Determine a number such that 80% of all the scores fall
within that number of the mean.
Solution: If 80% of all scores fall within the desired number of the mean,
then 40% fall between the mean and the mean plus that number (why?).
Let n be the desired number and let x = 133 + n. This means 40% of the
scores fall between the mean, 133, and x. The z score corresponding to
0.4000 is 1.28. Thus, we have
$1.28 = \dfrac{x - 133}{21}$, which gives x = 160.
Thus 40% of the scores fall between 133 and 160, so that n = 27, and 80% of
all scores fall within 27 units of the mean on both sides; that is, 80% of
all scores fall between 106 and 160.
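Reading the table "backwards" (from an area to a z score) corresponds to the inverse of the cumulative distribution. A minimal sketch, assuming Python's statistics.NormalDist and a helper name central_half_width invented for this example:

```python
from statistics import NormalDist

def central_half_width(coverage, sigma):
    """Distance n from the mean such that a fraction `coverage` of a normal
    distribution with standard deviation sigma lies within n of the mean."""
    # A central area of `coverage` leaves (1 - coverage) / 2 in each tail,
    # so the upper end point has cumulative probability 0.5 + coverage / 2.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return z * sigma

mu, sigma = 133, 21  # the mean 133 is the value used in the worked solution
n = central_half_width(0.80, sigma)
print(round(n, 1), round(mu - n), round(mu + n))
```

The half-width comes out a little under 27 because the exact z score is slightly larger than the two-decimal table value 1.28 but x was rounded up to 160 in the worked solution.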
EXERCISES
1. Estimate standard scores for the following values of x in a normal

distribution with mean 284.7 and standard deviation 14.6.
(a) X = 261.4 (d) X = 280.4

(b) X = 303.4 (e) X = 293.9
(c) X = 259.3 (f) X = 321.2
2. Find the values of x, for the following standard scores, in a normal
distribution with mean µ = 10.4 and variance σ² = 11.8
(a) z = 1.64 (d) z = 0.50

(b) z = 2.07 (e) z = - 0.13
(c) z = - 2.06 (f) z = 1.14
3. Find the area under the standard normal curve between
(a) z= 0 and z = 2.18

(b) z = - 1.04 and z = 1.54
(c) z = 1.56 and z = 2.93
(d) z = - 0.49 and z = - 0.12
4. Find the area under the standard normal curve

(a) To the right of z = 1.43
(b) To the left of z = -1.03
(c) To the right of z = -0.77
(d) To the left of z = 2.01
5. Find the area under the standard normal curve between z and – z if
(a) z=1
(b) z = 1.96
(c) z = 1.28
6. Find the value of z for which 0.1230 of the area under the standard
normal curve lies to the right of z.
7. The mean of a normal distribution is 100. If the probability that the
variable assumes a value greater than 121.0 is .1446, what is the
standard deviation of the distribution?
8. A normal distribution has a standard deviation of 134. The probability
that the variable takes a value less than 1072 is 0.7734. What is the
mean of the distribution?
2.4.2: Applications of the Normal Distributions
Normal distributions have a wide range of applications. Many

sets of data have distributions which are approximately normal. This is
evident from the following set of examples.
Example:16: Among workers in a certain industrial plant, the mean age is 45
years with a standard deviation of 4 years. One worker is stopped at random
and asked to fill out a questionnaire. What is the probability that he is
between 48 and 50 years of age?
Solution: If z1 denotes the standard score for 48 and z2 the standard score for
50, we have
$z_1 = \dfrac{48 - 45}{4} = 0.75$, $z_2 = \dfrac{50 - 45}{4} = 1.25$
The areas under the normal curve associated with z1 and z2 are 0.2734 and
0.3944. The area between 48 and 50, then, is 0.3944 – 0.2734 = 0.1210.
Thus P(48 ≤ x ≤ 50) = 0.1210, where x is the age of the worker.
Example:17: A machine in a factory is used to produce light bulbs. The

bulbs are examined in lots of 1000. On the average, a lot will have 10
defective bulbs. The distribution of the defective bulbs is approximately
normal with a standard deviation of 3.14. (i) What is the probability that a
certain lot will have at least 3 but not more than 6 defective bulbs? (ii) What
is the probability that it will have more than 15 defective bulbs?
Solution: Since the number of bulbs is discrete and measured in integers, a
continuity correction must be applied. (i) Since the interval representing 3 is
2.5 to 3.5 and the interval representing 6 is 5.5 to 6.5, the interval
representing at least 3 but not more than 6 is 2.5 to 6.5, since both 3 and 6
are included. Now, if z1 and z2 represent the standard scores for 2.5 and 6.5,
respectively, we have
$z_1 = \dfrac{2.5 - 10}{3.14} = -2.39$, $z_2 = \dfrac{6.5 - 10}{3.14} = -1.11$
The corresponding areas are 0.4916 and 0.3665. So, the area between 2.5
and 6.5 is 0.4916 – 0.3665 = 0.1251. Thus P(2.5 < x < 6.5) = 0.1251.
In terms of the original discrete data, P(3 ≤ x ≤ 6) = 0.1251.
(ii) To determine P(x > 15), note that 15 is represented by the interval 14.5
to 15.5. To be greater than 15 means, after application of the continuity
correction, greater than 15.5, since 15 is not included.
Here $z = \dfrac{15.5 - 10}{3.14} = 1.75$ and the associated area is 0.4599. So, P(x > 15) =
0.0401 (why?).
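The continuity correction amounts to widening each integer k to the interval from k – 0.5 to k + 0.5 before using the normal curve. The following Python sketch illustrates this; the function names are hypothetical and statistics.NormalDist is used in place of the printed table.

```python
from statistics import NormalDist

def prob_count_between(lo, hi, mu, sigma):
    """Approximate P(lo <= X <= hi) for an integer-valued count X.

    Both end points are included, so the interval is widened by 0.5 on
    each side (continuity correction).
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

def prob_count_above(k, mu, sigma):
    """Approximate P(X > k): k itself is excluded, so start at k + 0.5."""
    return 1 - NormalDist(mu, sigma).cdf(k + 0.5)

mu, sigma = 10, 3.14
print(round(prob_count_between(3, 6, mu, sigma), 4))  # at least 3, at most 6
print(round(prob_count_above(15, mu, sigma), 4))      # more than 15
```

The first value is close to, but not identical with, the table-based 0.1251, since the table method rounds the z scores to two decimals.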
Example:18: In a certain high rent district, the monthly rental for apartments
is approximately normally distributed with a mean of Rs 384.22 and a
standard deviation of Rs 126.40. Above what value do the highest 30 percent
of the monthly rentals in this district lie?
Solution: Although the variable is discrete, its values are not given in
integers; so we do not apply the continuity correction. According to the table,
20% of the values are between the mean and z, for z = 0.52. Since 50% of the
values lie above the mean, 30 percent of the values are above this point.
Thus, we have $0.52 = \dfrac{x - 384.22}{126.40}$, or x = 449.95.
Thus, about 30 percent of the rentals are above Rs. 449.95. A slightly more
accurate figure could be obtained with more detailed working.
EXERCISES
1. Weights of male students in a large university are approximately

normally distributed. Estimate the mean and standard deviation of the
distribution if 6.68% of the students weigh less than 125 pounds and
15.87% weigh more than 170 pounds.
2. The efficiency rating of certain machines is calculated every day over
the year. Machine A’s ratings have a mean of 0.873 with a standard
deviation of 0.38, and machine B’s ratings have a mean of 0.846 with
a standard deviation of 0.038. On a certain day, what is the probability that
machine A will have a rating less than the mean for machine B? What
is the probability that machine B will have a rating greater than the
mean for machine A?
3. To test whether a process is in control, a reading is taken on its
working. If the process is in control, the daily readings have a mean of
832.4 with a standard deviation of 10.2. What is the probability of
getting a reading above 860.0 when the process is in control?
4. A survey organization regularly sends out questionnaires. The number
of replies on a mailing of 1,000 is approximately normally distributed
with a mean of 785 and a standard deviation of 41. If a mailing of
1000 questionnaires is made, what is the probability of receiving more
than 850 replies?
2.4.3: Normal Approximations
The normal distribution is one of the most important and useful of
probability distributions. The fact that many distributions approximate
the normal distribution as the amount of data increases is of prime importance.
For example, let us take the binomial distribution. As the number of trials
increases, it becomes more and more cumbersome to determine the
probability. For example, 30 or more heads in 50 tosses of a coin would
require 21 separate calculations. Fortunately, as the number of trials
increases, and if the probability of a success is about 0.50, the normal
distribution is satisfactory as a continuous approximation to the binomial
distribution.
For a very large number of trials, the probability of success can differ
substantially from 0.50. The accuracy of the approximation depends, of
course, on the number of trials and the actual probability involved. As a
rule of thumb, the approximation should not be used unless both np and
n(1 – p) are greater than 5, where n is the number of trials and p is the
probability of success.
The mean and standard deviation of the binomial distribution are given by
np and $\sqrt{np(1 - p)}$. Let us consider an example to illustrate this point.
Example:19: What is the probability of obtaining 30 or more heads in 50
tosses of a coin?
Solution: This is a binomial experiment with n = 50 and p = 0.5. Its
distribution can be approximated by a normal distribution with µ = 50 × 0.5
= 25 and $\sigma = \sqrt{50 \times 0.5 \times 0.5} = 3.54$. Now, 30 is represented on the
continuous distribution by the interval 29.5 to 30.5; since 30 is included,
we must determine the probability that the value is greater than 29.5. The
appropriate standard score is
$z = \dfrac{29.5 - 25}{3.54} = 1.27$
This corresponds to an area of 0.3980 between 29.5 and the mean. Since
we are interested in the probability of obtaining 30 or more, this
probability is 0.5000 – 0.3980 = 0.1020.
Thus, P(x ≥ 30) = 0.1020.
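To see how good the approximation is, the 21 separate binomial terms can be summed by machine and compared with the normal value. This is an illustrative sketch only; binomial_tail_exact and binomial_tail_normal are names assumed for the example.

```python
import math
from statistics import NormalDist

def binomial_tail_exact(k, n, p):
    """Exact P(X >= k) for a binomial distribution with n trials."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

def binomial_tail_normal(k, n, p):
    """Normal approximation to P(X >= k), with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    # k is included, so the tail starts at k - 0.5
    return 1 - NormalDist(mu, sigma).cdf(k - 0.5)

exact = binomial_tail_exact(30, 50, 0.5)
approx = binomial_tail_normal(30, 50, 0.5)
print(round(exact, 4), round(approx, 4))
```

For n = 50 and p = 0.5 the two values agree to about two decimal places, which is consistent with the rule of thumb that the approximation works well when p is near 0.50.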
2.5: Exponential Distributions
Exponential distributions play an important role in describing a large
class of phenomena (which includes the life length of a certain device,
the life length of a certain species, etc.).
A continuous type situation is said to follow an exponential
distribution with parameter m if the probability that the life – length (of a
device) is less than or equal to x time units is given by $1 - e^{-mx}$ for all
x ≥ 0. This value is denoted by F(x). Mathematically,
(7) $F(x) = 1 - e^{-mx}$ for x ≥ 0
The probability that the life – length is greater than x time units is given by
$1 - F(x) = 1 - \left[1 - e^{-mx}\right] = e^{-mx}$, from (7).
Refer to your Textbook for the graph of an exponential distribution. The
number m is called the parameter of the exponential distribution.
Note: 1. An exponential distribution is mainly used to describe the life

length of a certain device. This means that time units are used in an
exponential distribution. Some time units may be in terms of seconds,
minutes, hours, days, months, or years etc.
2. For different values of m we get different types of exponential
distributions. (can you visualize the graphs)
2.5.1: The Mean and Variance of an Exponential Distribution
The mean µ, which is also known as the expected value, is given
by 1/m (where m is the parameter), and the variance σ² of the
exponential distribution is given by 1/m².
Note: Memory - less Property

Now, we come to an important aspect of exponential distributions. It
has the property of having “NO MEMORY”. This means: “suppose
an event A has not occurred during the first N repetitions. Then the
probability that it will not occur during the next M repetitions is the
same as the probability that it will not occur during the first M
repetitions”. In other words, the information of no successes is forgotten
so far as subsequent developments are concerned.
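In terms of the life – length x, the no-memory property can be written as P(x > s + t | x > s) = P(x > t): a device that has already survived s time units has the same chance of lasting t more units as a new device has of lasting t units. The sketch below checks this numerically; the values of s, t and m are arbitrary illustrations and the function names are assumptions.

```python
import math

def survival(x, m):
    """P(life-length > x) = e^(-m x) for an exponential distribution."""
    return math.exp(-m * x)

def conditional_survival(s, t, m):
    """P(life-length > s + t | life-length > s), by the definition of
    conditional probability."""
    return survival(s + t, m) / survival(s, m)

m = 0.01            # parameter: mean life-length is 1/m = 100 time units
s, t = 50.0, 30.0   # already survived s units; ask about t more units

print(conditional_survival(s, t, m))  # chance of lasting t more units
print(survival(t, m))                 # chance a new device lasts t units
```

The two printed values agree up to floating-point rounding, because e^(-m(s + t)) / e^(-ms) = e^(-mt).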
Example:20: Let the life – length of the fuses produced by a

manufacturing company be assumed to follow an exponential
distribution. There are two processes by which the fuses may be
manufactured. Process I yields an expected life – length of 100 hours,
while Process II yields an expected life – length of 150 hours. Suppose
Process II is twice as costly (per fuse) as Process I. Let the cost of the
fuse be Rs 5/ for Process I. Assume further, that if a fuse lasts less than
200 hours, a loss of Rs. 100/- is incurred by the manufacturer. Which
Process should be used?
Solution: Let us compute the expected cost for each process.
For Process I, the expected life – length µ = 100 hours. We know
µ = 1/m. Hence, m = 1/100 per hour. Let x denote the life – length of a
fuse (in hours).
Cost per fuse = 5 if x > 200
= 5 + 100 if x ≤ 200
Therefore,
Expected cost for Process I = (5) P(x > 200) + (105) P(x ≤ 200)
= (5) Exp(- 2) + (105) [1 - Exp(- 2)]
= 91.466
By a similar computation, we get for Process II, m = 1/150 per hour and
Cost per fuse = 10 if x > 200
= 110 if x ≤ 200
Expected cost for Process II = (10) Exp(- 4/3) + (110) [1 - Exp(- 4/3)]
= 83.64
The expected cost for Process II (83.64) is less than that for
Process I (91.466), even though the cost per fuse for Process II is
double that for Process I. Hence, on the basis of expected cost, we prefer
Process II.
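The two expected costs can be reproduced with a few lines of code, which also makes it easy to try other fuse costs. This is a sketch under the assumptions of the example (exponential life – length, a loss of Rs. 100 if the fuse lasts 200 hours or less); the function name expected_cost and its parameter names are illustrative.

```python
import math

def expected_cost(fuse_cost, mean_life, threshold=200.0, penalty=100.0):
    """Expected cost per fuse when the life-length is exponential.

    A fuse lasting no more than `threshold` hours costs fuse_cost + penalty;
    otherwise it costs fuse_cost.
    """
    m = 1.0 / mean_life                      # parameter of the distribution
    p_short = 1 - math.exp(-m * threshold)   # P(x <= threshold) = F(threshold)
    return fuse_cost * (1 - p_short) + (fuse_cost + penalty) * p_short

cost_1 = expected_cost(fuse_cost=5, mean_life=100)    # Process I
cost_2 = expected_cost(fuse_cost=10, mean_life=150)   # Process II
print(round(cost_1, 3), round(cost_2, 3))
```

Calling expected_cost with different values of fuse_cost shows how the comparison between the two processes changes as the price of a fuse changes.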
Example:21: Suppose a variable x has the exponential distribution with

parameter 10. Compute the probability that it exceeds its mean.
Solution: The required probability is P(x > 1 / m). We have
P(x > 1 / m) = P(x > 1 / 10) = Exp (- 10 / 10) = Exp ( - 1) . Find this value
using your calculator.
EXERCISES
1. In Example 20, what should be the cost of the fuse manufactured by
Process I, in order that the expected costs for both the Processes be equal?
2. In Example 20, what should be the cost of the fuse manufactured by
Process I, in order that the expected cost of Process I is smaller than that
for the Process II?
APPENDIX
In this appendix we briefly recall normal
distributions, the standard normal variate, and the 1σ, 2σ and 3σ levels.
1. Normal Distributions
Consider the equation
(1) $y = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{1}{2}\,\dfrac{(x - \mu)^{2}}{\sigma^{2}}\right)$
Here µ and σ are called parameters. (µ can take all real values but σ is
allowed to take only positive values). The above equation describes a curve
as shown in your Text Book. This curve is symmetric about x = µ. It can be
shown that the total area under this curve is 1, whatever be the values of µ
and σ. Thus, the curve (1) gives a probability distribution called the normal
distribution.
It can also be proved that mean = µ and variance = σ². Thus, the two
parameters µ and σ represent the mean and standard deviation of this
distribution.
2. Standard Normal Variate
The variable
(2) $z = \dfrac{x - \mu}{\sigma}$
is called the standard normal variate (SNV) and has the distribution given
by
(3) $y = \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{z^{2}}{2}\right)$.
This distribution is called the standard normal distribution (SND). It is
advantageous to work with this distribution because
(i) it is unique; that is, there is only one such distribution; and
(ii) it has no parameters.
Note that the mean and standard deviation of this distribution are 0 and 1
respectively. A Diagram of this is shown below;
Note the following
1. When z lies between – 1.96 and 1.96, the corresponding area is .95.
We say that the probability of z lying between – 1.96 and 1.96 is 0.95
or p (−1.96 < z <1.96)= 0.95
2. Similarly, we have p (−1.64 < z <1.64)= 0.90 .
3. Similarly, we have p(− 2.58 < z < 2.58) = 0.99
3. Sigma Levels
1. Consider P(- 1 ≤ z ≤ 1) = 0.683. Note that this implies that
P(µ - σ ≤ x ≤ µ + σ) = 0.683, and for any given µ and σ, this
interval (µ − σ, µ + σ) is known as the 1σ – level or 1σ – limit.
2. Similarly, note that P(− 2 ≤ z ≤ + 2) = 0.954; this gives the 2σ – level or
2σ – limit.
3. Finally, note that P(− 3 ≤ z ≤ + 3) = 0.997; this gives the 3σ – level or 3σ –
limit.
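The sigma-level probabilities quoted above can be verified numerically. A minimal sketch, assuming Python's statistics.NormalDist; the helper name prob_within is chosen for this illustration.

```python
from statistics import NormalDist

def prob_within(k):
    """P(-k <= z <= k) for the standard normal variate z."""
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)

# 1-sigma, 2-sigma and 3-sigma levels, plus the 95% limit
for k in (1, 2, 3, 1.96):
    print(k, round(prob_within(k), 3))
```

Because z = (x – µ)/σ, the same probabilities hold for x lying within k standard deviations of µ in any normal distribution.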
Chapter 3
POPULATION VS SAMPLING
3.1: Introduction
You go to a shop to buy 5 kg of wheat. The shopkeeper shows you
the varieties of wheat present in his shop. You take a handful of wheat to
assess its quality and then decide whether to buy it or not. Without realizing
it, you are picking up a ‘sample’ of wheat from the bag of wheat (‘population’).
A simple daily routine, which is performed in a casual way, leads to an
interesting concept in Statistics known as sampling.
Now, you might be interested in knowing what exactly a
population and a sample are, and how they are useful in daily applications and,
most importantly, in the theory of Statistics.
A population is a set of data which consists of all possible
values pertaining to a certain set of observations or context or investigation.
A sample is a small section of the population, drawn from it
for the purpose of investigation.
Now, a question might arise: How is sample different from
sampling? The concept of drawing samples from population is known as
sampling. In other words, the process of sampling may be defined as a
procedure for selecting a number of individuals from a population in such a
way that any particular property of the sample will correspond as closely as
possible to the true property of the population.
Some examples of populations are as follows:

(1) The set of all male students in a particular university.
(2) The set of all adults (above 21 years of age) in India.
(3) The set of all bulbs, classified as defectives, produced by XYZ
Electricals in a day.
(4) The heights of all students in a particular class room.
(5) Marks obtained by the students in Statistics course who are registered
under a distance learning programme.
(6) The number of flyovers constructed by the DAZ Corporation in the
period 1996 – 2006.
Why should we sample the population at all? Why can we not
consider the entire population for any observation or study? The
following reasons explain why it is necessary to do sampling.
(1) The size of the population is an obvious first consideration. When the
population is infinite or very large in size, sampling is essential. This is
true in case of bulk production of small items such as screws by an automatic
industrial process, or the population of a country like India. Moreover,
even if the population is of finite size, it may be scattered over
a large area, which may make the cost of enumeration prohibitively large.
This is apparent in case of large scale aerial surveys in agriculture for
the estimation of crop yields.
(2) When the estimation of a characteristic in a population requires
destructive testing, sampling must be resorted to. This is true in case of
testing the tensile strength of steel cables, measurement of the life – time
of light bulbs, etc.
(3) Time and cost factors also play an important role in sampling. With
the increase of size, the time required increases along with the man power,
machinery, costs and so on.
There may be certain cases where sampling techniques might not
work. In the case of taking a census, a sample of any form would not give the
exact number that is obtained by considering the total population.
But for most practical purposes and statistical studies, sampling
is an essentially valid idea for the above said reasons.
3.2: Sampling Techniques
The choice of an appropriate sampling design is of very great

importance in the execution of a sample survey and is generally made,
keeping in view the objectives and scope of the enquiry and the type of
the population to be sampled. The sampling techniques can be broadly
classified into three categories as follows:
(i) purposive or subjective or judgment sampling
(ii) probability sampling
(iii) mixed sampling
3.2.1 Purposive Sampling
In this method, a desired number of sample units is selected

deliberately depending upon the object of the enquiry, so that only the
important items representing the true characteristics of the population are
included in the sample.
The main drawback of this sampling scheme is that it is highly
subjective in nature, since the selection of the sample depends entirely on the
personal convenience, beliefs, biases and prejudices of the investigator. For
example, if in a socio - economic survey it is desired to study the standard of
living of the people in a big metropolitan city and the investigator wants to
show that the standard has gone down, then he may include individuals
in the sample only from the lower income group of the society, excluding
people from posh colonies. This scheme does not involve principles of
probability, and cannot be worked out for larger sample sizes.
3.2.2. Probability Sampling
Probability sampling provides a scientific and objective technique of

drawing samples from the population according to some laws of chance in
which each unit of the population has some definite pre – assigned
probability of being selected in the sample. Different types of sampling
under this scheme includes the following :
(i) Each sample unit has an equal chance of being selected;

(ii) Sampling units have varying probability of being selected;
(iii) Probability of selection of a unit is proportional to the sample size.
3.2.3: Mixed sampling
This is a sampling design in which the sample units are selected partly
according to some probability laws. The design has some fixed sampling
rule which generally makes use of a chance factor.
Some of the important sampling schemes covered under the
above sampling techniques are:
(i) Simple random sampling

(ii) Stratified random sampling
(iii) Systematic sampling
(iv) Multistage sampling
(v) Area sampling
(vi) Simple cluster sampling
(vii) Quota sampling
Our prime concern will lie in the simple random sampling which is of
great importance in sample surveys.
3.3: Simple Random Sampling
Simple random sampling is the technique in which “sample is so

drawn that each and every unit in the population has an equal and
independent chance of being included in the sample”.
If the unit selected in any draw is not replaced in the population
before making the next draw, then it is known as ‘simple random sampling
without replacement’, and if it is replaced before making the next draw,
it is known as ‘simple random sampling with replacement’.
3.3.1: Selection of a Simple Random Sample
Proper care must be exercised to ensure that the sample drawn is

random and therefore representative of the population. A random sample
may be selected by
(1) Lottery Method; or

(2) Use of table of random numbers.
(1) Lottery Method: The simplest method of drawing a random sample is by

the lottery system. Every member or unit of the population is identified with
a distinct number which is recorded on a slip or a card. These slips should be
of the same size, shape, colour, etc., to avoid human bias. If the population is
small, then these slips are put in a bag and thoroughly shuffled and then
samples are drawn one by one as per requirements. The slips are thoroughly
shuffled after each draw. The sampling units corresponding to the numbers
on the selected slips will constitute a random sample.
For example, let us suppose that we want to draw a random sample of

10 individuals from a population of 100 individuals. Slips
satisfying the above listed norms are made and put in a bag and shuffled
thoroughly. After that, a sample of 10 slips is drawn one by one from the
bag. The individuals bearing the numbers of the selected slips will constitute
the desired sample.
The lottery method gives a sample, which is quite independent of the

properties of population. It is one of the best and most commonly used
methods of selecting random samples. It is quite frequently used in the
random draw of prizes, in tambola games and so on.
(2) Use of Table of Random Numbers
The lottery method described above is quite time consuming and

cumbersome to use if the population to be sampled is sufficiently large.
Moreover, in this method it is not humanly possible to make all the slips or
cards exactly alike and as such, some bias is likely to get introduced. This
difficulty is avoided by considering the random sampling number series.
The most practical and inexpensive method of selecting a random

sample consists in the use of “Random Number Tables” which have been so
constructed that each of the digits 0, 1, 2, ..., 9 appears with approximately
the same frequency and independently of each other. If we have to select a
sample from a population of size 99, then the numbers in the table can be
considered two by two from 00 to 99.
The method of drawing a random sample comprises the following steps:

(i) Identify the N units in the population with the numbers 1 to N;
(ii) Select at random any page of the ‘random number table’ and pick
up the numbers in any row, column or diagonal at random;
(iii) The population units corresponding to the numbers selected in the
above step constitute the random sample.
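Both the lottery method and the random number table can be imitated with a pseudo-random number generator. The following is only a minimal Python sketch, assuming the units are labelled 1 to N; the function names and the fixed seed are illustrative choices and do not come from these notes.

```python
import random

def srs_without_replacement(N, n, rng):
    """Draw n distinct unit labels from 1..N; a drawn unit is not returned."""
    return rng.sample(range(1, N + 1), n)

def srs_with_replacement(N, n, rng):
    """Draw n unit labels from 1..N; each unit is returned before the next draw."""
    return [rng.randint(1, N) for _ in range(n)]

rng = random.Random(2003)  # fixed seed only so that the draw can be repeated
sample_wor = srs_without_replacement(100, 10, rng)
sample_wr = srs_with_replacement(100, 10, rng)
print(sample_wor)
print(sample_wr)
```

Without replacement no label can occur twice, whereas with replacement the same unit may be drawn more than once.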
A random number table is given below.
Table 1: Random Numbers
2952 6641 3992 9792 7979 5911 3170 5624

4167 9524 1545 1396 7203 5356 1300 2693
2370 7483 3408 2762 3563 1089 6913 7691
0560 5246 1112 6107 6008 8126 4233 8776
2754 9143 1405 9025 7002 6111 8816 6446
Example:1: Draw a random sample of 15 students from a class of 450
students.
Solution: First of all we identify the 450 students of the class with
numbers from 1 to 450. Starting with the first number in the above table and
moving row-wise, we pick out one by one the three-digit numbers less than
or equal to 450, discarding the numbers greater than 450, until 15 numbers
have been obtained. (We may also number the students from 000 to 449.)
The above numbers grouped in threes are:
295 266 413 992 979 279 795 911 317 056 244
167 952 415 451 396 720 353 561 300 269 323
707 483 340
Thus the students corresponding to the numbers
295 266 413 279 317 56 244 167

415 396 353 300 269 323 340
constitute the desired random sample of size 15.
Example:2: Use Table 1 to draw a random sample of size 5 from a population
of 24 units.
Solution: First of all we identify the 24 units in the population with numbers
from 1 to 24. Then, in table 1, starting with the first number and moving row
wise we pick out the numbers in pairs, two by two, ignoring the numbers
which are greater than 24 and counting the repeated numbers only once till a
selection of 5 numbers below 25 is completed.
Thus, the numbers obtained are 11, 24, 15, 13, and 03. This gives a
sample of size 5.
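The reading of Table 1 in Examples 1 and 2 can be checked mechanically. The sketch below is one possible way to do it in Python: it joins the rows of Table 1 into a single stream of digits, reads them in groups of a fixed width, and keeps the first n distinct numbers between 1 and N. The function name draw_from_table is an illustrative choice.

```python
TABLE_1 = [
    "2952 6641 3992 9792 7979 5911 3170 5624",
    "4167 9524 1545 1396 7203 5356 1300 2693",
    "2370 7483 3408 2762 3563 1089 6913 7691",
    "0560 5246 1112 6107 6008 8126 4233 8776",
    "2754 9143 1405 9025 7002 6111 8816 6446",
]

def draw_from_table(rows, width, N, n):
    """Read the table row-wise in groups of `width` digits and keep the
    first n distinct numbers lying between 1 and N."""
    digits = "".join(rows).replace(" ", "")
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            return chosen
    raise ValueError("table exhausted before the sample was complete")

example_1 = draw_from_table(TABLE_1, width=3, N=450, n=15)
example_2 = draw_from_table(TABLE_1, width=2, N=24, n=5)
print(example_1)
print(example_2)
```

The ValueError branch corresponds to the situation in which the table runs out before the sample is complete.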
Remark: In this method a large number of digits are rejected and thus we
need large tables even to draw small samples. Sometimes we may not be able
to draw the sample even after exhausting all the numbers of the table.
This difficulty is overcome by assigning more than one number to each of
the sampling units. For instance, in Example 2 the first unit may be assigned
the numbers
1, 1 + 24, 1 + 2 × 24, 1 + 3 × 24, and so on,
i.e. 1, 25, 49, 73, 97, 121 and so on.
Similarly the 2nd unit may be assigned the numbers
2, 26, 50, 74, 98, 122 and so on.
Finally, the last unit may be assigned
0, 24, 48, 72, 96 and so on.
The General rule is that if N is the total number of units and if we

number the population from 1 to N, then each unit ‘x’ may be assigned the
numbers
x, x + N, x + 2N, x + 3N, ...
And finally, for x = N, we get the numbers
N, N + N, N + 2N, ...
Following this procedure, the desired sample of size 5 will be given
by the units corresponding to the numbers 4, 5, 15, 17, 18:
No. from Table 1        No. of sampled unit
29 = 5 + 24             5
52 = 4 + 2 × 24         4
66 = 18 + 2 × 24        18
41 = 17 + 24            17
39 = 15 + 24            15
There is nothing difficult in this method; work through it carefully, step by
step.
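The general rule amounts to taking remainders on division by N. A short Python sketch of this reading is given below; the names unit_for and sample_units are illustrative, and the convention that the number 0 belongs to unit N follows the assignment described above.

```python
def unit_for(number, N):
    """Map a random number to a unit label in 1..N.

    Unit x owns the numbers x, x + N, x + 2N, ...; unit N also owns 0.
    """
    return (number - 1) % N + 1

def sample_units(numbers, N, n):
    """Convert random numbers to units in order and keep the first n distinct units."""
    chosen = []
    for number in numbers:
        unit = unit_for(number, N)
        if unit not in chosen:
            chosen.append(unit)
        if len(chosen) == n:
            break
    return chosen

# First five two-digit numbers of Table 1, population of 24 units.
units = sample_units([29, 52, 66, 41, 39], N=24, n=5)
print(units)
```

With this rule almost no number from the table is wasted; a number is skipped only when it leads to a unit that has already been selected.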
Example:3: The following table of ten random numbers of 2 digits each is
provided to the field investigator.
Table 2
34 96 61 85 49
78 50 02 27 13
How should he use this table to make a random selection of 5 plots out of
35?
Solution: In this case we shall first identify the 35 plots with the numbers 1
to 35. In the above table there are only 3 numbers, 34, 02 and 13, which are
less than 35, and accordingly we are not able to draw the desired sample
of size 5 directly from this table. In this case we shall assign more than one
number to each of the sample units, i.e. plots.
For example, the first plot will be assigned the numbers
01, 01 + 35, 01 + 2 × 35, ...
i.e. 1, 36, 71, 106, ...
Similarly the second plot is assigned the numbers
02, 02 + 35, 02 + 2 × 35, 02 + 3 × 35, ...
i.e. 2, 37, 72, 107, 142, ...
Finally, the last plot, i.e. the 35th plot, can be assigned the numbers
0, 35, 70, 105, 140, ...
If we select the first number from Table 2 and move row-wise, we get the
following table:
No. from table 2 No. of the sample plot

34 34
96 = 26 + 2 * 35 26
61 = 26 + 35 26
85 = 15 + 2 * 35 15
49 = 14 + 35 14
78 = 08 + 2 * 35 08
Thus, the plots of numbers 08, 14, 15, 26 and 34 constitute the
desired sample. (Note that repetitions have to be discarded).
3.4: Sample Mean & Sample Standard Deviation
Characteristics, such as mean and standard deviation, which

are descriptive of a total population, are called population
parameters. Characteristics descriptive of a sample, but not the
entire population, are called statistics.
One of the most important uses of statistical inference is to
estimate the degree to which sample statistics approximate the
population parameters.
3.4.1: Sample Mean

If the sample has a total of n members, $x_1, x_2, \ldots, x_n$, the mean of the
sample, $\bar{x}$, is given by
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Note: The sample mean is denoted by $\bar{x}$; for the population mean
we use µ.
Example:4: A random sample of five accounts in department store
showed the following balances at the end of a month: Rs. 67.32,Rs.
108.97, Rs. 17.64, Rs. 412.11 and Rs. 81.96.
Compute the mean balance.
Solution: Since the sum of the five accounts is Rs. 688.00, the
mean is given by
$$\bar{x} = \frac{688.00}{5} = 137.60$$
Thus, the mean balance is Rs. 137.60.
Example:5: The ages of 25 people in a certain income bracket are
distributed as in the following frequency table:
Age 29 33 37 38 39 40 42 43 45 47 50 59 66
Frequency 1 1 3 4 2 3 2 2 3 1 1 1 1
Estimate the mean age of the sample.

Solution: The mean $\bar{x}$ is given by
$$\bar{x} = \frac{29 \times 1 + 33 \times 1 + 37 \times 3 + 38 \times 4 + 39 \times 2 + \cdots + 66 \times 1}{25} = \frac{1050}{25} = 42$$
3.4.2: Sample Standard Deviation
The standard deviation of a sample, with mean $\bar{x}$ and
containing n members, is given by
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Note: Here ‘s’ is used to indicate that it is the standard deviation
of a sample and $\bar{x}$ is the mean of the sample, whereas σ is used to
indicate the standard deviation of a population, and µ the mean of
the population.
Example:6: Use the data of Example 4 to determine the standard
deviation of the balances.
Solution: In calculating the standard deviation, a tabulation
is always useful.
$x$        $x - \bar{x}$    $(x - \bar{x})^2$
67.32      -70.28           4939.2784
108.97     -28.63           819.6769
17.64      -119.96          14390.4016
412.11     274.51           75355.7401
81.96      -55.64           3095.8096
688.00                      98600.9066
$$s = \sqrt{\frac{98600.9066}{4}}$$
or s = 157.00
Example:7: Use the data of Example 5 to calculate the standard
deviation of the ages.
Solution
$x$     $f$    $x \times f$   $x - \bar{x}$   $(x - \bar{x})^2$   $(x - \bar{x})^2 f$
29      1      29             -13             169                 169
33      1      33             -9              81                  81
37      3      111            -5              25                  75
38      4      152            -4              16                  64
39      2      78             -3              9                   18
40      3      120            -2              4                   12
42      2      84             0               0                   0
43      2      86             1               1                   2
45      3      135            3               9                   27
47      1      47             5               25                  25
50      1      50             8               64                  64
59      1      59             17              289                 289
66      1      66             24              576                 576
Total   25     1050                                               1402
Thus
$$s = \sqrt{\frac{1402}{24}} = 7.64$$
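The tabulations for the account balances and for the age frequency table can be cross-checked numerically. The following Python sketch uses the standard statistics module for the raw data and a direct sum for the frequency table; the variable names are illustrative.

```python
import math
import statistics

# Five account balances (Rs.).
balances = [67.32, 108.97, 17.64, 412.11, 81.96]
mean_balance = statistics.mean(balances)
sd_balance = statistics.stdev(balances)      # sample form: divides by n - 1

# Age frequency table as (age, frequency) pairs.
age_table = [(29, 1), (33, 1), (37, 3), (38, 4), (39, 2), (40, 3), (42, 2),
             (43, 2), (45, 3), (47, 1), (50, 1), (59, 1), (66, 1)]
n = sum(f for _, f in age_table)
mean_age = sum(x * f for x, f in age_table) / n
sd_age = math.sqrt(sum(f * (x - mean_age) ** 2 for x, f in age_table) / (n - 1))

print(round(mean_balance, 2), round(sd_balance, 2))
print(mean_age, round(sd_age, 2))
```

For a frequency table each squared deviation is weighted by its frequency before the sum is divided by n − 1.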
Sometimes it is not convenient to compute $x_i - \bar{x}$ for each measurement.
Some of the short cut formulas for calculation of the standard
deviation are given below; the first of them still uses the mean.
(1) $$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1}}$$
(2) $$s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n (n - 1)}}$$
(3) $$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$$
Example:8: Estimate the mean and standard deviation of the
following sample: 20.6, 11.3, 13.7, 9.2, 18.1, 7.2.
Solution: Using the short cut formula we have
$x$      $x^2$
20.6     424.36
11.3     127.69
13.7     187.69
9.2      84.64
18.1     327.61
7.2      51.84
80.1     1203.83
$$\bar{x} = \frac{80.1}{6} = 13.35$$
$$s^2 = \frac{1203.83 - \frac{(80.1)^2}{6}}{5} = \frac{134.495}{5} = 26.90$$
$$s = \sqrt{26.90} = 5.19$$
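The three short cut formulas are algebraically equivalent to the defining formula, and this can be checked on the six values of this example. The Python sketch below computes s in all four ways; the variable names are illustrative.

```python
import math

data = [20.6, 11.3, 13.7, 9.2, 18.1, 7.2]
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
mean = sum_x / n

# Defining formula and the three short cut formulas.
s_definition = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
s_formula_1 = math.sqrt((sum_x2 - n * mean ** 2) / (n - 1))
s_formula_2 = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))
s_formula_3 = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(round(mean, 2), round(sum_x2, 2))
print(round(s_definition, 3), round(s_formula_1, 3),
      round(s_formula_2, 3), round(s_formula_3, 3))
```

The short cut forms subtract two large, nearly equal quantities, so in hand calculation the squares should be carried to full accuracy before subtracting.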
EXERCISES
(1) A child needs an operation and the father surveyed a

sample of doctors in two cities to arrive at the mean cost
of operation, with the following results;
City A: 150, 175, 150, 200, 175 (all in Rs.)
City B: 250, 200, 175, 225, 200, 250, 200, 200
Find out the mean and standard deviation of each sample.
(2) Find the mean and standard deviation of the following

income data. Use class marks to the nearest Rupee as a
reasonable assumed mean.
Income (Rupees) No of families
42,000 – 44,999 13
39,000 – 41,999 43
36,000 – 38,999 20
33,000 – 35,999 25
30,000 – 32,999 50
27,000 – 29,999 67
24,000 – 26,999 105
21,000 – 23,999 267
18,000 – 20,999 394
15,000 – 17,999 214
12,000 – 14,999 347
9,000 – 11,999 412
6,000 – 8,999 516
3,000 – 5,999 714
0 – 2,999 324
(3) Use the best technique to determine the mean and standard
deviation of each of the following samples:
(1) 20, 22, 23, 26, 29, 30
(2) 28, 25, 20, 33, 27, 29, 23, 24, 21
3.5: Sampling Distributions
The basic problem of statistical inference is to infer information

about the population parameters from sample statistics with a stated degree
of accuracy. For instance, suppose we want to determine the mean number
of automobiles which pass through a certain corner between 8 am and 10 pm
for the purpose of deciding whether or not to build a service station on the
corner. We install traffic counters and take daily readings for 10 days. The
following results were obtained:
Day           1    2    3    4    5    6    7    8    9    10
No. of cars   284  386  273  212  202  312  372  247  267  289
These measurements have a mean of 284.4. We may use this figure as an estimate
of the number of cars which pass through the corner each day.
However, there are a number of hazards attached to such an estimate.
If we take a different sample (i.e. the experiment is repeated), we
would in general obtain a different mean. Statistical inference provides
us with a method of estimating the true mean to a desired degree of
accuracy. In order to put this method into practice, we must study the
theoretical sampling distribution.
Consider the following example.

Example:9: There are six fish in an aquarium whose weights are 3.1 gm,
3.4 gm, 3.6 gm, 2.8 gm, 3.2 gm and 3.9 gm. We want to take a sample of three fish
to determine the mean weight. Since there are six fish, there are $\binom{6}{3}$ or 20
different possible samples, each with a sample mean. These are shown in the
table given below:
Sample Weights X
1. 3.1, 3.4, 3.6 3.37
2. 3.1, 3.4, 2.8 3.10
3. 3.1, 3.4, 3.2 3.23
4. 3.1, 3.4, 3.9 3.47
5. 3.1, 3.6, 2.8 3.17
6. 3.1, 3.6, 3.2 3.30
7. 3.1, 3.6, 3.9 3.53
8. 3.1, 2.8, 3.2 3.03
9. 3.1, 2.8, 3.9 3.27
10. 3.1, 3.2, 3.9 3.40
11. 3.4, 3.6, 2.8 3.27
12. 3.4, 3.6, 3.2 3.40
13. 3.4, 3.6, 3.9 3.63
14. 3.4, 2.8, 3.2 3.13
15. 3.4, 2.8, 3.9 3.37
16. 3.4, 3.2, 3.9 3.50
17. 3.6, 2.8, 3.2 3.20
18. 3.6, 3.2, 3.9 3.57
19. 3.6, 2.8, 3.9 3.43
20. 2.8, 3.2, 3.9 3.30
The theoretical sampling distribution of the mean, then, is the following:
$\bar{x}$   $p(\bar{x})$   $\bar{x}$   $p(\bar{x})$
3.03 1/20 3.37 2/20
3.10 1/20 3.40 2/20
3.13 1/20 3.43 1/20
3.17 1/20 3.47 1/20
3.20 1/20 3.50 1/20
3.23 1/20 3.53 1/20
3.27 2/20 3.57 1/20
3.30 2/20 3.63 1/20
The mean of this distribution, µ*, is found to be approximately 3.33.
The standard deviation σ* is 0.1586.
Also, the mean of the population is
$$\mu = \frac{x_1 + x_2 + \cdots + x_6}{6} = 3.33$$
Note that the mean of the sampling distribution (µ*) is equal to the
population mean (µ).
The standard deviation of the sampling distribution, usually called the
standard error of the mean, is related to the standard deviation of the
population by the following formula.
Standard Error of the Mean
The set of all random samples of size ‘n’ drawn from a population of
size N with mean µ and standard deviation σ has the mean µ* = µ, and its
standard deviation σ* is given by
$$\sigma^* = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
Note 1: If N is very large or infinite, the term $\sqrt{\frac{N - n}{N - 1}}$ can be taken as 1.
2: As a rule, if the sample is drawn from an infinite population or
constitutes less than five percent of the population, then we have
$$\sigma^* = \frac{\sigma}{\sqrt{n}}$$
Although this should be used only for theoretically large populations, in
practice it is generally used even for small populations.
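Example:9: (contn’d) Recall Example 9. We have here N = 6
and n = 3.
Standard deviation of the population (σ) = 0.35
$$\frac{\sigma}{\sqrt{n}} = 0.202$$
$$\sigma^* \text{ (from the formula)} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{0.35}{\sqrt{3}} \sqrt{\frac{6 - 3}{6 - 1}} \approx 0.157$$
σ* (from actual distribution) = 0.158; the small difference is due to rounding.
(Since the sample is such a large proportion of the population, the correction
factor must be used.)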
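The agreement between the formula and the actual sampling distribution can be verified by listing all 20 samples. The Python sketch below does this with itertools.combinations; the six weights are those appearing in the table of samples (including 3.2 gm), and the variable names are illustrative.

```python
import math
from itertools import combinations

weights = [3.1, 3.4, 3.6, 2.8, 3.2, 3.9]   # the six fish (gm)
N, n = len(weights), 3

# Mean of every possible sample of three fish.
sample_means = [sum(s) / n for s in combinations(weights, n)]

mu = sum(weights) / N
sigma = math.sqrt(sum((w - mu) ** 2 for w in weights) / N)   # population SD

mu_star = sum(sample_means) / len(sample_means)
sigma_star = math.sqrt(
    sum((m - mu_star) ** 2 for m in sample_means) / len(sample_means)
)

# Standard error from the formula with the finite population correction.
formula = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(len(sample_means), round(mu, 4), round(mu_star, 4))
print(round(sigma, 4), round(sigma_star, 4), round(formula, 4))
```

When no intermediate value is rounded, the standard deviation of the 20 sample means and the value given by the formula coincide.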
EXERCISES
1. Suppose that a population is infinite, with standard deviation σ. Find the
standard error of the mean for random samples of size n =
a) 100
b) 144
c) 10,000
d) 36
e) 128
f) 1024
2. A population of 15,000 has a mean of 28.7 with a standard deviation
of 48. A sample of 200 is taken. What is the probability that the mean of the
sample is greater than 290 if
(a) No correction for finite population is made.

(b) A correction for finite population is made.
3.5.1: Binomial distribution as sampling distribution
Consider an infinite population. Suppose we draw ‘n’ members
from this population, one after the other, giving equal chance to all the
members present in the population. Let p be the proportion of members that
possess a certain characteristic. Then the number of members in the sample
possessing this characteristic has a binomial distribution with
parameters n and p.
[Note: In the above, characteristics can be smoker or non-smoker; rich or
poor; good or bad; educated or uneducated etc in some population]
Example:10: A study has to be conducted to know the smoking habits of the
population in a town. Assuming that the chance of a person being a smoker
is 50%, find the sampling distribution of the total number of smokers, based
on a sample size of 4.
Solution: Given that the probability of a person being a smoker = ½. Also
n = 4.
The sampling distribution is given by
Table 1
X             0      1      2      3      4
Probability   1/16   1/4    3/8    1/4    1/16
[Calculated on the basis of the binomial distribution]. Suppose we want to

have an experimental sampling distribution. We choose, say, 25 times a set
of four people and count the number of smokers each time. When the study
was actually done, the following results were obtained.
Study No.        1   2   3   4   5   6   7   8   9   10  11  12  13
No. of Smokers   2   3   2   2   4   1   0   3   2   1   3   2   1
Study No.        14  15  16  17  18  19  20  21  22  23  24  25
No. of Smokers   0   3   1   4   4   2   3   1   2   1   2   3
The relative frequency table for x is
Table 2
x                    0      1      2      3      4
Relative Frequency   2/25   6/25   8/25   6/25   3/25
which is an approximation to the theoretical sampling distribution. If the

number of samples becomes very large, the experimental sampling
distribution will approach the theoretical sampling distribution given in
Table 1. This is guaranteed by the following Result.
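Both tables of this example can be reproduced with a few lines of Python. The sketch below computes the binomial probabilities for n = 4 and p = 1/2 exactly, using fractions, and tabulates the 25 observed counts of smokers; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)

# Theoretical sampling distribution: P(X = k) = C(n, k) p^k (1 - p)^(n - k).
theoretical = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Number of smokers observed in each of the 25 studies.
observed = [2, 3, 2, 2, 4, 1, 0, 3, 2, 1, 3, 2, 1,
            0, 3, 1, 4, 4, 2, 3, 1, 2, 1, 2, 3]
counts = Counter(observed)
empirical = {k: Fraction(counts[k], len(observed)) for k in range(n + 1)}

for k in range(n + 1):
    print(k, theoretical[k], empirical[k])
```

With only 25 studies the relative frequencies are still some distance from the theoretical probabilities.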
3.6: Central Limit Theorem
A powerful tool for dealing with problems in which the
sample size is large (in statistics any sample size greater than
30 or 35 is considered sufficiently large) is the Central Limit
Theorem. We will state the theorem without proof and explain how
it is applied for practical purposes.
Statement: The theoretical sampling distribution of the mean of all samples
of size n is approximately a normal distribution with mean µ and standard
deviation $\sigma/\sqrt{n}$. Equivalently, the distribution of
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is approximated by the standard normal distribution. The approximation
becomes increasingly accurate as n becomes large.
Example:11: A large population is normally distributed with a mean of 50
and a standard deviation of 12. A random sample of size 36 is selected from
the population. What is the probability that the mean of the sample is greater
than 52?
Solution: Assuming that the population is sufficiently large, by
the central limit theorem the sample means will be normally distributed with
mean 50 and standard deviation (standard error) $12/\sqrt{36} = 2$.
The corresponding standard score is then given by
$$z = \frac{52 - 50}{2} = 1.00$$
So, the area under the normal curve to the right of z = 1.00 is equal to 0.5000 –
0.3413 = 0.1587. Thus, the probability that the sample mean will be greater
than 52 is 0.1587.
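The table lookup in this example can be replaced by a direct computation. The Python sketch below uses NormalDist from the standard statistics module (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 12, 36
standard_error = sigma / sqrt(n)        # 12 / 6
z = (52 - mu) / standard_error          # standard score of the threshold 52
p_greater = 1 - NormalDist().cdf(z)     # area to the right of z

print(z, round(p_greater, 4))
```

Here NormalDist() with no arguments is the standard normal distribution, so its cdf plays the role of the normal table.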
Example:12: A sample of 100 pieces of copper tubing is examined for

defects. If the process is in control, there will be a mean of 3.000 defects per
tube, with a standard deviation of 0.400. If the sample contains a mean of
3.100 defects or more, the entire shipment of 1000 pieces, will be refused on
the assumption that the process is out of control. Assuming that the process
is in control and the shipment is representative, what is the probability of
refusing the shipment by mistake?
Solution: The shipment will be refused if the mean number of defects in the
sample is 3.100 or more. If the mean of the shipment is actually 3.000
with a standard deviation of 0.400, the probability can be found with the aid
of the standard score
$$z = \frac{3.100 - 3.000}{\sigma^*}$$
Since 100 is 10% of 1000, it will be necessary to use the correction factor.
$$\sigma^* = \frac{0.400}{\sqrt{100}} \sqrt{\frac{1000 - 100}{1000 - 1}} = \frac{0.400}{10} \sqrt{\frac{900}{999}}$$
$$\sigma^* = \frac{(0.400)(30)}{(10)(31.61)} = 0.038$$
Thus
$$z = \frac{0.100}{0.038} = 2.63$$
and from the table we get the probability as 0.5000 – 0.4957 or 0.0043.
Thus the sampling procedure seems to be a good one.
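The same calculation, including the finite population correction, can be sketched in Python as follows. The helper name standard_error and its optional argument N are illustrative choices; when N is omitted the population is treated as infinite.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when the population size N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

se = standard_error(sigma=0.400, n=100, N=1000)
z = (3.100 - 3.000) / se
p_refuse = 1 - NormalDist().cdf(z)

print(round(se, 3), round(z, 2), round(p_refuse, 4))
```

Because the script does not round σ* and z before the final step, its probability may differ from the table-based answer in the fourth decimal place.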
EXERCISES
1. The mean of a random sample of size 81 is used to estimate the mean

of a large population with a known standard deviation of 16.00. What is
the probability that the error will be less than 3.20?
2. A random sample of 100 from a population of 1200 is used to
estimate the mean of the population. If it is reasonable to assume the
population is normally distributed with a standard deviation of 36.0,
what is the probability that the maximum error will be 4.5?
3. A random sample of size 64 is taken from a large normally distributed
population with mean 0.080 and standard deviation 0.004. What is the
probability that the maximum error is 0.001?
APPENDIX
POPULATION VERSUS SAMPLES
1. Population:
By a ‘population’ we mean the total collection of items or elements
that fall within the scope of a statistical investigation. This is also called the
‘universe of discourse’ or simply ‘universe’. The purpose of defining a
statistical population is to provide very explicit limits for the data collection
process and for the inferences and conclusions that may be drawn from the
study. Time and space limitations must be specified, and it should be clear
whether or not a particular element falls within or outside the universe. In
short, a population is a universal set.
2. Examples of populations:
1. Salaries being earned by people with a particular degree.

2. Number of customers in India who buy a particular soap.
3. Grades of all students doing a particular course.
4. Stock prices on a Stock Exchange of a particular company, over a
month.
5. Census survey of India.
6. Runs scored by a particular batsman in the tests played in the last one
year.
Parameters: In trying to decide what information about a population is

necessary for making a decision, we need to have certain numerical
characteristics which serve to describe or distinguish that population. These
numerical characteristics are referred to as parameters of the population.
Examples:
1. The expectation or mean of a population. It is denoted by µ .
2. The standard deviation of a population. It is denoted by σ.
3. Sample
In most statistical studies, we need to make a decision regarding a

population. For example, a manufacturer of ready-made shirts needs to
know the neck sizes of adult males in order to manufacture shirts of a
particular collar size. But most often, it is impossible or impractical (in
terms of time and/or expense) to look at the entire population. Hence
it is also not feasible to obtain the exact values of the parameters.
Under such circumstances, the characteristics of the population are

normally judged by collecting or observing only a limited or restricted
portion of the population called a sample. Thus, a set of observations, that is
taken from some population, for the purpose of obtaining information about
that population, is called a sample.
Examples:
1. Stock prices on a Stock Exchange, of a company, over a month, are a

sample of the stock prices of that company since its inception.
2. Runs scored by a particular batsman in tests, in the last one year, is a
sample of his scores since he started playing in tests.
CAUTION: What is a population for one purpose may merely be a sample for

another.
Example: If one wants to determine the average height of students in a class
room, the students in that room would represent the population. But, if one
wants to estimate the average height of students in that college, the students
in the room would be a sample of the students in that college.
Consider the runs scored by a particular batsman in 25 innings; these

runs (data) constitute a sample and its size is 25. Thus, the size of a sample
is the number of items or data constituting this sample. Just as population
characteristics are called parameters, the sample characteristics are called
statistics. (This term should not be confused with the name of the discipline
– statistics). Thus a statistic is a measure describing a characteristic of a
sample.
Examples:
1. The mean of a sample. It is denoted by $\bar{x}$.
2. The standard deviation of a sample. It is denoted by s.
CAUTION: Consider a set of n data, $x_1, x_2, \ldots, x_n$. If this set is considered as
a population, then its mean and standard deviation are given by
$$\mu = \frac{1}{n} \sum x_i, \qquad \sigma = \sqrt{\frac{1}{n} \sum (x_i - \mu)^2};$$
whereas, if this same set is considered as a sample, then its mean and
standard deviation are given by
$$\bar{x} = \frac{1}{n} \sum x_i, \qquad s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}.$$
Note that µ = $\bar{x}$ (i.e., the mean is the same whether the data are considered as a
population or a sample).
But σ ≠ s; i.e., the value of the standard deviation depends on the
interpretation of the data as population or sample.
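Python's standard statistics module provides both forms of the standard deviation, which makes the distinction easy to see. The sketch below applies them to five illustrative values (the account balances used in an earlier example); pstdev divides by n and stdev divides by n − 1.

```python
import statistics

data = [67.32, 108.97, 17.64, 412.11, 81.96]

mean = statistics.mean(data)        # the same under either interpretation
sigma = statistics.pstdev(data)     # data treated as a population: divide by n
s = statistics.stdev(data)          # data treated as a sample: divide by n - 1

print(round(mean, 2), round(sigma, 2), round(s, 2))
```

For the same data s is always the larger of the two, since s = σ √(n / (n − 1)).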
4. Population Distribution versus Sampling Distribution
Roughly speaking, a distribution which involves the (fixed) sample

size is called a sampling distribution.
Examples: Binomial distributions. Of course, there are other distributions
which have not been discussed in these notes, like the multinomial, negative
binomial, F and chi-square distributions, and Student’s t-distribution.
Distributions like the Poisson, exponential and normal distributions do not
involve the sample size, i.e. they are independent of the sample size. Such
distributions are called population distributions.
Remember:
1. The sampling distributions depend on the sample size and describe the
distribution of a sample statistic.
2. When the population is normally distributed with mean µ and
standard deviation σ, the sampling distribution of $\bar{x}$ is also normal
with mean µ and standard deviation $\sigma/\sqrt{n}$, where n is the (fixed)
sample size.
3. The sampling distribution of proportion has mean p and variance
p (1 – p) / n where p is the population proportion and n is the (fixed)
sample size.
Chapter 4
ESTIMATION
4.1 Introduction
In most of the statistical investigations the population parameters are

unknown, simply because, it is not possible or even advisable to study the
entire population. Still, for a meaningful investigation, we need to know
information about the parameters. This is achieved through estimates based
on samples drawn from the population.
Thus, instead of chasing after the actual values of the parameters, we

settle for the estimates with desired degree of precision. That this is a good
bargain will be evident after going through this Chapter.
There are two kinds of estimates that are usually used: 1. point
estimates and 2. interval estimates. Estimates which assign a single value to
a population parameter are called point estimates; estimates which specify a
range of values in an interval are called interval estimates.
Despite the variety of methods for determining point estimates, these
estimates are of limited use on their own, because they do not convey any degree of
uncertainty about the estimate; for example, nobody would venture to assert
that there were exactly 25,403 persons in an exhibition on a particular day. But one
would definitely be willing to state that the number of visitors on this
particular day was approximately between 25,000 and 26,000. Unless one
has a lot of idle time and is willing to make the effort, nobody will count
each and every visitor to the exhibition. All that one would say is that the
number of visitors was in the range, say, 25,000 to 26,000.
Thus, we see that interval estimation is more natural and easier to

compute. Further, we often hear people say that “ I am 90% sure that the
number of visitors is between 25,000 and 30,000”. This leads to the concept
of confidence interval. In statistics and particularly in estimation, we always
associate a certain measure of chance (or confidence) in the estimate
methods. What the above person means is that, if x denotes the number of
visitors on that particular day, then the chances are 0.9 that x lies between
25,000 and 30,000: or
p (25,000 ≤ x ≤ 30,000) = 0.9
Because there is a 90% chance that x lies between 25,000 and 30,000, this
interval (25,000, 30,000) is called a “90 percent confidence interval” for x.
This means that, if we watch the number of visitors on 100 days, then on about 90
of these days the number would be between 25,000 and 30,000. (This also
means that on about 10 of these 100 days, the number may not fall within the
above limits.)
An ideal situation would be to have an interval with small range and
high confidence level. Unfortunately, this seldom happens in practice. We
will see that, if the interval is small, the confidence would be low and if the
confidence is to be high, then the corresponding interval would have to be
large.
4.2 Estimation of µ
As has already been pointed out, in any statistical investigation, we are

interested in obtaining information on the population parameters – especially
the mean µ and the standard deviation σ ( in the continuous situations), and
the proportion ( in the discrete situations).
In this Section, we will study methods of estimating µ . We make a

blanket assumption that “the population is normally distributed”. Now, it
may happen that the population standard deviation σ is known or unknown.
Accordingly, we have different methods.
4.2.1. Estimation of µ when σ is known
This is the simplest of all the cases; we illustrate this by an example.
Example:1: Twenty-five loan applications in a bank were randomly
selected for the purpose of determining the average amount requested for
each loan. Find a 95 percent confidence interval for µ assuming that the
sample mean $\bar{x}$ = Rs 900 and the standard deviation σ = Rs 140.
Solution: Since the sampling distribution of the sample mean is normal with
mean µ and standard deviation $\sigma/\sqrt{n}$,
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution. We want to find the values a and b such
that
$$p(a \le z \le b) = 0.95$$
From the tables we get the values of a and b as:
a = -1.96 and b = 1.96.
Thus we obtain the interval
$$-1.96 \le z \le 1.96$$
or
$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96$$
or
$$-1.96 \le \frac{900 - \mu}{140/\sqrt{25}} \le 1.96$$
or
$$-1.96 \times 28 \le 900 - \mu \le 1.96 \times 28$$
or
$$900 - 1.96 \times 28 \le \mu \le 900 + 1.96 \times 28$$
or
$$845.12 \le \mu \le 954.88$$
This is the required 95 percent confidence interval. In other words,
$$p(845.12 \le \mu \le 954.88) = 0.95$$
In general we have
$$p\left(\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Note: In the above example, if the experiment of drawing samples of fixed
size 25 was repeated, 95% of the times the interval ($\bar{x}$ − 54.88, $\bar{x}$ + 54.88) will
contain the population mean µ.
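The general formula can be turned into a small helper. The Python sketch below is one way to do it, assuming σ is known; the function name confidence_interval is illustrative, and the critical value is obtained from NormalDist.inv_cdf rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sigma, n, confidence):
    """Two-sided confidence interval for the population mean, sigma known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

low, high = confidence_interval(mean=900, sigma=140, n=25, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Calling the same function with a confidence of 0.90 or 0.99 shows how the interval narrows or widens as the confidence level changes.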
4.2.2. Estimation of µ when σ is not known
When a population is known to have a normal distribution, but its

standard deviation is not known, then the sample standard deviation s is used
as an alternative to σ when the sample size is greater than 30.
Example: 2: To estimate the mileage of a certain model of a car, 40 cars of

that model were selected and tested on a fairly long run. The mean and
standard deviation of their mileages were 19.3 and 0.7 respectively.
Assuming that the mileage is normally distributed, is it safe to conclude,
with 95% confidence that the average mileage of this model lies between 18
and 21?
Solution: Here n = 40, $\bar{x}$ = 19.3 and s = 0.7. Here σ is not given and hence
is unknown. So we use s in place of σ. Thus
$$p\left(\bar{x} - 1.96 \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96 \frac{s}{\sqrt{n}}\right) = 0.95$$
Now
$$\bar{x} - 1.96 \frac{s}{\sqrt{n}} = 19.08, \qquad \bar{x} + 1.96 \frac{s}{\sqrt{n}} = 19.52$$
Thus, the 95% confidence interval is (19.08, 19.52).
This interval lies entirely inside the given interval (18, 21); hence µ lies in
(18, 21) with at least 95% confidence, and the conclusion is safe.
4.2.3. Determining the sample size n
In some investigations, an upper limit for the error of the estimate has
to be fixed in advance and a suitable sample size is determined, so that the
error does not exceed this limit.
The following example illustrates this:
Example: 3: Experience with workmen in a certain industry indicates that
the time required for a randomly selected workman to complete a job is
normally distributed with a standard deviation of 12 minutes. How large a
sample is needed to estimate the mean of this distribution to within 3
minutes with 95% confidence?
Solution: If $\bar{x}$ is to estimate µ to within 3 minutes, then $|\bar{x} - \mu| \le 3$, and we require
$$p(|\bar{x} - \mu| \le 3) = 0.95$$
Since $p\left(|\bar{x} - \mu| \le 1.96\,\sigma/\sqrt{n}\right) = 0.95$, this means that $1.96\,\sigma/\sqrt{n} \le 3$.
Thus, the required sample size is given by
$$\frac{1.96 \times 12}{\sqrt{n}} \le 3$$
or
$$n \ge \left(\frac{1.96 \times 12}{3}\right)^2$$
or
$$n \ge 61.4656$$
Thus, any sample of size greater than or equal to 62 will do.
Note: Work out this problem the other way: that is, take σ = 12 and n = 62
and find $1.96\,\sigma/\sqrt{n}$. This value must be less than or equal to 3. Satisfy
yourself.
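The check suggested in the note can be carried out in a few lines. The Python sketch below computes the smallest admissible n and the error actually achieved with it; the function name required_sample_size and the default z = 1.96 (95% confidence) are illustrative choices.

```python
from math import ceil, sqrt

def required_sample_size(sigma, max_error, z=1.96):
    """Smallest whole number n with z * sigma / sqrt(n) <= max_error."""
    return ceil((z * sigma / max_error) ** 2)

n = required_sample_size(sigma=12, max_error=3)
achieved_error = 1.96 * 12 / sqrt(n)
print(n, round(achieved_error, 3))
```

Rounding up is essential: with one observation fewer the error bound is exceeded.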
EXERCISES
1. In a factory, the average height of a sample of 256 workers was 62

inches with a standard deviation of 2 inches.
(a) Compute 90 and 99 percent confidence intervals for the
average height of all workers in that factory. What do you
observe?
(b) Repeat part (a), if the sample size is 100.
2. The standard deviation of the incomes of the employees of ABC &
Co. is known to be Rs. 1200. How large a sample is needed to
determine the mean income if it is desired that the chances of the error
being more than Rs. 50 should be less than 5%?
3. A sample of the IQ of 40% of the students in a university has the
mean IQ of 120 with a standard deviation of 4. How many students
are enrolled in this university, if a 99% confidence interval for the
mean IQ of all students extends three – tenths of the standard
deviation on either side of the sample mean?
(Hint: Let N be the number of students in the university. Then
n = N × 40/100 = 2N/5. Given that $p\left(|\bar{x} - \mu| \le \frac{3}{10} s\right) = 0.99$. But we also
have $p\left(|\bar{x} - \mu| \le 2.58 \frac{s}{\sqrt{n}}\right) = 0.99$. Hence $\frac{3}{10} s = 2.58 \frac{s}{\sqrt{n}}$. Solve for n to find
N.)
4.3: Estimation of Proportion – An Introduction
Suppose in a sample of size n, we observe that x number of items
possess a certain characteristic (being defective, being a smoker, having
deformities, favoring a candidate etc.). Then, the fraction x/n is called the
proportion (based on the observation of x and sample of size n). For
example, in a survey of 49 voters, if 26 voters favored certain candidate,
then the proportion of voters favoring this candidate is 26/49.
In many investigations, such as opinion polls, health surveys, studies of
left-handedness, quality control etc., we need to estimate
proportions. The method adopted is usually the following one: This being
the discrete case, we assume that x is binomially distributed; hence its mean
will be np and variance np(1 – p). Thus, the proportion x/n has mean p and
variance p(1 − p)/n, so that by the Central Limit Theorem, the variable z
given by
(1) $$z = \frac{(x/n) - p}{\sqrt{p(1-p)/n}}$$
has approximately a standard normal distribution when n is greater than 30.
4.3.1 Estimation of Proportion
When the situation is of discrete type, we need an estimate of
the proportion of ‘successes’ in order to make decisions. We illustrate this
idea by the following example.
Example:4: A manufacturer wants to produce color TV sets. Before doing so,
he wants an estimate of the proportion of TV set owners having color TVs, so
that he will have an idea of the available market. His sample of 100
randomly selected owners yielded 40 people possessing color sets. Let us
construct a 95% confidence interval for p.
Solution: Here x = 40 and n = 100, so that x/n = 0.4.
Since the denominator of equation (1) also involves p, which is not
known and is to be estimated, equation (1) cannot be used as it stands. However, our interest is
to find an estimate for p; so we replace the denominator by
$$\sqrt{\frac{1}{n}\cdot\frac{x}{n}\left(1 - \frac{x}{n}\right)},$$
taking the value x/n as an approximation for p.
This gives
$$z = \frac{\dfrac{x}{n} - p}{\sqrt{\dfrac{1}{n}\cdot\dfrac{x}{n}\left(1 - \dfrac{x}{n}\right)}}$$
Here the denominator is $\sqrt{(0.4)(0.6)/100} \approx 0.049$, which we round to 0.05.
Hence, we get z = (0.4 – p) / 0.05. We know that
P(−1.96 ≤ z ≤ 1.96) = 0.95;
Thus, the required interval is
0.4 − 1.96 × 0.05 ≤ p ≤ 0.4 + 1.96 × 0.05
0.302 ≤ p ≤ 0.498
Hence, the TV manufacturer can be sure with 95% confidence that 30.2% to
49.8% of the population of his locality owns color sets. In other words, he
can be sure that at least 50% of the people do not own color sets (with 95%
confidence), so that it is profitable to start business.
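The interval in Example 4 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original text; the function name `proportion_interval` is an illustrative assumption. It keeps the standard error unrounded (about 0.049 rather than 0.05), so its limits are slightly narrower than the hand calculation.

```python
import math

def proportion_interval(x, n, z):
    """Large-sample confidence interval for a proportion.

    Uses x/n in place of the unknown p inside the standard error,
    as described in the solution of Example 4.
    """
    p_hat = x / n
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_error, p_hat + z * std_error

# 40 color-set owners out of 100; z = 1.96 for 95% confidence.
low, high = proportion_interval(x=40, n=100, z=1.96)
print(round(low, 3), round(high, 3))
```

The conclusion drawn in the example is unchanged, since the upper limit stays below 0.5 with either rounding.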
EXERCISES
1. A company manufacturing a detergent powder finds that out of 196

women surveyed, 104 use its product. Find the 99% confidence
interval for the women who use this detergent powder.
2. YXZ Ltd has proposed a set of new conditions for its workers. The
union of this company hires an agency to find out how useful
the new conditions will be. The agency selects a sample of 300
workers and finds that 33% favor the change. How accurate is this
estimate of the true proportion at 95% level?
3. A manufacturer of ball – bearings believes that approximately 2
percent of his product is defective. One of his customers wishes to
estimate the percentage to within 0.05 percent so that 97% of the
times he can be sure that the manufacturer is right. Will a sample size
of 81 work?
4.4: Small Sample Methods
We have already seen in 4.2.1, that if the population is normal with
standard deviation σ (known), then an estimate of µ can be found by taking
samples.
Further, we saw in 4.2.2 that, even if σ is not known, we can estimate
µ provided that the sample size n is greater than 30.
In this section, we will see that, if σ is not known and also n is smaller
than 30 (samples whose sizes are less than 30 are called small samples), still
µ can be estimated by means of the “t – distribution”. The table for this
distribution is given at the end of your Text Book. The value n – 1 is denoted
by v and is called the degrees of freedom of the t – distribution.
Example:5: A sleep inducing drug was given to 16 volunteers. The observed

data produced a mean increase in sleep of 30 minutes with standard
deviation of 17 minutes; find a 90 percent confidence interval for the mean
increase in sleep for the volunteers who participated.
Solution: It is given that $\bar{x} = 30$, s = 17 and n = 16. Note that σ is not given
(therefore unknown) and n is less than 30. Since it is required to construct a
90% confidence interval, we take α = 0.90, so that 1 − α = 0.10 and
$$\tfrac{1}{2}(1-\alpha) = 0.05$$
From the table, we find that, for v = 15 and 0.05, the value of t is 1.753.
This means that the 90% interval is given by −1.753 < t < 1.753 where
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{30 - \mu}{17/4}$$
Thus, the required interval is obtained as
$$-1.753 < \frac{30 - \mu}{17/4} < 1.753$$
Or
$$30 - \frac{17}{4} \times 1.753 < \mu < 30 + \frac{17}{4} \times 1.753$$
Or 22.55 < µ < 37.45
Thus, with 90% confidence, the mean increase in sleep lies between
22.5 minutes and 37.5 minutes.
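The small-sample interval of Example 5 can be sketched in Python as follows. This is an illustration only, not part of the original text; the function name `t_interval` is an assumption, and the critical value 1.753 (the standard t-table entry for 15 degrees of freedom and upper-tail area 0.05) is typed in by hand rather than computed.

```python
import math

def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval sample_mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Example 5: mean 30, s = 17, n = 16, so v = n - 1 = 15.
# t_crit is an assumed table look-up value, not calculated here.
low, high = t_interval(sample_mean=30, s=17, n=16, t_crit=1.753)
print(round(low, 2), round(high, 2))
```

For a different confidence level or sample size, only the table value passed as `t_crit` needs to change.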
EXERCISES
1. 20 steel washers were tested for their diameters, giving x = 0.11 inches
and s = 0.002 inch. Find a 95% confidence interval for the true mean.
2. A health inspector tests 19 bottles of certain syrup for alcohol content,
and finds that the mean alcohol content is 2.7% with standard
deviation 0.13%. Find a 99% confidence interval for the true mean
content of alcohol.
$$$ ### $$$
Chapter 5
CORRELATION AND REGRESSION
5.1: Introduction
So far we have confined our discussion to distributions involving
only one variable. Sometimes, in practical applications, we might come
across a set of data in which each item comprises the
values of two or more variables.
Suppose we have a set of 30 students in a class and we want to

measure the heights and weights of all the students. We observe that each
individual (unit) of the set assumes two values – one relating to the height
and the other to the weight. Such a distribution in which each individual or
unit of the set is made up of two values is called a bivariate distribution. The
following examples will illustrate clearly the meaning of bivariate
distribution.
(i) In a class of 60 students the series of marks obtained in two
subjects by all of them.
(ii) The series of sales revenue and advertising expenditure of two
companies in a particular year.
(iii) The series of ages of husbands and wives in a sample of selected
married couples.
Thus in a bivariate distribution, we are given a set of pairs of

observations, wherein each pair represents the values of two variables.
In a bivariate distribution, we are interested in finding a relationship

(if it exists) between the two variables under study.
The concept of ‘correlation’ is a statistical tool which studies the

relationship between two variables and Correlation Analysis involves
various methods and techniques used for studying and measuring the extent
of the relationship between the two variables.
“Two variables are said to be in correlation if the change in one of the

variables results in a change in the other variable”.
5.2: Types of Correlation
There are two important types of correlation. They are (1) Positive
and Negative correlation and (2) Linear and Non – Linear correlation.
5.2.1: Positive and Negative Correlation
If the values of the two variables deviate in the same direction i.e. if
an increase (or decrease) in the values of one variable results, on an average,
in a corresponding increase (or decrease) in the values of the other variable
the correlation is said to be positive.
Some examples of series of positive correlation are:
(i) Heights and weights;

(ii) Household income and expenditure;
(iii) Price and supply of commodities;
(iv) Amount of rainfall and yield of crops.
Correlation between two variables is said to be negative or inverse if
the variables deviate in opposite directions; that is, if an increase (or decrease) in
the values of one variable results, on an average, in a corresponding decrease
(or increase) in the values of the other variable.
Some examples of series of negative correlation are:
(i) Volume and pressure of a perfect gas;
(ii) Current and resistance [keeping the voltage constant] (R = V/I);
(iii) Price and demand of goods.
Graphs of Positive and Negative correlation:
Suppose we are given sets of data relating to heights and weights of

students in a class. They can be plotted on the coordinate plane using x –
axis to represent heights and y – axis to represent weights. The different
graphs shown below illustrate the different types of correlations.
[Figure: scatter diagram for positive correlation]
[Figure: scatter diagram for negative correlation]
Note:
(i) If the points are very close to each other, a fairly good amount of
correlation can be expected between the two variables. On the
other hand if they are widely scattered a poor correlation can be
expected between them.
(ii) If the points are scattered and they reveal no upward or downward
trend as in the case of (d) then we say the variables are
uncorrelated.
(iii) If there is an upward trend rising from the lower left hand corner
and going upward to the upper right hand corner, the correlation
obtained from the graph is said to be positive. Also, if there is a
downward trend from the upper left hand corner the correlation
obtained is said to be negative.
(iv) The graphs shown above are generally termed as scatter
diagrams.
Example:1: The following are the heights and weights of 15 students of a
class. Draw a graph to indicate whether the correlation is negative or
positive.
Heights (cm)    Weights (kg)
170             65
172             66
181             69
157             55
150             51
168             63
166             61
175             75
177             72
165             64
163             61
152             52
161             60
173             70
175             72
Since the plotted points lie close to each other, we can expect a high
degree of correlation between the series of heights and weights. Further,
since the points reveal an upward trend, the correlation is positive. Arrange
the data in increasing order of height and check that, as height increases, the
weight also increases, except for some (stray) cases.
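The ordering check just described can be sketched in Python. This is only an illustration, not part of the original text; the variable names are assumptions. It sorts the 15 pairs by height and counts how often the weight rises or falls from one student to the next.

```python
# (height in cm, weight in kg) for the 15 students of Example 1.
pairs = [
    (170, 65), (172, 66), (181, 69), (157, 55), (150, 51),
    (168, 63), (166, 61), (175, 75), (177, 72), (165, 64),
    (163, 61), (152, 52), (161, 60), (173, 70), (175, 72),
]

# Arrange the data in increasing order of height.
ordered = sorted(pairs)
weights = [weight for _, weight in ordered]

# Compare each weight with the next one in the height ordering.
steps = list(zip(weights, weights[1:]))
rises = sum(1 for a, b in steps if b > a)
drops = sum(1 for a, b in steps if b < a)

print("rises:", rises, "drops:", drops)
```

A large majority of rises over drops is what an upward trend in the scatter diagram looks like in the sorted table; the few drops are the stray cases.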
EXERCISES
(1) A Company has just brought out an annual report in which the capital
investment and profits were given for the past few years. Find the
type of correlation (if it exists).
Capital Investment (crores) 10 16 18 24 36 48 57
Profits (lakhs) 12 14 13 18 26 38 62
(2) Try to construct more examples on the positive and negative
correlations.
(3) Construct the scatter diagram of the data given below and indicate
the type of correlation.
(Average Value in Lakhs of Rs.)
Years 1965 1970 1975 1980 1985 1990

Raw cotton import 42 60 112 98 118 132
Cotton manufacture exports 68 79 88 86 106 114
5.3: Linear and Non – Linear Correlation
The correlation between two variables is said to be linear if a
change of one unit in one variable results in a constant corresponding change in the
other variable over the entire range of values.
For example consider the following data.
X 2 4 6 8 10
Y 7 13 19 25 31
Thus, for a unit change in the value of x, there is a constant change in

the corresponding values of y and the above data can be expressed by the
relation
y = 3x +1
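That the five data pairs above really lie on this line can be confirmed with a short Python sketch. It is an illustration only, not part of the original text; the variable names are assumptions. It computes the change in y per unit change in x between consecutive pairs and recovers the constants of the line.

```python
xs = [2, 4, 6, 8, 10]
ys = [7, 13, 19, 25, 31]

points = list(zip(xs, ys))

# Change in y per unit change in x between consecutive points.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]

# A linear relation means this rate is the same everywhere.
is_linear = all(s == slopes[0] for s in slopes)

b = slopes[0]            # constant rate of change
a = ys[0] - b * xs[0]    # value of y when x = 0

print(is_linear, a, b)
```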
In general two variables x and y are said to be linearly related, if

there exists a relationship of the form
y = a + bx
where ‘a’ and ‘b’ are real numbers. This is nothing but a straight line when
plotted on a graph sheet with different values of x and y and for constant
values of a and b. Such relations generally occur in physical sciences but are
rarely encountered in economic and social sciences.
The relationship between two variables is said to be non – linear if,
corresponding to a unit change in one variable, the other variable does not
change at a constant rate but changes at a fluctuating rate. In such cases, if
the data is plotted on a graph sheet we will not get a straight line. For
example, one may have a relation of the form
y = a + bx + cx²
or a more general polynomial.
5.4: The Coefficient of Correlation
One of the most widely used statistics is the coefficient of correlation
‘r’ which measures the degree of association between the two values of
related variables given in the data set. It takes values from + 1 to – 1. If two
sets of data have r = +1, they are said to be perfectly correlated positively; if
r = –1, they are said to be perfectly correlated negatively; and if r = 0, they
are uncorrelated.
The coefficient of correlation ‘r’ is given by the formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$
The following example illustrates this idea.
Example:2: A study was conducted to find whether there is any

relationship between the weight and blood pressure of an individual.
The following set of data was arrived at from a clinical study. Let us
determine the coefficient of correlation for this set of data. The first
column represents the serial number and the second and third columns
represent the weight and blood pressure of each patient.
S. No. Weight Blood Pressure
1. 78 140
2. 86 160
3. 72 134
4. 82 144
5. 80 180
6. 86 176
7. 84 174
8. 89 178
9. 68 128
10. 71 132
Solution:
x      y      x²       y²        xy
78     140    6084     19600     10920
86     160    7396     25600     13760
72     134    5184     17956     9648
82     144    6724     20736     11808
80     180    6400     32400     14400
86     176    7396     30976     15136
84     174    7056     30276     14616
89     178    7921     31684     15842
68     128    4624     16384     8704
71     132    5041     17424     9372
796    1546   63,826   243,036   124,206
Then
$$r = \frac{10(124206) - (796)(1546)}{\sqrt{\left[10(63826) - (796)^2\right]\left[10(243036) - (1546)^2\right]}}$$
$$= \frac{11444}{\sqrt{(4644)(40244)}}$$
= 0.8371
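Column sums of this kind are easy to get wrong by hand, so it can help to recompute them. The following is a minimal Python sketch, not part of the original text; the function name `pearson_r` is an illustrative assumption. It applies the formula for r directly to the weight and blood pressure data of Example 2.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

weight = [78, 86, 72, 82, 80, 86, 84, 89, 68, 71]
pressure = [140, 160, 134, 144, 180, 176, 174, 178, 128, 132]

# Recompute the column totals used in the hand calculation.
print(sum(x * x for x in weight), sum(x * y for x, y in zip(weight, pressure)))

r = pearson_r(weight, pressure)
print(round(r, 4))
```

Recomputed this way, the totals are Σx² = 63826 and Σxy = 124206, and r is about 0.837, indicating a fairly strong positive correlation between weight and blood pressure in this sample.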
5.4: Rank Correlation
Data which are arranged in numerical order, usually from largest to
smallest, and numbered 1, 2, 3, ... are said to be in ranks or ranked data.
These ranks prove useful at certain times when two or more values of one
variable are the same. The coefficient of correlation for such type of data is
given by the Spearman rank difference correlation coefficient and is denoted
by R.
In order to calculate R, we arrange data in ranks computing the
difference in rank ‘d’ for each pair. The following example will explain the
usefulness of R. R is given by the formula
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Example:3: The data given below are obtained from student records.
Calculate the rank correlation coefficient ‘R’ for the data.
Subject Grade Point Average (x) Graduate Record exam score (y)
1. 8.3 2300
2. 8.6 2250
3. 9.2 2380
4. 9.8 2400
5. 8.0 2000
6. 7.8 2100
7. 9.4 2360
8. 9.0 2350
9. 7.2 2000
10. 8.6 2260
Note that in the G. P. A. column we have two students having a grade
point average of 8.6; also, in the G. R. E. scores there is a tie for 2000.
Now we first arrange the data in descending order and then rank
1, 2, 3, ..., 10 accordingly. In case of a tie, the rank of each tied value is the
mean of all positions they occupy. In x, for instance, 8.6 occupies ranks 5 and
6. So each has a rank (5 + 6)/2 = 5.5;
Similarly in ‘y’ 2000 occupies ranks 9 and 10, so each has rank
(9 + 10)/2 = 9.5.
Now we come back to our formula
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
We compute ‘d’, square it and substitute its value in the formula.
Subject   x     y      Rank of x   Rank of y   d      d²
1.        8.3   2300   7           5           2      4
2.        8.6   2250   5.5         7           -1.5   2.25
3.        9.2   2380   3           2           1      1
4.        9.8   2400   1           1           0      0
5.        8.0   2000   8           9.5         -1.5   2.25
6.        7.8   2100   9           8           1      1
7.        9.4   2360   2           3           -1     1
8.        9.0   2350   4           4           0      0
9.        7.2   2000   10          9.5         0.5    0.25
10.       8.6   2260   5.5         6           -0.5   0.25
So here, n = 10 and the sum of d² is 12. So
$$R = 1 - \frac{6(12)}{10(100 - 1)} = 1 - 0.0727 = 0.9273$$
Note: If we are provided with only ranks without giving the values of x and
y we can still find Spearman rank difference correlation R by taking the
difference of the ranks and proceeding in the above shown manner.
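The ranking rule for ties and the formula for R can be sketched in Python as follows. This is an illustration only, not part of the original text; the function names `average_ranks` and `spearman_R` are assumptions. Ranks run from the largest value (rank 1) downwards, and tied values share the mean of the positions they occupy.

```python
def average_ranks(values):
    """Rank from largest (1) to smallest; ties get the mean position."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first position occupied by v
        count = order.count(v)       # how many values are tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_R(xs, ys):
    """Spearman rank difference correlation coefficient."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

gpa = [8.3, 8.6, 9.2, 9.8, 8.0, 7.8, 9.4, 9.0, 7.2, 8.6]
gre = [2300, 2250, 2380, 2400, 2000, 2100, 2360, 2350, 2000, 2260]

R = spearman_R(gpa, gre)
print(average_ranks(gpa))
print(round(R, 4))
```

When only ranks are supplied, the ranking step can be skipped and the differences fed straight into the formula.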
EXERCISES
1. A horse owner is investigating the relationship between weight

carried and the finish position of several horses in his stable.
Calculate r and R for the data given
Weight Carried Position Finished
110 2
113 6
120 3
115 4
110 6
115 5
117 4
123 2
106 1
108 4
110 1
110 3
2. The top and bottom numbers which may appear on a die are as
follows
Top 1 2 3 4 5 6
bottom 5 6 4 3 1 2
Calculate r and R for these values. Are the results surprising?
3. The ranks of two sets of variables (Heights and Weights) are given
below. Calculate the Spearman rank difference correlation
coefficient R.
1 2 3 4 5 6 7 8 9 10
Heights 2 6 8 4 7 4 9.5 4 1 9.5
Weights 9 1 9 4 5 9 2 7 6 3
5.5: Regression
If two variables are significantly correlated, and if there is some

theoretical basis for doing so, it is possible to predict values of one variable
from the other. This observation leads to a very important concept known as
‘Regression Analysis’.
Regression analysis, in a general sense, means the estimation or
prediction of the unknown value of one variable from the known value of the
other variable. It is one of the most important statistical tools and is
extensively used in almost all sciences – Natural, Social and Physical. It is
especially used in business and economics to study the relationship between
two or more variables that are related causally and for the estimation of
demand and supply graphs, cost functions, production and consumption
functions and so on.
Prediction or estimation is one of the major problems in almost all the

spheres of human activity. The estimation or prediction of future production,
consumption, prices, investments, sales, profits, income etc. are of very great
importance to business professionals. Similarly, population estimates and
population projections, GNP, Revenue and Expenditure etc. are
indispensable for economists and efficient planning of an economy.
Regression analysis was explained by M. M. Blair as follows:

“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”
5.5.1: Regression Equation
Suppose we have a sample of size ‘n’ and it has two sets of measures,
denoted by x and y. We can predict the values of ‘y’ given the values of ‘x’
by using the equation, called the REGRESSION EQUATION.
y* = a + bx
where the coefficients a and b are given by
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
$$a = \frac{\sum y - b\sum x}{n}$$
The symbol y* refers to the predicted value of y from a given value of

x from the regression equation.
Example: 4: Scores made by students in a statistics class in the mid – term
and final examination are given here. Develop a regression equation
which may be used to predict final examination scores from the mid – term
score.
STUDENT MID – TERM FINAL

1. 98 90
2. 66 74
3. 100 98
4. 96 88
5. 88 80
6. 45 62
7. 76 78
8. 60 74
9. 74 86
10. 82 80
Solution:
We want to predict the final exam scores from the mid term scores. So
let us designate ‘y’ for the final exam scores and ‘x’ for the mid – term exam
scores. We open the following table for the calculations.
Student   x     y     x²       xy
1         98    90    9604     8820
2         66    74    4356     4884
3         100   98    10,000   9800
4         96    88    9216     8448
5         88    80    7744     7040
6         45    62    2025     2790
7         76    78    5776     5928
8         60    74    3600     4440
9         74    86    5476     6364
10        82    80    6724     6560
Total     785   810   64,521   65,074
Numerator of b = 10 × 65,074 – 785 × 810 = 6,50,740 – 6,35,850 = 14,890
Denominator of b = 10 × 64,521 – (785)² = 6,45,210 – 6,16,225 = 28,985
Therefore, b = 14,890 / 28,985 = 0.5137
Numerator of a = 810 – 785 × 0.5137 = 810 – 403.2545 = 406.7455
Denominator of a = 10
Therefore a = 40.6746
Thus, the regression equation is given by
y* = 40.6746 + (0.5137) x
We can use this to find the projected or estimated final scores of the
students.
For example, for the midterm score of 50 the projected final score is
y* = 40.6746 + (0.5137) 50 = 40.6746 + 25.685 = 66.3596
which is quite a good estimate.
To give another example, consider the midterm score of 70. Then the
projected final score is
y* = 40.6746 + (0.5137) 70 = 40.6746 + 35.959 = 76.6336,
which is again a very good estimate.
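The regression coefficients can also be recomputed directly from the ten pairs of scores. The following is a minimal Python sketch, not part of the original text; the names `fit_line` and `predict` are illustrative assumptions. It carries b at full precision when computing a, so the last decimal places may differ slightly from a hand calculation that rounds b first.

```python
def fit_line(xs, ys):
    """Coefficients a, b of the regression equation y* = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

midterm = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]

a, b = fit_line(midterm, final)

def predict(x):
    """Predicted final score y* for a midterm score x."""
    return a + b * x

print(round(a, 4), round(b, 4))
print(round(predict(50), 2), round(predict(70), 2))
```

From these data the sum of the xy column is 65,074, which gives b of about 0.5137 and a of about 40.67; the predicted final scores for midterm scores of 50 and 70 are about 66.4 and 76.6.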
This brings us to the end of this chapter. We close with some problems for
you.
EXERCISES
1. The data given below are obtained from student records. Calculate the
regression equation and compute the estimated GRE scores for GPA = 7.5 and
8.5.
Subject Grade Point Average (x) Graduate Record exam score (y)
1. 8.3 2300
2. 8.6 2250
3. 9.2 2380
4. 9.8 2400
5. 8.0 2000
6. 7.8 2100
7. 9.4 2360
8. 9.0 2350
9. 7.2 2000
10. 8.6 2260
2. A study was conducted to find whether there is any relationship between

the weight and blood pressure of an individual. The following set of data
was arrived at from a clinical study.
S. No. Weight Blood Pressure

1. 78 140
2. 86 160
3. 72 134
4. 82 144
5. 80 180
6. 86 176
7. 84 174
8. 89 178
9. 68 128
10. 71 132
3. A horse was tested to see how many minutes it takes to reach a
point from the starting point. The horse was made to carry luggage of
various weights on 10 trials. The data collected are presented below in the
table.
Trial No. Weight (in Kgs) Time taken (in mins)

1 11 13
2 23 22
3 16 16
4 32 47
5 12 13
6 28 39
7 29 43
8 19 21
9 25 32
10 20 22
Find the regression equation between the load and the time taken to reach
the goal. Estimate the time taken for the loads of 35 Kgs, 23 Kgs, and 9
Kgs. Are the answers in agreement with your intuitive feelings? Justify.
### $$$ ###