This material is provided for the educational use of students in CSE2400 at FIT. No further use or reproduction is permitted. Copyright G.A.Marin, 2008, All rights reserved.
1-1
Applied Statistics: Probability

Motivation: Here are some things I've done with prob/stat in my career:
- ... reserves.
- Helped Texaco analyze the bidding process for offshore oil leases.
- Helped the US Navy search for submarines and develop ASW tactics.
- Helped the US Navy evaluate capabilities of future forces.
- Helped IBM analyze performance of computer networks in the health, airline reservation, banking, and other industries.
- Evaluated new networking technologies such as IBM's Token Ring, ATM, and Frame Relay.
- Analyzed meteor burst communications for the US government (and weather and earthquakes...).
1-2
Motivation:
I never could have imagined the opportunities I would have to use this material when I took a course like this one. You can't either. Mostly I cannot teach about all of these problem areas, because you must master the basics (this course) first. It is the key. This takes steady work. Plan on about 6-8 hours per week (beginning this week!).
If you begin now and learn as you go, you'll likely succeed. If you don't do anything until, say, the night before a test or the day before an assignment is due, you will not be likely to succeed. Now is the time to commit.
1-3
...and paper). For example, write: "slide 7 geom series??" Writing notes/questions actually helps you pay attention. As soon after class as possible, review the slides covered in class and your class notes. Do not go to the next slide until you understand everything on the current slide.
Can't understand something? Ask next class period or come to my office hours. In the words of Jerry Maguire: "Help me to help you." Answer all points you raised in your notes. Look things up on the web.
...covered. The syllabus gives you a good indication of where to look. Work a few homework problems each night (when homework is assigned).
Can't work one or two of these problems? Come to my office hours. Help me to help you. It is much, much worse to come two days before 15 problems are due and say, "I can't do any of these." I will help you with some of the work, but it's like trying to learn a musical instrument just by watching me play it.
1-4
Introduction to Mathcad
Mathcad is computational software by Mathsoft Engineering and Education, Inc. It includes a computational engine, a word processor, and graphing tools. You enter equations almost as you would write them on paper, and they evaluate instantly. You MUST use Mathcad for all homework in this class. Send your Mathcad worksheet to gmarin@fit.edu BEFORE class time on the due date. "I could not get a Mathcad terminal" is NO EXCUSE.
Mathcad is available (at no charge) in Olin EC 272. Tutorial and help are included with the software; resources are also available at mathcad.com.
Name the worksheet HomeworknFirstnameLastname.xmcd, where the letter n represents the number of the assignment. The file type .xmcd is assigned automatically. If you do not follow the instructions, you will receive a zero for the homework assignment.
Mathcad OVERVIEW is next. Note: Homework 1 is on the web site.
1-5
Σ_{i=1}^{n} 1 = n        Σ_{i=0}^{n} 1 = n + 1        Σ_{i=0}^{n-1} 1 = n

Σ_{i=1}^{100} 9 = 900.
1-6
Young Gauss reasoned that if he wrote two of the desired sums as follows:
S = 1 + 2 + 3 + ... + 99 + 100
S = 100 + 99 + 98 + ... + 2 + 1
then clearly 2S = 100(101), so
Σ_{k=1}^{100} k = 100(101)/2 = 5,050.
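The course's required tool is Mathcad, but purely as an illustration (not part of the original slides), Gauss's pairing argument is easy to check numerically in Python:

```python
# Check Gauss's pairing argument: 1 + 2 + ... + n = n(n+1)/2.
n = 100
direct = sum(range(1, n + 1))   # add the terms one by one
paired = n * (n + 1) // 2       # closed form: 2S = n(n+1)
print(direct, paired)           # 5050 5050
```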
1-7
Consider S_n = Σ_{i=0}^{n} p^i. By division we know that
(1 - p^(n+1)) / (1 - p) = 1 + p + p² + ... + p^n = S_n.
By definition, the sum of the infinite geometric series is lim_{n→∞} S_n = lim_{n→∞} (1 - p^(n+1)) / (1 - p).
If |p| < 1, then the above limit is 1/(1 - p).
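As an illustration only (the slides use Mathcad), a short Python sketch comparing the partial sums, the closed form, and the limit 1/(1-p):

```python
# Partial sums S_n = 1 + p + ... + p**n versus the closed form and the limit.
p = 0.5

def S(n):
    return sum(p**i for i in range(n + 1))       # add terms directly

def closed(n):
    return (1 - p**(n + 1)) / (1 - p)            # closed form from the division

limit = 1 / (1 - p)                              # infinite-series sum when |p| < 1
print(S(10), closed(10), limit)
```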
1-8
From calculus we know that we cannot obtain a finite sum from the series Σ_{k=0}^{∞} 1 and that it "diverges" to ∞. However, if we multiply each term 1 by z^k we have the "geometric" series
Σ_{k=0}^{∞} z^k = 1/(1 - z), provided |z| < 1.
Take the derivative of both sides to obtain the Gauss series
Σ_{k=1}^{∞} k z^(k-1) = 1/(1 - z)², for |z| < 1.
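A quick numerical sanity check of the differentiated series (an illustration, not part of the slides): truncating the series at a few hundred terms should match the closed form closely for |z| < 1.

```python
# Truncate sum_{k>=1} k * z**(k-1) and compare with 1/(1-z)**2.
z = 0.3
series = sum(k * z**(k - 1) for k in range(1, 200))  # truncated series
closed_form = 1 / (1 - z)**2
print(series, closed_form)
```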
1-9
Try these

(a) Σ_{i=0}^{10} (1/2)^i
(b) Σ_{i=2}^{10} (1/2)^i
(c) Σ_{i=0}^{∞} (1/2)^i
(d) Σ_{i=5}^{∞} (1/2)^i
1-10
Power Series
Definition: A power series is a function of a variable (say z) and has the form
P(z) = Σ_{i=0}^{∞} c_i (z - a)^i, where the c_i are constants and a is also a constant. In this form the series is said to be "centered" at the value a.
Convergence: One of the following is true:
Either (1) the power series converges only for z = a,
Or (2) the power series converges absolutely for all z,
Or (3) there is a positive number R such that the series converges absolutely for |z - a| < R and diverges for |z - a| > R.
R is called the radius of convergence.
1-11
Ratio Test (a good tool for finding the radius of convergence). Note: material shown in red on the original slides is not on the test.
Consider the series Σ_{k=0}^{∞} a_k and the limit L = lim_{k→∞} |a_{k+1}/a_k|.
If L < 1, then the series converges absolutely. If L > 1 or L = ∞, the series diverges.

Example: Determine where the series Σ_{k=1}^{∞} x^k / (k·3^k) converges.
The ratio is
|a_{k+1}/a_k| = |x^(k+1) / ((k+1)·3^(k+1))| · |(k·3^k) / x^k| = (k/(k+1)) · |x|/3;
thus, lim_{k→∞} |a_{k+1}/a_k| = |x|/3. The series converges absolutely for |x| < 3 and diverges for |x| > 3, so the radius of convergence is R = 3.
Binomial sum: (x + y)^n = Σ_{k=0}^{n} C(n,k) x^k y^(n-k).
Geometric sum: Σ_{k=0}^{n} z^k = (1 - z^(n+1)) / (1 - z).
1-13
Example:
(x + y)³ = C(3,0)x³ + C(3,1)x²y + C(3,2)xy² + C(3,3)y³ = x³ + 3x²y + 3xy² + y³,
where C(n,k) = n! / (k!(n-k)!) = n(n-1)⋯(n-k+1) / (k(k-1)⋯1).
1-14
Logarithms
Conversion formula: log_b a = log₁₀ a / log₁₀ b. Any base may be used in place of 10.
1-16
1-17
(Figure: log of 50 as a function of the base x.)
1-18
Logarithmic Identities
1-19
You may understand a concept but be unable to write it down. You may not be able to read notation that represents a simple concept. The only way to tackle this language barrier is through repetition until the notation begins to look friendly. You must try writing results yourself (begin with blank paper) until you can recall the notation that expresses the concepts.
Copy theorems from class or from the text until you can rewrite them without looking at the material and you understand what you are writing.
...mastered. If you begin tonight, and spend time after every class, you can learn this new language. If you wait until the week before the test, it will be much like trying to learn Spanish, or French, or English in just a few nights (one night??). It really can't be done.
1-20
Math sentences
Math problems, theorems, solutions... must be written in sentences that make sense when you read them (EVEN WHEN EQUATIONS ARE USED). You will notice that I am careful to do this to the greatest extent possible on these slides (even knowing that I will explain them in class). My observation is that most students have no idea how to do this. I often see solutions like the following on homework or test papers: Evaluate ∫₀⁴ x² dx. The student writes something like
= x²   = x³/3   = 64/3.
The answer is right, BUT every step is nonsense. The "=" sign means "is equal to." In the first step, we don't know to what the equality refers. The equality is simply wrong in the next two steps.
Instead write:
∫₀⁴ x² dx = [x³/3]₀⁴ = 64/3 - 0 = 64/3.
1-21
P(n, k) = n(n-1)⋯(n-k+1), the number of ordered selections of k items from n.
C(n, k) = P(n, k) / k!, the number of unordered selections.
Example: Given the 5 letters a, b, c, d, e, how many ways can we list 3 of the 5 when order is important? Answer: P(5,3) = 5·4·3 = 60. Note that each choice of 3 letters (such as a, c, e) results in 6 different lists: ace, aec, cae, cea, eac, eca.
Example: Given the 5 letters above, how many ways can we choose 3 of the 5 when order is NOT important? Answer: C(5,3) = (5·4·3)/(3·2·1) = 10. In this case we take the 60 ordered lists that result when we care about order and divide by 6 (the number of orderings of 3 fixed letters).
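The two letter-selection counts above can be verified by brute force. A Python sketch (illustration only; the course itself uses Mathcad):

```python
from itertools import combinations, permutations

letters = "abcde"
ordered = list(permutations(letters, 3))     # P(5,3): order matters
unordered = list(combinations(letters, 3))   # C(5,3): order ignored
print(len(ordered), len(unordered))          # 60 10
```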
1-22
Definition
(n)_k = n(n-1)⋯(n-k+1) for any positive integer n and for integers k such that 1 ≤ k ≤ n. This symbol is pronounced "n to the k falling."
Examples: (6)_3 = 6·5·4 = 120. (3)_6 is not defined (for our purposes). (5)_5 = 5!.
Again: P(n, k) = (n)_k and C(n, k) = (n)_k / k!.
1-23
Example: Suppose we have 2 red buttons, 3 white buttons, and 4 blue buttons. How many different orderings (permutations) are there?
Answer: 9! / (2!·3!·4!) = 1,260.
1-24
Try These
There are 12 marbles in an urn: 8 are white and 4 are red. The white marbles are numbered w1, w2, ..., w8 and the red ones are numbered r1, r2, r3, r4. For (a)-(d): without looking into the urn, you draw out 5 marbles.
(a) How many unique choices can you get if order matters? 12·11·10·9·8 = 95,040.
(b) How many unique choices can you get if order does not matter? C(12,5) = 792.
(c) How many ways can you choose 3 white marbles and 2 red marbles if order matters? You will fill 5 "slots" by drawing. First determine which two slots (positions) will be occupied by the 2 red marbles: C(5,2) = 10. Next multiply by the ordered choices of 3 white and 2 red: 10 · (8·7·6) · (4·3) = 40,320.
(d) How many ways can you choose 3 white marbles and 2 red marbles if order does not matter? C(8,3)·C(4,2) = 336.
(e) How many marbles must you draw to be sure of getting two red ones? 10.
1-25
Complex Combinations
How many ways are there to create a full house (3-of-a-kind plus a pair) using a standard deck of 52 playing cards?
C(13,1)·C(4,3)·C(12,1)·C(4,2) = 13·4·12·6 = 3,744.
(Choose a denomination) × (choose 3 of the 4 cards of that denomination) × (choose one of the remaining denominations) × (choose 2 of the 4 cards of this second denomination).
This follows from the multiplication principle (Theorem 2.3.1 in text).
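The same count, and the corresponding probability of being dealt a full house, can be computed directly. A Python sketch (illustration only, not from the slides):

```python
import math

# Multiplication principle for full houses, then convert to a probability.
ways = (math.comb(13, 1) * math.comb(4, 3)     # denomination, 3 of its 4 suits
        * math.comb(12, 1) * math.comb(4, 2))  # second denomination, 2 of its suits
hands = math.comb(52, 5)                       # all 5-card hands
print(ways, hands, ways / hands)               # 3744 hands, probability ~0.00144
```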
1-26
Try these
Suppose C(n, 11) = C(n, 7). What is n?
Suppose C(18, r) = C(18, r-2). What is r?
1-27
Examples*
Consider a machining operation in which a piece of sheet metal needs two identical-diameter holes drilled and two identical-size notches cut. We denote a drilling operation as d and a notching operation as n. In determining a schedule for a machine shop, we might be interested in the number of different possible sequences of the four operations. The number of possible sequences for two drilling operations and two notching operations is
4! / (2!·2!) = 6.
The six sequences are easily summarized: ddnn, dndn, dnnd, nddn, ndnd, nndd.
*Applied Statistics and Probability for Engineers, Douglas C. Montgomery, George C. Runger, John Wiley & Sons, Inc. 2006
1-28
Example*
A printed circuit board has eight different locations in which a component can be placed. If five identical components are to be placed on the board, how many different designs are possible? Each design is a subset of the eight locations that are to contain the components. The number of possible designs is, therefore, C(8,5) = 56.
Sample Space
Definition: The totality of the possible outcomes of a random experiment is called the sample space, Ω.
Countable example: the number of attempts until a message is transmitted successfully, when the probability of success on any one attempt is p:
Ω = {1, 2, 3, 4, 5, 6, ...} = Z⁺.
Continuous example: the time (in seconds) until a lightbulb burns out:
Ω = {t ∈ ℝ : t ≥ 0}.
(We begin with the discrete cases.)
1-30
Events
Definition: An event is a collection of points from the sample space. Example: "the result of one throw of a die is odd." We use sets to describe events.
From the die example, let the set of "even" outcomes be E = {2, 4, 6} and the set of "odd" outcomes be O = {1, 3, 5}.
A simple event is an event that contains only one point from the sample space. For the die example, the simple events are S1 = {1}, S2 = {2}, ..., S6 = {6}. Suppose we toss a coin until the first Head appears. What are the simple events?
Unless stated otherwise, ALL SUBSETS of a sample space are included as possible events. (Generally we will not be interested in most of these, and many events will have probability zero.)
1-31
Orders
An order specifies either an automatic or standard transmission, premium or standard stereo, V6 or V8 engine, leather or cloth interior, and one of the colors: red, blue, black, green, white.
transmission is successful.
1-33
Operations on Events
Because the sample space is a set, Ω, and any event is a subset A ⊆ Ω, we form new events from existing events by using the usual set-theory operations.
A ∩ B: both A and B occur.
A ∪ B: at least one of A or B occurs.
Ā (the complement of A): A does not occur.
S - A = S ∩ Ā: S occurs and A does not occur.
∅: the empty set (a set that contains no elements).
A ∩ B = ∅: A and B are "mutually exclusive."
A ⊆ B: every element of A is an element of B; or, if A occurs, B occurs.
Review Venn diagrams (in text).
1-34
Example
Four bits are transmitted over a digital communications channel. Each bit is either distorted or received without distortion. Let Ai denote the event that the ith bit is distorted, i = 1, 2, 3, 4.
(a) Describe the sample space.
(b) What is the event A1?
(c) What is the event A1 ∩ A2?
(d) What is the event A1 ∪ A2?
(e) What is the event Ā1?
1-35
(a ) A (b) A B (c ) ( A B ) C (d )
( B C ) ( A B ) C
(e)
1-36
A collection of events A1, A2, ... is said to be mutually exclusive if
Ai ∩ Aj = ∅ if i ≠ j (and Ai ∩ Aj = Ai if i = j).
The events are collectively exhaustive if ∪_i Ai = Ω.
A collection of events forms a partition of Ω if they are mutually exclusive and collectively exhaustive. A collection of mutually exclusive events forms a partition of an event E if ∪_i Ai = E.
1-37
Partition of Ω
(Figure: the sample space Ω divided into events A1, A2, ..., A_{n-1}, A_n.)
The sets Ai are "events." No two of them intersect (mutually exclusive) and their union covers the entire sample space.
1-38
Probability measure
We use a probability measure to represent the relative likelihood that a random event will occur. The probability of an event A is denoted P(A).

Axioms:
A1. For every event A, P(A) ≥ 0.
A2. P(Ω) = 1.
A3. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
A4. If the events A1, A2, ... are mutually exclusive, then P(∪_{n=1}^{∞} An) = Σ_{n=1}^{∞} P(An).
1-39
Theorem:
Given a sample space Ω, a "well-defined" collection of events F, and a probability measure P defined on these events, the following hold:
(a) P(∅) = 0.
(b) P[A] = 1 - P[Ā], for all A ∈ F.
(c) P[A ∪ B] = P[A] + P[B] - P[A ∩ B], for all A, B ∈ F.
(d) A ⊆ B ⇒ P[A] ≤ P[B], for all A, B ∈ F.
You must know these and be able to use them to solve problems. Don't worry about proving them.
1-40
1-41
Σ_i p(x_i) = 1.
If all of the outcomes have equal probability, then each p(x_i) = 1/n; thus, the probability of any particular outcome on the roll of a fair die is 1/6.
Suppose, however, we have a biased die for which a 4 is 3 times more likely than any other outcome. This implies that p(1) = p(2) = p(3) = p(5) = p(6) = a (for example) and p(4) = 3a. It follows that 8a = 1, so a = 1/8. Thus, p(i) = 1/8 for i ≠ 4, and p(4) = 3/8.
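The normalization step for the biased die can be checked with exact arithmetic. A Python sketch (illustration only; the course tool is Mathcad):

```python
from fractions import Fraction

# Biased die: face 4 is three times as likely as each other face; solve 5a + 3a = 1.
a = Fraction(1, 8)
pmf = {i: (3 * a if i == 4 else a) for i in range(1, 7)}
print(pmf[4], sum(pmf.values()))   # 3/8 1
```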
1-42
Exercises:
Suppose Bigg Fakir claims that by clairvoyance he can tell the numbers of four cards numbered 1 to 4 that are laid face down on a table. (He must choose all 4 cards before turning any over.) If he has no special powers and guesses at random, calculate the following: (a) the probability that Bigg gets at least one right (b) the probability he gets two right (c) the probability Bigg gets them all right. Note: assume that Bigg must guess the value of each card before looking at any of the cards. Hint: What is the sample space? How are the probabilities assigned?
1-43
1-44
Conditional Probability
The conditional probability of A given B is
P(A | B) = P(A ∩ B) / P(B), provided P(B) ≠ 0.
Q1: What is the probability of getting a total of 8 when rolling two dice?
Q2: Suppose you roll two dice that you cannot see. Someone tells you that the sum is greater than 6. What is the probability that the sum is 8?
1-47
Dice Problem
Let A be the event of getting 8 on the roll of two dice. Let B be the event that the sum of the two dice is greater than 6. The first question is: find P(A). Here is the sample space:
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Five outcomes give Sum = 8: (2,6), (3,5), (4,4), (5,3), (6,2); thus
P(A) = 5/36,
because we are assuming that each outcome pair has the same probability, 1/36.
1-48
The conditional sample space B (sum greater than 6) contains 21 outcomes:
(1,6)
(2,5) (2,6)
(3,4) (3,5) (3,6)
(4,3) (4,4) (4,5) (4,6)
(5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
It follows that P(A | B) = 5/21.
Alternatively, by the definition of conditional probability, we have
P(A | B) = P(A ∩ B) / P(B) = P(A) / P(B), because A ∩ B = A. (Note well!)
Furthermore, P(A) / P(B) = (5/36) / (21/36) = 5/21, so the definition makes sense!
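Both answers can be confirmed by enumerating the 36 equally likely rolls. A Python sketch (not part of the slides):

```python
from fractions import Fraction

# Enumerate all 36 rolls; check P(A), P(B), and P(A|B) for A = "sum is 8", B = "sum > 6".
rolls = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = [r for r in rolls if sum(r) == 8]
B = [r for r in rolls if sum(r) > 6]
A_and_B = [r for r in A if r in B]
p_A = Fraction(len(A), 36)
p_B = Fraction(len(B), 36)
p_A_given_B = Fraction(len(A_and_B), len(B))   # count within the conditional sample space
print(p_A, p_B, p_A_given_B)
```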
1-49
Try this.
A university has 600 freshmen, 500 sophomores, and 400 juniors. 80 of the freshmen, 60 of the sophomores, and 50 of the juniors are Computer Science majors. For this problem assume there are NO seniors. What is the probability that a student, selected at random, is a freshman or a CS major (or both)? If a student is a CS major, what is the probability he/she is a sophomore?
1-50
1-51
Curtains
Suppose you are shown three curtains and told that a treasure chest is behind one of them. It is equally likely to be behind curtain 1, curtain 2, or curtain 3; thus, the initial probabilities are 1/3, 1/3, 1/3 for the treasure being behind each curtain. Now suppose that Monty Hall, who knows where the treasure is, shows you that the treasure is not behind curtain number 2. The probabilities become 1/2, 0, 1/2, right? If you were asked to choose a curtain at this point you would pick either remaining curtain and hope for the best.
Initially we had P(1) = P(2) = P(3) = 1/3, where P(i) = probability that the prize is behind curtain i. Let N2 be the event that the prize is not behind curtain 2. Then
P(1 | N2) = P(1 ∩ N2) / P(N2) = P(1) / P(N2) = (1/3) / (2/3) = 1/2 = P(3 | N2).
Note: using the conditional probabilities, you do not have to enumerate the conditional sample space.
1-52
Curtains Revisited
We change the game as follows. Again the treasure is hidden behind one of three curtains. At the beginning of the game you pick one of the curtains, say curtain 2. Then Monty shows you that the treasure is NOT behind curtain 3. Now he offers you a chance to switch your choice to curtain 1 or to stay with your original choice of 2. Which should you do? Does the choice make a difference?
1-53
The probability that your original choice (curtain 2) is correct is 1/3; the probability that the treasure is behind one of the other curtains is 2/3. Monty will NEVER show you the treasure; thus, even after he shows you one of the curtains, the probability that the treasure is behind a curtain you did not choose is still 2/3. It is twice as likely to be behind the other unopened curtain, so... you should always switch!
Moral of the story: be REALLY careful about underlying assumptions about the sample space and how it changes to create the conditional sample space. You are generally assuming the changes are random. Maybe they are not.
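The switch-vs-stay argument is easy to test by simulation. A Python sketch (illustration only, not from the slides), in which the host always opens an empty, unchosen curtain:

```python
import random

# Simulate the curtains game; the host knows where the prize is and never reveals it.
def play(switch, rng):
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    # Host opens a curtain that is neither the player's choice nor the prize.
    opened = next(c for c in range(3) if c != choice and c != prize)
    if switch:
        choice = next(c for c in range(3) if c != choice and c != opened)
    return choice == prize

rng = random.Random(0)
trials = 100_000
switch_wins = sum(play(True, rng) for _ in range(trials)) / trials
stay_wins = sum(play(False, rng) for _ in range(trials)) / trials
print(switch_wins, stay_wins)   # near 2/3 and 1/3
```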
1-54
Alternate Form
We have seen that the conditional probability of event A given that event B has occurred is
P(A | B) = P(A ∩ B) / P(B), provided P(B) ≠ 0.
Clearly this implies that P(A ∩ B) = P(A | B) P(B). This is referred to as the "multiplication rule," and it holds even when P(B) = 0. Notice that we could also write P(A ∩ B) = P(B | A) P(A). Both these equations always hold for any two events. But there is a special case where the conditional probabilities above are not needed.
Note: memorize these conditional probability equations TODAY. They are extremely important.
1-55
Independent Events
Two events A and B are independent iff P(A ∩ B) = P(A) P(B).
Q1: If one die is rolled twice, is the probability of getting a 3 on the first roll independent of the probability of getting a 3 on the second roll?
Q2: If one die is rolled twice, is the probability that their sum is greater than 5 independent of the probability that the first roll produces a 1?
1-56
Dice: Q2
The probability of getting a 1 on the first die is 1/6. Let G5 be the event that the sum of the two dice is greater than 5 and F1 be the event that the first roll produces a 1. Then F1 ∩ G5 = {(1,5), (1,6)}, so:
P[F1 ∩ G5] = 2/36 = 1/18.
P[G5] = 26/36 = 13/18.
P[F1] = 1/6.
Since P[F1] · P[G5] = (1/6)(13/18) = 13/108 ≠ 1/18 = P[F1 ∩ G5], the events are NOT independent. Equivalently, P[F1 | G5] = (1/18) / (13/18) = 1/13 ≠ 1/6 = P[F1].
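The independence check can again be done by enumeration. A Python sketch (not part of the slides):

```python
from fractions import Fraction

# Are F1 (first die shows 1) and G5 (sum > 5) independent? Test P(F1∩G5) = P(F1)P(G5).
rolls = [(i, j) for i in range(1, 7) for j in range(1, 7)]
F1 = [r for r in rolls if r[0] == 1]
G5 = [r for r in rolls if sum(r) > 5]
both = [r for r in F1 if r in G5]
p_F1 = Fraction(len(F1), 36)
p_G5 = Fraction(len(G5), 36)
p_both = Fraction(len(both), 36)
independent = (p_both == p_F1 * p_G5)
print(p_both, p_F1 * p_G5, independent)
```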
Practice Quiz 1
Explain your work as you have been taught in class.
1. A university has 600 freshmen, 500 sophomores, and 400 juniors. 80 of the freshmen, 60 of the sophomores, and 50 of the juniors are Computer Science majors. For this problem assume there are NO seniors. If a student is a CS major, what is the probability that he/she is a Junior?
2. Evaluate Σ_{i=3}^{∞} (1/4)^i.
3. What is the probability of drawing 2 pairs in a draw of 5 cards from a standard deck of 52 cards? (A pair is two cards of the same denomination such as two aces, two sixes, or two kings.)
1-59
Multiplication Rule
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006
1-60
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006
1-61
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006
1-62
Problem 2-97a
A batch of 25 injection-molded parts contains 5 that have suffered excessive shrinkage. If two parts are selected at random, and without replacement, what is the probability that the second part selected is one with excessive shrinkage?
S = {pairs (f, s) of first-selected, second-selected parts taken from the 25 total with 5 defects}
SD = {second part selected (no replacement) is a defect}
FD = {first part selected is a defect}
FN = {first part selected is not a defect}
1-63
Problem Solution
We seek P[SD] = P[SD ∩ FD] + P[SD ∩ FN]. This becomes
P[SD] = P[SD | FN] P[FN] + P[SD | FD] P[FD]
= (5/24)(4/5) + (4/24)(5/25) = 1/6 + 1/30 = 1/5 = 0.2.
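The total-probability computation can be reproduced with exact fractions. A Python sketch (illustration only):

```python
from fractions import Fraction

# P(second part defective) when drawing 2 of 25 parts (5 defective) without replacement.
p_FD = Fraction(5, 25)            # first part defective
p_FN = Fraction(20, 25)           # first part not defective
p_SD_given_FD = Fraction(4, 24)   # one defect already removed from the batch
p_SD_given_FN = Fraction(5, 24)   # all 5 defects remain
p_SD = p_SD_given_FN * p_FN + p_SD_given_FD * p_FD
print(p_SD)                       # 1/5
```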
1-64
*This slide from Applied Statistics and Probability for Engineers,3rd Ed ,by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc. 2006
1-65
In a particular production run 20% of the chips have high-level, 30% have medium-level, and 50% have low-level contamination. What is the probability that one of the resulting chips fails?
1-66
Bayes Rule
Suppose the events B1, B2, ..., Bn form a partition of the sample space. Then for any event A,
P(Bi | A) = P(A | Bi) P(Bi) / Σ_{j=1}^{n} P(A | Bj) P(Bj), provided P(A) ≠ 0.
1-67
Exercise:
Moon Systems, a manufacturer of scientific workstations, produces its Model 13 System at sites A, B, and C, with 20%, 35%, and 45% produced at A, B, and C, respectively. The probability that a Model 13 System will be found defective upon receipt by a customer is 0.01 if shipped from site A, 0.06 if from B, and 0.03 if from C.
(a) What is the probability that a Model 13 System selected at random at a customer location will be defective?
(b) If a customer finds a Model 13 to be defective, what is the probability that it was manufactured at site B?
1-68
Solution (a)
Let D be the event that a Model 13 is found to be defective at a customer site. We want
P[D] = P[D | A] P[A] + P[D | B] P[B] + P[D | C] P[C],
where A is the event that the Model 13 was shipped from plant A, and the events B and C are defined analogously. This is a very important example of conditioning an event D on three events that partition the sample space. An equation like this can always be written when a number of events partition the sample space containing an event in which we're interested. (The general form is P[D] = Σ_{i=1}^{n} P[D | Ai] P[Ai], where the {Ai} form a partition.)
Substituting the given numbers we have:
P[D] = (0.01)(0.2) + (0.06)(0.35) + (0.03)(0.45) = 0.0365 ≈ 0.037, which answers (a).
1-69
Solution (b)
(b) If a customer finds a Model 13 to be defective, what is the probability that it was manufactured at site B?
Now we seek
P(B | D) = P(D | B) P(B) / [P(D | A) P(A) + P(D | B) P(B) + P(D | C) P(C)]
by Bayes' Law. Substituting, we get
P(B | D) = (0.06)(0.35) / [(0.01)(0.2) + (0.06)(0.35) + (0.03)(0.45)] = 0.021 / 0.0365 ≈ 0.575.
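Both parts of the Moon Systems exercise reduce to a few lines of arithmetic. A Python sketch (illustration only, not from the slides):

```python
# Total probability and Bayes' rule for the Model 13 example.
prior = {"A": 0.20, "B": 0.35, "C": 0.45}   # share of production per site
p_def = {"A": 0.01, "B": 0.06, "C": 0.03}   # P(defective | site)

p_D = sum(p_def[s] * prior[s] for s in prior)    # part (a): P(defective)
p_B_given_D = p_def["B"] * prior["B"] / p_D      # part (b): Bayes' rule
print(round(p_D, 4), round(p_B_given_D, 3))
```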
1-70
Bernoulli Trials
Consider an experiment that has two possible outcomes, success and failure. Let the probability of success be p and the probability of failure be q, where p + q = 1. Now consider the compound experiment consisting of a sequence of n independent repetitions of this experiment. Such a sequence is known as a sequence of Bernoulli trials. The probability of obtaining exactly k successes in a sequence of n Bernoulli trials is the binomial probability
p(k) = C(n, k) p^k q^(n-k).
Probability Distribution
When we take a discrete or countable sample space Ω = {s1, s2, ...} and assign probabilities to each of the possible simple events, P({s1}) = p1, P({s2}) = p2, ..., we have created a probability distribution. (Think of it as having "distributed" all of the probability over all possible events.) As an example, if I toss a coin one time, then P(H) = 1/2 and P(T) = 1/2 represents a probability distribution. The single coin-toss distribution is also an example of a Bernoulli trial because it has only two possible outcomes (generally called "success" or "failure").
1-72
p is the probability of success and q is the probability of failure in n Bernoulli trials. Suppose we toss a coin 10 times and we want the total number of heads. Then p = P[H], q = P[T], n = 10. Using the above formula we obtain the probabilities:
(Figure: bar chart of the binomial probabilities for k = 0, 1, ..., 10 heads; the values peak near 0.25 at k = 5.)
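The pmf plotted above can be reproduced directly from the binomial formula. A Python sketch (illustration only; the course uses Mathcad's dbinom for this):

```python
from math import comb

# Binomial pmf for 10 fair coin tosses: p(k) = C(10, k) * 0.5**k * 0.5**(10-k).
n, p = 10, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print([round(x, 4) for x in pmf])   # peaks at k = 5
```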
1-73
Regarding Parameters
Notice that the binomial distribution is completely defined by the formula for its probabilities, p(k) = C(n, k) p^k q^(n-k), and by its "parameters" p and n.
The binomial probability equation never changes, so we regard a binomial distribution as being defined by its parameters. This is typical of all probability distributions (using their own parameters, of course). One of the problems we often face in statistics is estimating the parameters after collecting data that we know (or believe) comes from a particular probability distribution (such as the p and n for the binomial). Alternatively, we may choose to estimate "statistics" such as the mean and variance that are functions of these parameters. We'll get to this after we consider random variables and the continuous sample space.
1-74
*Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd Ed, Kishor S. Trivedi, J. Wiley & Sons, NY 2002
1-75
Example
A communications network is being shared by 100 workstations. Time is divided into intervals that are 100 ms long. One and only one workstation may transmit during one of these time intervals. When a workstation is ready to transmit, it will wait until the beginning of the next 100ms time interval before attempting to transmit. If more than one workstation is ready at that moment, a collision occurs; and each of the k ready workstations waits a random amount of time before trying again. If k = 1, then transmission is successful. Suppose the probability of a workstation being ready to transmit is p. Show how probability of collision varies as p varies between 0 and 0.1.
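A collision occurs exactly when 2 or more of the 100 stations are ready, so P(collision) = 1 - P(none ready) - P(exactly one ready), with the ready counts binomial. A Python sketch of how this varies with p (illustration only, not from the slides):

```python
# P(collision) = 1 - P(0 stations ready) - P(1 station ready), for n = 100 stations.
def collision_prob(p, n=100):
    p0 = (1 - p)**n                   # no station ready
    p1 = n * p * (1 - p)**(n - 1)     # exactly one station ready
    return 1 - p0 - p1

for p in (0.0, 0.02, 0.04, 0.06, 0.08, 0.1):
    print(p, round(collision_prob(p), 4))
```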
1-76
Practice Quiz 2
A partial deck of playing cards (fewer than 52 cards) contains some spades, hearts, diamonds, and clubs (NOT 13 of each suit). If a card is drawn at random, then the probability that it is a spade is 0.2. We write this as P[Spade] = 0.2. Similarly, P[Heart] = 0.3, P[Diamond] = 0.25, P[Club] = 0.25. Each of the 4 suits has some number of face cards (King, Queen, Jack). If the drawn card is a spade, the probability is 0.25 that it is a face card. If it is a heart, the probability is 0.25 that it is a face card. If it is a diamond, the probability is 0.2 that it is a face card. If it is a club, the probability is 0.1 that it is a face card.
1. What is the probability that the randomly drawn card is a face card?
2. What is the probability that the card is a Heart and a face card?
3. If the card is a face card, what is the probability that it is a spade?
1-77
1-78
Review of function
Defn: A function is a set of ordered pairs such that no two pairs have the same first element (unless they also have the same second element).
Example: g = {(1, 2), (3, -5), (5, 12)} defines a function, g, whose "domain" consists of the real numbers 1, 3, 5 and whose "range" consists of the numbers 2, -5, 12. All functions are said to "map" values in their domain to values in their range.
Example: f(x) = x² + 5. Here a function is defined using a formula. This actually implies that the function is f = {(x, x² + 5) : x is a real number}.
Notice the following:
(a) The function has a "name." Here that name is f.
(b) The implied domain of the function includes all real numbers x that can be plugged into the formula; in this case, that includes all real numbers.
(c) Every number x in the domain (all reals) is "mapped" to the number x² + 5. Thus f(1) = 6, f(-5) = 30, f(π) = π² + 5.
(d) Sometimes we write this as 1 → 6, -5 → 30, π → π² + 5.
(e) The range of f is {y : y ≥ 5}.
1-79
Random Variable
Definition: A random variable X on a sample space Ω is a function that assigns a real number x = X(s) to each sample point s ∈ Ω. The inverse image of x is the set of all points in Ω that the random variable X maps to the value x. It is denoted Ax = {s ∈ Ω | X(s) = x}.
(Discrete Ω → discrete X; continuous Ω → continuous X.)
1-80
Random Variable
For a discrete sample space Ω = {s1, s2, s3, ...}, X maps Ω into the real line ℝ = (-∞, ∞), and the pmf of X is
pX(x) = P[Ax].
OR we may be given a discrete (continuous later) random variable, a description of the values it can produce, and the probability of each value. For example: for k = 1, 2, ..., n, P[X = k] = pk. In this case we need not know what the underlying experiment really is.
1-81
The importance of the random variable is that it lets us deal with such an experimental setup without thinking about dice, arrows, or cards. We say: let X be a random variable that takes on the discrete values 1, 2, 3, 4, 5, 6. Its probability mass function is given as: the probability that X = i is 1/6. We write this as
pX(i) = 1/6 for i = 1, 2, ..., 6.
1-82
pX(x) = Σ_{s ∈ Ax} P({s}).
The pmf satisfies the following properties:
(p1) 0 ≤ pX(x) ≤ 1 for all values x such that P[X = x] is defined.
(p2) If X is a discrete random variable, then Σ_i pX(xi) = 1.
1-83
MathCad definition
Note that MathCad refers to the probability mass function as the probability density function (pmf versus pdf). This is the reason that the included pmfs begin with the letter d, like dbinom. We will reserve pdf for use with continuous distributions. However, you must interpret pdf as pmf when using MathCad. The text uses probability mass function, but represents a pmf with a function, f, instead of our notation, p. (See page 70.)
1-84
Discrete RV Example 1
Let the sample space represent all possible outcomes of a roll of one die; thus, Ω = {1, 2, 3, 4, 5, 6}. We define the random variable X on this sample space as follows: X(i) = 1 if i = 1, 2 and X(i) = 0 if i = 3, 4, 5, 6. Because the probability of rolling a 1 or 2 is 1/3, X has a Bernoulli distribution. Alternatively, we could just write that pX(1) = 1/3 and pX(0) = 2/3, or we could define the pmf using a table.
Discrete RV Example 2
A die is tossed until the occurrence of the first 6. Let the random variable X = k if the first 6 occurs on the kth roll, for integer k > 0. What is the probability mass function (pmf) for X? In order for the first 6 to occur on the 5th toss, for example, we must have the event AAAA6 occur, where A means any result other than 6. Clearly, this represents a sequence of 5 Bernoulli trials where success = 6 and failure = 1 through 5. Each trial is independent; thus, the probability of this particular result is (5/6)⁴(1/6) = 625/7776 ≈ 0.08. Similarly, the probability of the first 6 on the kth roll is
pX(k) = (5/6)^(k-1) (1/6), k = 1, 2, ....
(3) Roll a die once, twice, ..., until you get, say, a 2 for the first time. Suppose that the first time you get the 2 is on the kth roll. The probability of this is (5/6)^(k-1) (1/6). The geometric distribution for X:
pX(k) = (5/6)^(k-1) (1/6), k = 1, 2, ....
If A = (a, b), we write: P(X ∈ A) = P(a < X < b). If A = (a, b], we write: P(X ∈ A) = P(a < X ≤ b), etc.
For any real number x the probability that the random variable X takes a value in the interval (−∞, x] is especially important and is denoted as:
F_X(x) = P(−∞ < X ≤ x) = P(X ≤ x) = Σ_{t ≤ x} p_X(t), where the last equality holds only for discrete RVs X. The function F is called the cumulative distribution function (or just the distribution function) of X.
(F4) For discrete X that has positive probability only at the values x_1, x_2, ..., F has a positive jump at x_i equal to p_X(x_i) and takes a constant value on the interval [x_{i−1}, x_i). Thus, its graph is a step function. Cumulative distribution functions of discrete RVs grow only by jumps, and cumulative distribution functions of continuous RVs have no jumps. A RV is said to be of mixed type if it has continuous intervals plus jumps.
Bernoulli Distribution
The RV, X, is Bernoulli (or has a Bernoulli distribution) if its pmf is given by p₀ = p_X(0) = q and p₁ = p_X(1) = p, where p + q = 1. The corresponding CDF is given by:
F(x) = 0 for x < 0; q for 0 ≤ x < 1; 1 for x ≥ 1.
Example: Roll a die once. Let X=1 if the result is 1 or 2. Let X=0 otherwise. This is a Bernoulli trial with p=1/3 and q=2/3.
Bernoulli pmf
[Figure: bar chart of the Bernoulli pmf with p = 0.5; bars of height 0.5 at x = 0 and x = 1.]
Write as: F(x) = 0 for x < 0; 0.5 for 0 ≤ x < 1; 1 otherwise. Notice that the cdf is defined for all real numbers, x.
If we let X take on the integer values 1, 2, ..., n, then its distribution function is given by
F_X(x) = 0 for x < 1; Σ_{i=1}^{⌊x⌋} 1/n = ⌊x⌋/n for 1 ≤ x ≤ n; 1 for x > n.
Binomial Distribution
Let Y_n denote the number of successes in n Bernoulli trials. The pmf of Y_n is given by:
p_k = P(Y_n = k) = p_{Y_n}(k) = C(n, k) p^k (1 − p)^(n−k) for 0 ≤ k ≤ n, k an integer; 0 otherwise.
Y_n is said to have a binomial distribution, with cdf
P[Y_n ≤ t] = F_{Y_n}(t) = 0 for t < 0; Σ_{i=0}^{⌊t⌋} C(n, i) p^i (1 − p)^(n−i) for 0 ≤ t < n; 1 for t ≥ n.
Example: Toss a coin 10 times and count the total number of heads. This is binomial with p = 0.5.
Geometric Distribution
Consider any arbitrary sequence of Bernoulli trials and let Z be the number of trials up to and including the first success. Z is said to have a geometric distribution with pmf given by
p_Z(i) = q^(i−1) p for i = 1, 2, ..., where p + q = 1.
This is well-defined because Σ_{i=1}^∞ p q^(i−1) = p/(1 − q) = 1.
The distribution function of Z is given by F_Z(k) = 1 − q^k for k = 1, 2, ....
Example: See the previous example concerning rolling 1 die until a 6 occurs.
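A minimal numeric sketch of this pmf in plain Python (nothing slide-specific is assumed beyond p = 1/6 for the die example):

```python
from math import isclose

def geom_pmf(k, p):
    """P(first success occurs on trial k) = (1-p)^(k-1) * p."""
    return (1 - p) ** (k - 1) * p

# First 6 on the 5th roll of a fair die (p = 1/6):
p5 = geom_pmf(5, 1/6)                 # (5/6)^4 * (1/6) = 625/7776
print(round(p5, 4))                   # 0.0804

# The probabilities sum to 1 over k = 1, 2, ... (truncated check):
total = sum(geom_pmf(k, 1/6) for k in range(1, 500))
print(isclose(total, 1.0, abs_tol=1e-9))   # True
```

The truncated sum illustrates the slide's point that Σ p q^(i−1) = 1.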
[Figure: a sample space partitioned into events A_1, ..., A_n making up A, with outcomes a_{i1}, ..., a_{im_i} inside each A_i.]
P(a_ij | A) = P(a_ij | A_i) · P(A_i | A)
Proof:
P(a_ij | A) = P(a_ij ∩ A)/P(A) = P(a_ij)/P(A) = [P(a_ij)/P(A_i)] · [P(A_i)/P(A)] = [P(a_ij ∩ A_i)/P(A_i)] · [P(A_i ∩ A)/P(A)] = P(a_ij | A_i) · P(A_i | A),
where the intersections simplify because a_ij ⊆ A_i ⊆ A.
Poisson Distribution
A random variable, X_t, has a Poisson distribution with parameter λ > 0 if its pmf is given by:
P(X_t = k) = (λt)^k e^(−λt) / k! for k = 0, 1, ... and t ≥ 0. (A distinct RV for each t.)
NOTE: The Poisson is typically used to model the number of jobs arriving during time t in a time-share system, the arrival of calls at a switchboard, the arrival of messages at a terminal, etc. The parameter λ is then interpreted as an arrival rate "per unit time." That is, if t is in seconds, then λ must be the average arrivals per second. The cumulative distribution function is:
F_{X_t}(x) = 0 for x < 0; Σ_{k=0}^{⌊x⌋} (λt)^k e^(−λt) / k! for x ≥ 0.
Notice that in mathematical notation λ does not typically appear on the left-hand side even though the function is unspecified without it.
If Y_t represents the total number of arrivals during any time t, then P(Y_t = k) = (λt)^k e^(−λt) / k!, per the previous slide.
Poisson Example
Connections arrive at a switch at a rate of 11 per ms. The arrival distribution is Poisson. (a) What is the probability that exactly 11 calls arrive in one ms? (b) What is the probability that exactly 100 calls arrive in 10 ms? (c) What is the probability that the number of calls arriving in 2 ms is greater than 7 and less than or equal to 10?
Let X_t be the random variable giving the number of arrivals during t ms. We know that X_t has a Poisson distribution, which implies that P[X_t = k] = (λt)^k e^(−λt) / k! with t in ms. The arrival rate is λ = 11/ms; thus, P[X_t = k] = (11t)^k e^(−11t) / k!.
(a) The probability of exactly 11 arrivals in one ms is P[X_1 = 11] = 11^11 e^(−11) / 11! = 0.119.
(b) The probability of 100 calls in 10 ms is P[X_10 = 100] = (11·10)^100 e^(−11·10) / 100! = 0.025.
(c) P[7 < X_2 ≤ 10] = Σ_{k=8}^{10} (11·2)^k e^(−11·2) / k! = Σ_{k=8}^{10} 22^k e^(−22) / k!.
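The three answers can be checked with a few lines of Python (stdlib only; parts (a) and (b) reproduce the slide's 0.119 and 0.025):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson RV with mean mu = lambda * t."""
    return mu ** k * exp(-mu) / factorial(k)

# (a) exactly 11 arrivals in 1 ms at rate 11/ms: mu = 11
print(round(poisson_pmf(11, 11), 3))      # 0.119

# (b) exactly 100 arrivals in 10 ms: mu = 110
print(round(poisson_pmf(100, 110), 3))    # 0.025

# (c) P(7 < X_2 <= 10): mu = 22, sum over k = 8..10
p_c = sum(poisson_pmf(k, 22) for k in range(8, 11))
print(p_c)
```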
Be sure to specify all possible values of k, such as "for integers k = 1, 2, ..., n." Be sure to use the values of the other parameters (p, n, ...) that are correct for this particular problem. To define a cdf, write the following: "The cdf is: F_X(x) = 0 for x below the smallest value of X; Σ_{k ≤ x} (the expression for p_X(k)) over the range of values; and 1 for x at or above the largest value." This means the probability X ≤ x.
Practice Quiz 3
A college student phones his girlfriend once each night for three nights. The probability that he reaches her is 1/3 anytime he calls. Suppose that the random variable X equals the number of nights (out of three) that he is able to reach her. 1. What is the pmf for X? What is the name of X's distribution? 2. What is the cdf for X?
3. Now suppose that this student will phone once each night until the first night that he is able to reach his girlfriend. Let Y be a random variable that equals the number of nights that it takes him to reach her for the first time. (For example, Y = 2 if she doesn't answer the first night but does answer the second night.) What is the pmf for Y ? What is the name of Y ' s distribution?
x        1     2     3
p_X(x)   1/6   1/3   1/2
[Figure: pmf of a discrete uniform random variable on {1, ..., 10} (each probability 0.1) and its cdf F_X(x), a step function rising from 0 to 1 over 0 ≤ x ≤ 10.]
Figure 3-5 A probability distribution can be viewed as a loading with the mean equal to the balance point. Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance.
Figure 3-6 The probability distributions illustrated in Parts (a) and (b) differ even though they have equal means and equal variances.
Example 3-11
Continuing the example from previous slide: E (5 X ) = 5(12.5) = 62.5 Var (5 X ) = 25(1.85) = 46.25
E(X) = Σ_{k=0}^{n} k · C(n, k) p^k (1 − p)^(n−k). Substituting k = m + 1:
E(X) = Σ_{m=0}^{n−1} (m + 1) · [n! / ((m + 1)! (n − 1 − m)!)] p^(m+1) (1 − p)^(n−1−m)
= np Σ_{m=0}^{n−1} C(n − 1, m) p^m (1 − p)^(n−1−m) = np,
because the last sum is the total probability of a binomial with n − 1 trials.
Similar work will show that Var(X) = np(1 − p), but there are much easier ways to show this.
Exercise
The interactive computer system at Gnu Glue has 20 communication lines to the central computer system. The lines operate independently and the probability that any particular line is in use is 0.6. What is the probability that 10 or more lines are in use? What is the expected number of lines in use? What is the standard deviation of lines in use?
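A quick numerical sketch of this exercise (the binomial model with n = 20, p = 0.6 is taken straight from the problem statement):

```python
from math import comb, sqrt

n, p = 20, 0.6

def binom_pmf(k):
    """P(exactly k of the n lines are in use)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_10_or_more = sum(binom_pmf(k) for k in range(10, n + 1))
mean = n * p                        # expected lines in use: 12
sd = sqrt(n * p * (1 - p))          # sqrt(4.8) ~ 2.19

print(round(p_10_or_more, 3), mean, round(sd, 2))
```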
Binomial Revisited
Recall that the Binomial RV X = X_1 + X_2 + ... + X_n, where each X_i has a Bernoulli distribution and is mutually independent of the others. Because E(X_i) = p, it follows trivially that E(X) = np. Because of independence we can also write Var(X) = Σ_{i=1}^{n} Var(X_i). Since Var(X_i) = E(X_i²) − E²(X_i) = p − p² = p(1 − p), we get Var(X) = np(1 − p).
For a Poisson RV X with parameter λ:
E(X) = Σ_{k=0}^{∞} k · λ^k e^(−λ) / k! = λ e^(−λ) Σ_{k=1}^{∞} λ^(k−1)/(k−1)! = λ e^(−λ) e^λ = λ.
Similarly,
E(X²) = Σ_{k=0}^{∞} k² · λ^k e^(−λ) / k! = λ e^(−λ) Σ_{k=1}^{∞} k λ^(k−1)/(k−1)!
= λ e^(−λ) [ Σ_{k=1}^{∞} (k−1) λ^(k−1)/(k−1)! + Σ_{k=1}^{∞} λ^(k−1)/(k−1)! ]
= λ e^(−λ) [ λ Σ_{k=2}^{∞} λ^(k−2)/(k−2)! + e^λ ] = λ² + λ.
It follows that Var(X) = E(X²) − E²(X) = λ² + λ − λ² = λ.
NOTE: The mean and variance of the Poisson random variable X_t are both λt.
Exercise
Suppose it has been determined that the number of inquiries that arrive per second at the central computer system can be described by a Poisson random variable with an average rate of 10 messages per second. What is the probability that no inquiries arrive in a 1-second period? What is the probability that 15 or fewer inquiries arrive in a 1-second period? What are the mean and variance of the number of arrivals in 1 second?
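A sketch of the computations this exercise asks for, assuming (as stated) the arrivals are Poisson with λ = 10 per second:

```python
from math import exp, factorial

lam = 10  # average inquiries per second

def pmf(k):
    """P(exactly k inquiries in one second)."""
    return lam ** k * exp(-lam) / factorial(k)

p_none = pmf(0)                           # e^{-10}
p_le_15 = sum(pmf(k) for k in range(16))  # P(X <= 15)
print(p_none, round(p_le_15, 3))
# Mean and variance of a Poisson RV are both lambda (here, 10).
```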
E(Z) = Σ_{k=1}^{∞} k p (1 − p)^(k−1).
Write s(p) = Σ_{k=0}^{∞} (1 − p)^k = 1/p. Then s′(p) = −Σ_{k=1}^{∞} k (1 − p)^(k−1) = −1/p².
Thus E(Z) = p · (1/p²) = 1/p.
Hypergeometric Distribution
Suppose that a set of n objects includes k objects of type 1 (successes?) and n − k objects of type 0 (failures, perhaps?). A sample of size m is selected from the n objects "without replacement," where m ≤ n (and k ≤ n). Let X be the random variable that denotes the number of type 1 objects in the sample. Then X is said to be a hypergeometric random variable and its pmf is given by:
p_X(i) = C(k, i) · C(n − k, m − i) / C(n, m) for i = max{0, m + k − n} to min{k, m}; 0 otherwise.
Values of i (examples): (1) n = 20, k = 5 (type 1), m = 3 (sample size) ⇒ i = 0, 1, ..., 3. (2) Same as (1) but m = 7 ⇒ i = 0, 1, ..., 5. (3) Same as (1) but m = 17 ⇒ i = 2, 3, ..., 5.
E(X) = mp and σ² = Var(X) = mp(1 − p) · (n − m)/(n − 1), where p = k/n (the fraction of type 1 objects in the total).
Example: In the previous problem E(X) = 10 · (240/800) = 3 and
Var(X) = 10 · (240/800) · (1 − 240/800) · (800 − 10)/(800 − 1) = 2.076.
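The pmf and the mean/variance formulas can be sanity-checked in Python (the n = 800, k = 240, m = 10 numbers come from the example above):

```python
from math import comb

def hyper_pmf(i, n, k, m):
    """P(i type-1 objects in a sample of size m drawn without
    replacement from n objects, k of which are type 1)."""
    return comb(k, i) * comb(n - k, m - i) / comb(n, m)

# Example (1): n = 20, k = 5, m = 3  ->  i = 0, 1, 2, 3
probs = [hyper_pmf(i, 20, 5, 3) for i in range(4)]
print(round(sum(probs), 12))       # 1.0  (the pmf sums to 1)

# Mean and variance for n = 800, k = 240, m = 10 (p = k/n = 0.3):
n, k, m = 800, 240, 10
p = k / n
mean = m * p                                   # 3.0
var = m * p * (1 - p) * (n - m) / (n - 1)      # ~2.076
print(mean, round(var, 3))
```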
A continuous random variable is characterized by a distribution function that is a continuous function of x for all x. If the distribution function has a derivative at all except, possibly, a finite number of points, then the random variable is said to be absolutely continuous. Example:
F_X(x) = 0 for x < 0; x for 0 ≤ x < 1; 1 for x ≥ 1.
This means the probability X ≤ x.
Properties of CDF
* 0 ≤ F(x) ≤ 1, −∞ < x < ∞.
* F(x) is a nondecreasing function of x.
* If X has density f, then (1) F(x) = ∫_{−∞}^{x} f(t) dt for −∞ < x < ∞, and (2) ∫_{−∞}^{∞} f(x) dx = 1.
Example
The probability density function f is given as:
f(x) = k/x² for x > 2; 0 otherwise.
What is the value of k? What is the corresponding cdf F(x) = ∫_{−∞}^{x} f(t) dt?
Exponential Distribution
A random variable has an exponential distribution if for some λ > 0 its distribution function is given by:
F(x) = 1 − e^(−λx) for 0 ≤ x < ∞; 0 otherwise.
It follows that its pdf is given by:
f(x) = λ e^(−λx) for x ≥ 0; 0 otherwise.
Note that in most problems the parameter represents a "rate," such as a rate of arrivals or a rate of failures.
Examples of use: Interarrival times at a communication switch Service times at a server Time to failure or repair of a component.
Exponential pdf, λ = 2
f(x) = λ e^(−λx) for x ≥ 0; 0 otherwise.
Note the values of pdf mean nothing except through integration.
Exponential cdf
λ = 2
F(x) = 1 − e^(−λx) for 0 ≤ x < ∞; 0 otherwise.
Class Problem
Suppose that we stand at a mile marker on I-4 and watch cars pass. We notice that on the average 10 cars pass by us per minute and we're given that the time lapse between two consecutive cars has an exponential distribution. If we begin timing at the moment that one car passes by, what is the probability that we will have to wait more than 20 secs for the next car to pass?
Note 20 sec = 1/3 min.
Answer. Let W be waiting time in minutes. We seek P[W > 1/3] = 1 − F(1/3), where F(t) = 1 − e^(−λt) is the exponential cdf. The average "rate" is λ = 10; thus, the answer is P[W > 1/3] = e^(−10/3) ≈ 0.036.
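The same computation in Python (λ = 10 per minute, as in the problem):

```python
from math import exp

lam = 10  # cars per minute

def F(t):
    """Exponential cdf: P(W <= t)."""
    return 1 - exp(-lam * t) if t >= 0 else 0.0

p_wait = 1 - F(1/3)       # P(W > 20 sec) = e^{-10/3}
print(round(p_wait, 3))   # 0.036
```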
Simple Exercises
Use F in the previous problem to write: P(W < 6); P(W > 6); P(W < 0); P(W < −1); P(2 < W < 5); P(W = 1).
Memoryless Property
If X has an exponential distribution, x > 0, and t > 0, then we know that
P(X ≤ x) = ∫_0^x λ e^(−λy) dy and P(t < X ≤ t + x) = ∫_t^{t+x} λ e^(−λy) dy.
Then
P(X ≤ t + x | X > t) = [∫_t^{t+x} λ e^(−λy) dy] / [∫_t^∞ λ e^(−λy) dy] = e^(−λt)(1 − e^(−λx)) / e^(−λt) = 1 − e^(−λx) = P(X ≤ x).
This is why we don't replace lightbulbs until they fail. (Would you like it if waiting time at your doctor's office was exponentially distributed?)
Exponential/Poisson Relationship
Show that the time between adjacent arrivals of a Poisson Process has an exponential distribution.
Hint: If N t denotes the number of arrivals during time t and N t has a Poisson distribution, then the probability that waiting time to the next event is greater than t is P[W > t ], where W is waiting time, and 0 arrivals during time t. P[W > t ] = P[ N t = 0].
Continuing
We've seen that P(W > t) = P(N_t = 0), and we know that P(N_t = k) = (λt)^k e^(−λt) / k!. Thus,
P(W > t) = (λt)^0 e^(−λt) / 0! = e^(−λt), so F_W(t) = 1 − e^(−λt): an exponential distribution.
Using integration by parts one can show Γ(α) = (α − 1)Γ(α − 1) for α > 1. Because Γ(1) = 1, it follows that Γ(n) = (n − 1)Γ(n − 1) = ... = (n − 1)! when n is a positive integer. Note also that Γ(1/2) = √π. Also, it is well known that Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx and, more generally, ∫_0^∞ x^(α−1) e^(−λx) dx = Γ(α)/λ^α.
Example
Evaluate ∫_0^∞ x² e^(−4x) dx.
Using ∫_0^∞ x^(α−1) e^(−λx) dx = Γ(α)/λ^α with α = 3 and λ = 4:
∫_0^∞ x² e^(−4x) dx = Γ(3)/4³ = 2/64 = 1/32.
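The gamma integration formula is easy to check with Python's `math.gamma`:

```python
from math import gamma

def gamma_integral(a, lam):
    """integral_0^inf x^(a-1) e^(-lam*x) dx = Gamma(a) / lam^a."""
    return gamma(a) / lam ** a

# Evaluate integral_0^inf x^2 e^{-4x} dx using a = 3, lam = 4:
val = gamma_integral(3, 4)
print(val)   # 0.03125, i.e. 1/32
```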
Gamma Distribution
A random variable with pdf given by
f(t) = λ^α t^(α−1) e^(−λt) / Γ(α), λ > 0, t > 0, α > 0,
is said to have a Gamma distribution with parameters λ and α, and we write X ~ GAM(λ, α). The parameter α is called the shape parameter and the parameter λ is called the scale parameter. For α = 1 the gamma becomes identical to the exponential distribution.
NOTE: If a sequence of random variables X_1, X_2, ..., X_k are mutually independent and identically distributed as GAM(λ, α), then their sum has a GAM(λ, kα) distribution.
[Figure: MathCad plot of the gamma density g(x, λ, α) for several values of the scale (λ) and shape (α) parameters.]
Practice Quiz 4
The pdf of a random variable, X, is given by:
f_X(x) = 0 for x < 0; x³/64 for 0 ≤ x ≤ 4; 0 for x > 4.
Answer each of the following and EXPLAIN (show your work).
(a) What is the cdf of X? (Find it explicitly.)
(b) What is P(X > 2)?
(c) What is P(X > 2 | X > 1)?
BONUS: Use the gamma integration formula to evaluate the following integral: ∫_0^∞ x³ e^(−2x) dx.
E(X) = ∫ x f_X(x) dx = ∫_0^4 x · (x³/64) dx = x⁵/(5 · 64) |₀⁴ = 1024/320 = 16/5.
E(X²) = ∫ x² f_X(x) dx = ∫_0^4 x² · (x³/64) dx = x⁶/(6 · 64) |₀⁴ = 4096/384 = 32/3.
Var(X) = E(X²) − E²(X) = 32/3 − (16/5)² = 0.427.
Existence of E(X)
Continuing a previous example: Let X be a random variable with pdf given by:
f(x) = 2/x² for x > 2; 0 otherwise.
Notice that E(X) = ∫_2^∞ x · (2/x²) dx = ∫_2^∞ (2/x) dx = 2 ln x |₂^∞ = ∞.
Thus, well-defined random variables may not have finite means. Similarly, a random variable may have a finite mean and not have a finite 2nd moment or a finite kth moment (to be defined).
E(X) = ∫ x f(x) dx, and E(X²) = ∫ x² f(x) dx.
(b) E(X² + 5X − 2) = ∫ (x² + 5x − 2) f(x) dx.
(c) E(sin X) = ∫ (sin x) f(x) dx.
(d) σ² = Var(X) = E(X − μ)² = ∫ (x − μ)² f(x) dx.
Variance of X
Using ∫_0^∞ x^(α−1) e^(−λx) dx = Γ(α)/λ^α:
E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ x λ e^(−λx) dx = λ · Γ(2)/λ² = 1/λ.
E(X²) = ∫_0^∞ x² λ e^(−λx) dx = λ · Γ(3)/λ³ = 2/λ².
Thus, Var(X) = E(X²) − E²(X) = 2/λ² − 1/λ² = 1/λ².
Exponential Example
The lifetime of a particular brand of lightbulb is exponentially distributed. The "mean time to failure" (MTTF) is 2000 hours. If the lightbulb is installed at time t = 0, what is the probability that it fails at time t = 3000 hours? What is the probability that it fails in fewer than 3000 hours?
Think scale: for the gamma distribution the mean is α/λ and the variance is α/λ².
Note that when α = 1, the distribution is exponential.
Class Exercise
A gamma distribution has a mean of 1.5 and a variance of 0.75. Sketch the pdf.
We have μ = α/λ = 1.5 and σ² = α/λ² = 0.75. First, solve for α and λ. From the first equation we get α = 1.5λ. Substitute this in the second equation to get 1.5/λ = 0.75. This gives λ = 2, which implies α = 3. This is all we need to obtain pdf values.
[Figure: MathCad plot of the gamma pdf Gam(x) with λ = 2 and α = 3.]
Important: Make sure that you can find the scale and shape parameters from a given mean and variance.
For the continuous uniform on (a, b): E(X) = ∫_a^b x · 1/(b − a) dx = (a + b)/2, and similar work with E(X²) gives Var(X) = (b − a)²/12.
If we let X take on the integer values 1, 2, ..., n (discrete uniform), then E(X) = (n + 1)/2 and Var(X) = (n² − 1)/12.
Example:
Suppose that the time that it takes to drive from the Orlando airport to FIT is uniformly distributed between one hour and one and a quarter hours. (a) What is the mean driving time? (b) What is the standard deviation of the driving time? (c) What is the probability that it will take less than one hour and 5 minutes to make the trip? (d) 80% of the time the trip will take less than ______ minutes?
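A worked numerical sketch of parts (a)–(d), using the uniform-distribution formulas above (times in minutes):

```python
from math import sqrt

a, b = 60.0, 75.0   # driving time (minutes), uniform on (60, 75)

mean = (a + b) / 2                 # (a)  67.5 minutes
sd = (b - a) / sqrt(12)            # (b)  ~4.33 minutes
p_under_65 = (65 - a) / (b - a)    # (c)  1/3
pct80 = a + 0.8 * (b - a)          # (d)  72 minutes

print(mean, round(sd, 2), round(p_under_65, 3), pct80)
```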
Distribution                   E(X)          Var(X)
Bernoulli                      p             p(1 − p)
Binomial                       np            np(1 − p)
Geometric                      1/p           (1 − p)/p²
Discrete uniform on 1..n       (n + 1)/2     (n² − 1)/12
Poisson                        λ (or λt)     λ (or λt)
Exponential                    1/λ           1/λ²
Continuous uniform on (a, b)   (a + b)/2     (b − a)²/12
Practice Quiz 5
1. Suppose that the driving time between Orlando and Melbourne is uniformly distributed between one hour and one hour plus 15 minutes. (a) What is the variance of the driving time? (b) Eighty percent of all drivers make the trip in fewer than x minutes. What is x?
2. The random variable X has an exponential distribution with parameter λ. What is the pdf of X? What is the mean of X? What is the variance of X? (Just write down what we know the mean and variance are. Do not derive them from the definition.)
3. The random variable X has a Binomial distribution and the total number of trials is 30. (a) If the probability of success on a single trial is 0.2, write the pmf for X. (b) If E(X) = 2.1, then what is the probability of success on a single trial?
Normal Distribution
Definition
Normal Distribution
Figure 4-10 Normal probability density functions for selected values of the parameters and 2.
Normal Distribution
Some useful results concerning the normal distribution
Normal Distribution
Definition : Standard Normal
Normal Distribution
Example 4-11
Normal Distribution
Standardizing
Normal Distribution
To Calculate Probability
Normal Distribution
Example 4-13
Normal Distribution
Example 4-14
Normal Distribution
Example 4-14 (continued)
In Mathcad
We want to find the value of x such that P[X ≤ x] = 0.98. We use the inverse normal function in MathCad: x := qnorm(p, μ, σ).
x = 14.107
Students should be able to solve problems like this and the one on slide 182 either by using MathCad or referring to a standard normal table.
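Python's `statistics.NormalDist.inv_cdf` plays the same role as MathCad's qnorm. The example's μ and σ are not shown on this slide; μ = 10 and σ = 2 are assumed here because they reproduce the displayed x = 14.107:

```python
from statistics import NormalDist

# Assumed parameters (mu = 10, sigma = 2); inv_cdf is the qnorm analogue.
d = NormalDist(mu=10, sigma=2)
x = d.inv_cdf(0.98)
print(round(x, 3))          # 14.107
print(round(d.cdf(x), 4))   # 0.98 -- round-trip sanity check
```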
Normal Distribution
Example 4-14 (continued)
Figure 4-16 Determining the value of x to meet a specified probability.
Figure 4-26 Weibull probability density functions for selected values of and .
Using integration by parts one can show Γ(α) = (α − 1)Γ(α − 1) for α > 1. Because Γ(1) = 1, it follows that Γ(n) = (n − 1)Γ(n − 1) = ... = (n − 1)! when n is a positive integer. Note also that Γ(1/2) = √π, and that Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx.
Figure 4-27 Lognormal probability density functions with = 0 for selected values of 2.
Simple example
The joint distribution of (X, Y) is given as follows:
p_{X,Y}(1,1) = 1/3;
p_{X,Y}(2,1) = 1/6 and p_{X,Y}(2,2) = 1/6;
p_{X,Y}(3,1) = 1/9, p_{X,Y}(3,2) = 1/9, and p_{X,Y}(3,3) = 1/9.
Do the values of X and Y seem to "influence" each other?
Marginal pmf
If X_1, X_2, ..., X_n are discrete random variables with joint pmf p_{X_1,...,X_n}(x_1, ..., x_n), then
p_{X_i}(x_i) = Σ p_{X_1,...,X_n}(x_1, ..., x_n), the sum taken over all points in the range of (X_1, ..., X_n) where X_i = x_i. The function p_{X_i}(x_i) is called the marginal probability mass function for X_i.
Example: Using the previous slide, the marginal pmf for X_1 is given by:
p_{X_1}(1) = 1/3, p_{X_1}(2) = 1/2, p_{X_1}(3) = 1/6.
The notation above is difficult; however, notice that, in the example, to get p_{X_1}(1) = 1/3 you just sum over all the (x_1, x_2) pair values that have 1 as the value for X_1.
Problem:
Joint pmf for X 1 and X 2
          X2 = 1   X2 = 2   X2 = 3
X1 = 1    1/12     1/6      1/12
X1 = 2    1/6      1/4      1/12
X1 = 3    1/12     1/12     0
Find:
The discrete random variables X 1 , X 2 ,..., X n are said to be mutually independent if their joint pmf can be written as: p X ( x ) = p X1 ( x1 ) p X 2 ( x2 )iii p X n ( xn ).
In the previous example are X 1 and X 2 independent?
Double Integral:
∫∫_R f(x, y) dA = ∫_a^b ∫_{m(x)}^{n(x)} f(x, y) dy dx
[Figure: surface z = f(x, y) above the region R in the xy-plane, with R bounded below by y = m(x), above by y = n(x), and on the sides by x = a and x = b.]
Example:
Evaluate the double integral ∫∫_R 2xy dA, where R is the region bounded by y = x², y = 0, and x = 2. This computes the volume under the surface z = 2xy and above the region R.
∫_0^2 ∫_0^{x²} 2xy dy dx = ∫_0^2 x y² |₀^{x²} dx = ∫_0^2 x⁵ dx = x⁶/6 |₀² = 32/3.
[Figure: the region R and the surface z = 2xy.]
Extra Practice
Evaluate the double integrals:
1. ∫∫_R x³y² dx dy for R bounded by y = x, y = 2x, x = 1.
2. ∫∫_R x³y⁴ dx dy for R bounded by y = x, y = 0, x = 1.
3. ∫∫_R y² dx dy for R bounded by y = x², y = 2x.
…
Problem 3
[Figure: the region between y₁(x) = x² and y₂(x) = 2x, which intersect at x = 0 and x = 2.]
Iterated integral: ∫_0^2 ∫_{x²}^{2x} y² dy dx.
lim_{y→∞} F_{X,Y}(x, y) = F_X(x) is called the marginal cumulative distribution of X.
lim_{x→∞} F_{X,Y}(x, y) = F_Y(y) is called the marginal cumulative distribution of Y.
Think of F_X(x) = P[X ≤ x] = P[X ≤ x, Y < ∞] = F_{X,Y}(x, ∞). Also: The marginal cdf is defined in the same manner for discrete random variables.
The marginal pdf of Y is f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx, and similarly for X. Then
F_X(x) = ∫_{−∞}^{x} f_X(u) du = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, v) dv du, and
F_Y(y) = ∫_{−∞}^{y} f_Y(v) dv = ∫_{−∞}^{y} ∫_{−∞}^{∞} f(u, v) du dv.
Mathcad Example
The joint density is f_{X,Y}(x, y) = 1/6 on the triangle A bounded by y = 0, y = (3/4)(x − 1), and x = 5 (vertices (1,0), (5,0), (5,3)), and 0 elsewhere.

P[X ≤ 2, Y ≤ 2] = ∫_1^2 ∫_0^{(3/4)(x−1)} (1/6) dy dx = ∫_1^2 (1/6) y |₀^{(3/4)(x−1)} dx = (1/8) ∫_1^2 (x − 1) dx = (1/8)(x²/2 − x) |₁² = 1/16.

The marginal pdf of X is f_X(x) = ∫ f_{X,Y}(x, y) dy for 1 ≤ x ≤ 5. (The value of x is held constant, and the value of y is "integrated out.")
f_X(x) = ∫_0^{(3/4)(x−1)} (1/6) dy = (1/6) y |₀^{(3/4)(x−1)} = (1/8)(x − 1).

Notice that ∫_1^5 f_X(x) dx = ∫_1^5 (1/8)(x − 1) dx = (1/8)(x²/2 − x) |₁⁵ = (1/8)[(25/2 − 5) − (1/2 − 1)] = 1.
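A numeric check, reading the region A as the triangle 1 ≤ x ≤ 5, 0 ≤ y ≤ (3/4)(x − 1) with density 1/6 (this reading reproduces the slide's 1/16):

```python
# Midpoint-rule check of P(X <= 2, Y <= 2) for f = 1/6 on the triangle.
N = 4000
dx = 1 / N   # x runs over [1, 2]

p = 0.0
for i in range(N):
    x = 1 + (i + 0.5) * dx
    y_top = min(0.75 * (x - 1), 2.0)   # the cap at y = 2 is inactive here
    p += (1 / 6) * y_top * dx
print(round(p, 5), 1 / 16)             # both 0.0625

# Marginal pdf of X at x = 3: (1/6)*(3/4)*(x-1) should equal (x-1)/8
print(0.75 * (3 - 1) / 6, (3 - 1) / 8)   # 0.25 0.25
```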
Example:
The random variables X and Y have the joint density function
f(x, y) = xy · exp(−(x² + y²)/2) for x > 0 and y > 0; 0 otherwise.
Find F_Y(y), f_X(x), and F(1, 2).

f_X(x) = ∫_0^∞ xy e^(−(x²+y²)/2) dy = x e^(−x²/2) ∫_0^∞ y e^(−y²/2) dy = x e^(−x²/2) for x > 0.
By symmetry, f_Y(y) = y e^(−y²/2), and F_Y(y) = ∫_0^y v e^(−v²/2) dv = 1 − e^(−y²/2) for y > 0.

Finding F(1,2): Note that in the region of integration, x > 0 and y > 0, values of y do not depend on values of x.
F(1, 2) = F_{X,Y}(1, 2) = ∫_0^1 ∫_0^2 xy exp(−(x² + y²)/2) dy dx
= ∫_0^1 x e^(−x²/2) (∫_0^2 y e^(−y²/2) dy) dx = (1 − e^(−2)) ∫_0^1 x e^(−x²/2) dx = (1 − e^(−2))(1 − e^(−1/2)) ≈ 0.340.
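Because the joint density factors into a function of x times a function of y on the positive quadrant, F(1, 2) is a product of two one-dimensional integrals, which is easy to confirm numerically:

```python
from math import exp

# f(x,y) = x*y*exp(-(x^2+y^2)/2) on x > 0, y > 0 factors, so:
Fx1 = 1 - exp(-0.5)   # integral_0^1 x e^{-x^2/2} dx
Fy2 = 1 - exp(-2.0)   # integral_0^2 y e^{-y^2/2} dy
print(f"{Fx1 * Fy2:.3f}")   # 0.340
```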
Lessons to Learn
F(1,2) = P(X ≤ 1, Y ≤ 2). Integrate a pdf to get a cdf: F_X(x) = ∫_{−∞}^{x} f_X(t) dt.
lim_{y→∞} F_{X,Y}(x, y) = F_X(x).
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.
[MathCad worksheet: bivariate normal setup with μx := 0, μy := 0, σx := 1, σy := 1, ρ := 0.6, and normalizing constant c := 1/(2π σx σy √(1 − ρ²)).]
Independent RVs
Two random variables, X and Y, are independent if F_{X,Y}(x, y) = F_X(x) F_Y(y) for −∞ < x < ∞ and −∞ < y < ∞. If the corresponding density functions exist, this is equivalent to f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x and y.
EXAMPLE: Let X and Y have joint pdf
f(x, y) = 1/π for x² + y² ≤ 1; 0 otherwise.
Then f_X(x) = ∫_{−√(1−x²)}^{√(1−x²)} (1/π) dy = (2/π)√(1 − x²) for −1 ≤ x ≤ 1.
By symmetry, f_Y(y) = (2/π)√(1 − y²); since f_X(x) f_Y(y) ≠ f(x, y), X and Y are not independent.
A sum Σ_{i=1}^{n} X_i of independent normal RVs is normal. A sum of independent exponential RVs (with common λ) is Erlang. A sum of independent gamma RVs (with common λ) is gamma. A large sum of independent, identically distributed RVs is approximately normal.
Erlang Distribution
When r sequential phases have identical exponential distributions, the resulting density is known as an r-stage Erlang and is given by:
f(t) = λ^r t^(r−1) e^(−λt) / (r − 1)!
The distribution function is given by:
F(t) = 1 − Σ_{k=0}^{r−1} (λt)^k e^(−λt) / k!, t ≥ 0, λ > 0, r = 1, 2, ....
Note that the exponential is a special case of the Erlang distribution with r = 1. One may also think of this as the distribution function for the sum of r mutually independent random variables, each with the exponential distribution with parameter λ.
Sums of r exponentials
Suppose that the waiting time between arrivals of customers at the post office is exponentially distributed with parameter λ (the arrival rate per unit time). Let W_r be the waiting time until the rth arrival occurs. Then
P[W_r ≤ t] = P[r or more arrivals occur during time t] = 1 − Σ_{k=0}^{r−1} (λt)^k e^(−λt) / k!.
Example
Suppose that packets arrive at a switch with a Poisson distribution that has a rate of 100 packets per second. Starting at any particular point in time, let T be the elapsed time until the arrival of the 5th packet. What is the probability that 5 ms ≤ T ≤ 10 ms?
Solution. Because arrivals are Poisson, the waiting time between arrivals is exponential with parameter λ = 100 pkts/sec ÷ 1000 ms/sec = 0.1 pkts/ms. The distribution of T is 5-stage Erlang:
F(t) = 1 − Σ_{k=0}^{4} (0.1t)^k e^(−0.1t) / k!. Thus, the answer is F(10) − F(5).
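The Erlang cdf above is a short sum, so F(10) − F(5) is a one-liner to evaluate (λ = 0.1 per ms and r = 5 as in the example):

```python
from math import exp, factorial

def erlang_cdf(t, lam, r):
    """P(the r-th Poisson arrival occurs by time t), arrival rate lam."""
    if t < 0:
        return 0.0
    return 1 - sum((lam * t) ** k * exp(-lam * t) / factorial(k)
                   for k in range(r))

lam, r = 0.1, 5   # 100 packets/sec = 0.1 packets/ms; waiting for 5th packet
ans = erlang_cdf(10, lam, r) - erlang_cdf(5, lam, r)
print(round(ans, 4))   # 0.0035
```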
Discrete Example
The joint pmf for X and Y is given in the table.
        Y = 1   Y = 2   Y = 3
X = 1   1/12    1/6     1/12
X = 2   1/6     1/4     1/12
X = 3   1/12    1/12    0
Find E ( X 2 + Y ) .
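The requested expectation is just a weighted sum over the nine table cells; exact fractions keep the arithmetic clean (the table values are from the slide):

```python
from fractions import Fraction as Fr

# Joint pmf from the table (keys are (x, y)):
p = {(1, 1): Fr(1, 12), (1, 2): Fr(1, 6), (1, 3): Fr(1, 12),
     (2, 1): Fr(1, 6),  (2, 2): Fr(1, 4), (2, 3): Fr(1, 12),
     (3, 1): Fr(1, 12), (3, 2): Fr(1, 12), (3, 3): Fr(0)}

assert sum(p.values()) == 1            # sanity check: a valid pmf
E = sum((x ** 2 + y) * q for (x, y), q in p.items())
print(E)   # 17/3
```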
Continuous Example
Let f(x, y) = 1/6 for (x, y) ∈ A, and 0 otherwise, where A is the triangle bounded by y = 0, y = (3/4)(x − 1), x = 1, and x = 5.
[Figure: the triangle A with vertices (1,0), (5,0), (5,3).]
Let Ψ(X, Y) = X + Y. What is E(Ψ(X, Y))?
Solution: E(Ψ(X, Y)) = ∫_1^5 ∫_0^{(3/4)(x−1)} (x + y)(1/6) dy dx = 14/3 ≈ 4.67.
Mathcad Solution
Linearity of Expectation
Suppose that X and Y are random variables and that a and b are any two real numbers, then E (aX + bY ) = aE ( X ) + bE (Y ). In previous example find E (3 X + 2Y ).
Examples
Given E(X) = 5 and Var(X) = 30, find E(X²). Ans: Var(X) = E(X²) − [E(X)]², so E(X²) = 30 + 25 = 55.
Let Y = 2X + 3. What is σ_Y? Ans: Var(Y) = Var(2X + 3) = Var(2X) + Var(3) = 4 Var(X) + 0 = 4 · 30 = 120. Thus, σ_Y = √120 ≈ 10.95.
Because a random variable is always independent of a constant. Because the variance of a constant is always 0.
where the i are real-valued functions for i = 1, 2,..., n. Using this property it is easy to show that if X and Y are independent, then Var ( X + Y ) = Var ( X ) + Var (Y ).
Example: If X and Y are independent random variables with E ( X ) = 5 and E (Y ) = 3, then E ( XY ) = E ( X ) E (Y ) = 15.
Equality holds because X and Y are independent!
Covariance of X and Y
We define Cov ( X , Y ) = E[( X E ( X ))(Y E (Y )].
Recall the definition that Var(X) = E(X − μ_X)², where μ_X = E(X). We could also write Var(X) = E[(X − E(X))(X − E(X))] = Cov(X, X).
In other words, covariance is a generalization of variance to multiple random variables. For covariance we also have the working formula Cov( X , Y ) = E ( XY ) E ( X ) E (Y ). Note that when X and Y are independent, Cov( X , Y ) = 0; however, knowing that Cov( X , Y ) = 0 is not enough to conclude that X and Y are independent. When Cov( X , Y ) = 0, we say that X and Y are uncorrelated. When X and Y are NOT independent we have: Var ( X + Y ) = Var ( X ) + Var (Y ) + 2Cov( X , Y ).
indicates that X large suggests Y large, as expected. If we change the original table to:
Probabilities:   Y = 1   Y = 2
X = 1            0.1     0.4
X = 2            0.4     0.1
then X tends to be small when Y is large, etc.
In this second case, the covariance is −0.15. The absolute value of 0.15 doesn't mean much unless we're comparing with other covariance values or unless we "normalize" using the variances of X and Y.
Correlation Coefficient
The correlation (coefficient) of X and Y, ρ(X, Y), is defined as:
ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y),
provided both variances are nonzero. We know that |ρ(X, Y)| ≤ 1 and that |ρ(X, Y)| = 1 iff P[Y = aX + b] = 1 for some a and b. In the previous example, Var(X) = 0.25 and Var(Y) = 0.25; thus, ρ(X, Y) = −0.15/0.25 = −0.6 (on a scale that runs between −1 and +1).
Student Exercise
The random variables X and Y are jointly distributed as follows:
P_{X,Y}(1,1) = 1/6, P_{X,Y}(1,2) = 1/6, P_{X,Y}(2,2) = 1/6,
P_{X,Y}(2,3) = 1/6, P_{X,Y}(3,3) = 1/6, P_{X,Y}(3,4) = 1/6.
Find the covariance and correlation of X and Y.
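A quick numerical check of this exercise (the answers are not given on the slide; this follows the working formula Cov(X, Y) = E(XY) − E(X)E(Y) from the covariance discussion):

```python
from fractions import Fraction as Fr

# Each listed (x, y) pair has probability 1/6:
pairs = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4)]
w = Fr(1, 6)

EX = sum(x * w for x, _ in pairs)           # 2
EY = sum(y * w for _, y in pairs)           # 5/2
EXY = sum(x * y * w for x, y in pairs)      # 17/3
cov = EXY - EX * EY                         # 2/3
VarX = sum(x * x * w for x, _ in pairs) - EX ** 2   # 2/3
VarY = sum(y * y * w for _, y in pairs) - EY ** 2   # 11/12
rho = float(cov) / (float(VarX) * float(VarY)) ** 0.5
print(cov, round(rho, 3))   # 2/3 0.853
```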
[MathCad worksheet for a different joint table: EX = Σᵢ Σⱼ xvᵢ·pᵢⱼ = 2.32; EY = Σᵢ Σⱼ yvⱼ·pᵢⱼ = 3.69; EXY = Σᵢ Σⱼ xvᵢ·yvⱼ·pᵢⱼ; CovXY = EXY − EX·EY = 0.401; VarX = EXSq − EX² = 1.098; VarY = EYSq − EY² = 1.454; ρ = CovXY/√(VarX·VarY) = 0.317.]