
Overview of Probability and Statistics

Gregory Rahn & Regina Rahn

Copyright 2001 Genemetrix



Probability Theory - known distribution or population
Population parameters are known with certainty
- mean ($\mu$)
- variance ($\sigma^2$)
- shape parameters (skewness & kurtosis)
Use the distribution to acquire probabilities of the occurrence of certain events
Defined explicitly for the distribution

Statistics - start with data (observed values from an unknown "empirical" distribution)
Functions of the data that estimate parameters {mean, variance, skewness, and kurtosis}
Estimate probabilities


Statistics - Estimation of Parameters


Measures of Location

Average ($\bar{X}$)
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
most common measure of central tendency

Median (Md)
Md = the value that divides ranked observations in half
$Md = X_{(n+1)/2}$ if n is odd
$Md = \frac{X_{n/2} + X_{(n/2)+1}}{2}$ if n is even

Mode (Mo)
Mo = the most frequent data point

Ex. Data {3, 2, 9, 1, 6, 8, 2}

Ranked Data {1, 2, 2, 3, 6, 8, 9}

$\bar{X}$ = (3+2+9+1+6+8+2)/7 = 31/7 = 4.43

Md = $X_{(7+1)/2}$ = $X_4$ = 3
Mo = 2, the most frequent observation (occurred twice)
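The three measures of location for the example data can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Example data from the text
data = [3, 2, 9, 1, 6, 8, 2]

average = statistics.mean(data)    # sum of the values divided by n
median = statistics.median(data)   # middle value of the ranked data (n is odd)
mode = statistics.mode(data)       # most frequent value

print(f"average = {average:.2f}, median = {median}, mode = {mode}")
```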

Properties of the Average
- $\sum (X_i - \bar{X})^2$ is less than the sum of squared deviations from any other estimate
  Ex. $\sum (X_i - \bar{X})^2 \le \sum (X_i - Md)^2$ - the average is the minimum variance estimate
- Gets pulled in the direction of extreme points

Example

Data {1, 2, 3, 4, 9}

Including $X_5$ = 9: $\bar{X}$ = 3.8, Md = 3
Excluding $X_5$ = 9: $\bar{X}$ = 2.5, Md = 2.5

The average can be very sensitive to extreme points, while the median is fairly robust.
Sensitivity depends upon the sample size and the deviation of the extreme point!
Assumption of $\bar{X}$: the $X_i$'s are independently and identically distributed (i.i.d.). This is often not a good assumption!

Measures of Dispersion

Range (R)
R = $X_n - X_1$ = largest value - smallest value
Must sort data from low ($X_1$) to high ($X_n$)

Ex. Data {3, 2, 9, 1, 6, 8, 2}

Ranked Data {1, 2, 2, 3, 6, 8, 9}

R = 9 - 1 = 8

Properties of the Range
- Bad: It only uses two pieces of information.
- Good: It is easy to compute manually.

Uses of the Range
- Range itself is useful for characterizing a distribution (order statistics)
- Range can be used to estimate the standard deviation ($\hat{\sigma} = R/d_2$)
- Many practical applications once the standard deviation is approximated:
  - Control Charts
  - Process Capability
  - Gage Repeatability & Reproducibility

Problems when using the Range to Approximate the Standard Deviation
The $d_2$ coefficient depicts the relationship between the range and standard deviation for a normal distribution. Thus, the Range method for estimating standard deviation is only valid if the parent distribution is normally distributed.

Sample Variance ($S^2$)

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$

Most common and reliable measure of dispersion

Ex. Data {3, 2, 9, 1, 6, 8, 2}

Ranked Data {1, 2, 2, 3, 6, 8, 9}

$S^2 = \frac{(3-4.43)^2 + (2-4.43)^2 + ... + (2-4.43)^2}{7-1}$ = 61.71/6 = 10.286

$S = \sqrt{S^2}$ = 3.207

$X_i$     $(X_i - \bar{X})$     $(X_i - \bar{X})^2$
3         -1.43                 2.04
2         -2.43                 5.90
9          4.57                 20.88
1         -3.43                 11.76
6          1.57                 2.46
8          3.57                 12.74
2         -2.43                 5.90
Average = 4.43                  Sum = 61.71

Importance of S:

- Same units as measurements
- Positive numbers that increase when variability increases
- Sample variance is the unbiased and minimum variance estimate for the population variance (irrespective of the distribution type)
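The sample-variance example for the data {3, 2, 9, 1, 6, 8, 2} can be reproduced step by step. The sketch below follows the sum-of-squares over degrees-of-freedom definition and cross-checks it against `statistics.variance`; the variable names are illustrative assumptions.

```python
import statistics

data = [3, 2, 9, 1, 6, 8, 2]
n = len(data)

mean = sum(data) / n
deviations = [x - mean for x in data]           # X_i minus the average
sum_of_squares = sum(d * d for d in deviations)

sample_variance = sum_of_squares / (n - 1)      # divide by degrees of freedom
sample_std = sample_variance ** 0.5             # same units as the measurements

# The standard library uses the same (n - 1) definition.
assert abs(sample_variance - statistics.variance(data)) < 1e-9

print(f"SS = {sum_of_squares:.2f}, S^2 = {sample_variance:.3f}, S = {sample_std:.3f}")
```

Note that the deviations sum to zero (up to floating-point rounding), which is why only n - 1 of them are independent.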

The sample variance is really an average of the squared deviations.


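$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$

Why (n-1) degrees of freedom?

Only (n-1) independent deviations! The deviations always sum to zero, so the last one is fixed by the others:

$\sum (X_i - \bar{X}) = 0$
$\sum X_i - n\bar{X} = 0$
$\sum X_i - \sum X_i = 0$, since $\bar{X} = \sum X_i / n$

Ex. Data {1, 2, 3, 4, 5} ($\bar{X}$ = 3)

Xi    Dev
1     -2
2     -1
3      0
4      1
5      2
Sum Dev = 0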

Grand Average and Pooled Variance Estimates

Subgroup averages and variances are merged into historical estimates of average and variance used for control chart centerlines
Grand average ($\bar{\bar{x}}$) = average of subgroup averages
Pooled variance ($S_p^2$) = average of subgroup variances

$\bar{\bar{x}} = \frac{\sum_{i=1}^{m} \bar{X}_i}{m}$   if n is constant
$\bar{\bar{x}} = \frac{\sum_{i=1}^{m} n_i \bar{X}_i}{\sum_{i=1}^{m} n_i}$   if $n_i$ is variable (Always Correct)

$S_p^2 = \frac{\sum_{i=1}^{m} S_i^2}{m}$   if n is constant
$S_p^2 = \frac{\sum_{i=1}^{m} \nu_i S_i^2}{\sum_{i=1}^{m} \nu_i}$   if $n_i$ is variable (Always Correct)

where m is the number of subgroups and $\nu_i = n_i - 1$ is the degrees of freedom of subgroup i
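The "always correct" weighted forms of the grand average and pooled variance can be sketched as follows. The subgroup sizes, means, and variances are hypothetical values chosen for illustration (they do not come from the slides), and the pooled variance is weighted by the degrees of freedom n_i - 1, which is the usual convention and is assumed here.

```python
def grand_average(sizes, means):
    """Average of subgroup means, weighted by subgroup size n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def pooled_variance(sizes, variances):
    """Average of subgroup variances, weighted by degrees of freedom n_i - 1."""
    dof = [n - 1 for n in sizes]
    return sum(v * s2 for v, s2 in zip(dof, variances)) / sum(dof)


# Hypothetical subgroups with unequal sizes (illustrative only)
sizes = [5, 5, 3]
means = [10.0, 12.0, 11.0]
variances = [4.0, 2.0, 3.0]

print(grand_average(sizes, means))       # weighted grand average
print(pooled_variance(sizes, variances)) # weighted pooled variance
```

When every subgroup has the same size, both functions reduce to the plain averages of the subgroup means and variances.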


Probability Theory
Distribution Functions

Discrete Distributions
Discrete Probability Density Function (pdf): f(x) = Pr[X=x]
Properties of the discrete pdf
1) $f(x) \ge 0$: each probability is greater than or equal to 0.
2) $\sum_{x} f(x) = 1$: the sum of the probabilities equals 1.0
Discrete Cumulative Distribution Function (cdf): $F(x) = P(X \le x) = \sum_{t \le x} f(t)$

Ex. Binomial (n=5 trials, p=.2)
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$

$f(0) = \binom{5}{0} (.2)^0 (1-.2)^5 = 0.32768$    Probability of 0 successes in 5 trials
$f(1) = \binom{5}{1} (.2)^1 (1-.2)^4 = 0.4096$     Probability of 1 success in 5 trials
$f(2) = \binom{5}{2} (.2)^2 (1-.2)^3 = 0.2048$     Probability of 2 successes in 5 trials
$f(3) = \binom{5}{3} (.2)^3 (1-.2)^2 = 0.0512$     Probability of 3 successes in 5 trials
$f(4) = \binom{5}{4} (.2)^4 (1-.2)^1 = 0.0064$     Probability of 4 successes in 5 trials
$f(5) = \binom{5}{5} (.2)^5 (1-.2)^0 = 0.00032$    Probability of 5 successes in 5 trials
f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = 1.0      Property of a pdf
$F(2) = P(X \le 2) = f(0) + f(1) + f(2) = 0.94208$
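The binomial table above can be regenerated from the formula. This is a small sketch using `math.comb` (Python 3.8+); the function names `binomial_pmf` and `binomial_cdf` are illustrative, not from the slides.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Probability of at most x successes: sum of f(t) for t = 0..x."""
    return sum(binomial_pmf(t, n, p) for t in range(x + 1))


n, p = 5, 0.2
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, fx in enumerate(pmf):
    print(f"f({x}) = {fx:.5f}")
print(f"sum = {sum(pmf):.5f}, F(2) = {binomial_cdf(2, n, p):.5f}")
```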


Joint Probability of Multiple Events


P = probability of success
F = probability of failure = 1-P
Pr[success on 1st trial] = P
Pr[success on 1st trial and success on 2nd trial] = P*P
Pr[success on 1st trial and success on 2nd trial and failure on 3rd trial] = P*P*F = $P^2(1-P)$
Therefore: Pr[one particular sequence of x successes and (n-x) failures in n trials] = $P^x (1-P)^{n-x}$
Multiplying by the number of such sequences (combinations) gives Pr[x successes in n trials].

Combinations:
Number of combinations = $\binom{n}{x} = \frac{n!}{x!(n-x)!}$

Number of combinations of 1 success in 5 trials = $\binom{5}{1} = \frac{5!}{1!(5-1)!} = 5$
1) S F F F F
2) F S F F F
3) F F S F F
4) F F F S F
5) F F F F S

Number of combinations of 2 successes in 5 trials = $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5 \times 4}{2!} = 10$
1) S S F F F
2) S F S F F
3) S F F S F
4) S F F F S
5) F S S F F
6) F S F S F
7) F S F F S
8) F F S S F
9) F F S F S
10) F F F S S

Continuous Distributions
Continuous Probability Density Function (pdf): f(x) does not equal a probability
Properties of the continuous pdf
1) $f(x) \ge 0$: the function is non-negative over the whole region of X
2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$: the total area under the curve equals 1.0 (probability)

Cumulative Distribution Function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
F(x) = area under f(x) to the left of x

Ex. f(x) = 2x ($0 \le x \le 1$)
$\int_{0}^{1} 2x\,dx = x^2 \Big|_0^1 = 1^2 - 0^2 = 1$: the area under the curve equals 1, thus proving it is a pdf
$F(x) = \int_{0}^{x} 2t\,dt = t^2 \Big|_0^x = x^2$
$F(.5) = 0.5^2 = 0.25 = Pr[X \le .5]$

Probabilities and Percentage Points (Variates) from Common Distributions

Tables or functions exist for common distributions such as Z, t, F, and chi-squared to:
- determine the lower tail probability for a given value of x
- determine the value of x based on the lower tail probability

Area between two limits
$Pr[a < X < b] = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$
= Pr[Conformance] if a = LSL and b = USL

Expectations

Discrete Distributions
Let the possible values (sample space) for X be denoted by $x_1, x_2, ..., x_n$ and $f(x_i) = Pr[X = x_i]$

$E[X] = \sum_{i=1}^{n} x_i f(x_i)$
$E[X^2] = \sum_{i=1}^{n} x_i^2 f(x_i)$
$E[u(X)] = \sum_{i=1}^{n} u(x_i) f(x_i)$

Ex. Binomial (n=5, p=.2)
E[X] = 0 (0.32768) + 1 (0.4096) + 2 (0.2048) + 3 (0.0512) + 4 (0.0064) + 5 (0.00032) = 1.0 = np *binomial property
$E[X^2]$ = $0^2$ (0.32768) + $1^2$ (0.4096) + $2^2$ (0.2048) + $3^2$ (0.0512) + $4^2$ (0.0064) + $5^2$ (0.00032) = 1.8

Consideration: What if $f(x_i)$ was a constant for all i? (ex. 1/n)
$E[X] = \sum_{i=1}^{n} x_i (1/n) = \frac{\sum_{i=1}^{n} X_i}{n}$ = sample average

The sample average puts an equal weighting on all observations.

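Continuous Distributions

$E[X] = \int_{-\infty}^{+\infty} x\, f(x)\,dx$
$E[u(X)] = \int_{-\infty}^{+\infty} u(x)\, f(x)\,dx$

Ex. f(x) = 2x ($0 \le x \le 1$)
$E[X] = \int_{0}^{1} x \cdot 2x\,dx = \int_{0}^{1} 2x^2\,dx = \frac{2}{3} x^3 \Big|_0^1 = 2/3$
$E[X^2] = \int_{0}^{1} x^2 \cdot 2x\,dx = \int_{0}^{1} 2x^3\,dx = \frac{2}{4} x^4 \Big|_0^1 = 2/4$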

Variance is an Expectation

$VAR[X] = E[(X - E[X])^2]$
$VAR[X] = E[X^2] - \{E[X]\}^2$ when factored out, where $E[X] = \mu$

VAR[X] = "Expected value of the product minus the product of the expected values"

Ex. Binomial (n=5, p=.2)
VAR[X] = (1.8) - $\{1\}^2$ = 0.8 = npq = np(1-p) *binomial property

Ex. f(x) = 2x ($0 \le x \le 1$)
VAR[X] = (2/4) - $\{2/3\}^2$ = 0.055555
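Both variance examples can be verified numerically from the expectation definitions. The sketch below sums over the binomial pmf for the discrete case and uses a simple midpoint-rule integrator (an illustrative helper, not something the slides prescribe) for f(x) = 2x.

```python
from math import comb

# Discrete case: Binomial(n=5, p=0.2)
n, p = 5, 0.2
f = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
mean_binom = sum(x * fx for x, fx in f.items())              # E[X]
second_moment_binom = sum(x**2 * fx for x, fx in f.items())  # E[X^2]
var_binom = second_moment_binom - mean_binom**2              # E[X^2] - E[X]^2


def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def pdf(x):
    """Continuous example from the text: f(x) = 2x on [0, 1]."""
    return 2 * x


mean_cont = integrate(lambda x: x * pdf(x), 0.0, 1.0)
second_moment_cont = integrate(lambda x: x**2 * pdf(x), 0.0, 1.0)
var_cont = second_moment_cont - mean_cont**2

print(f"binomial: E[X] = {mean_binom:.4f}, VAR[X] = {var_binom:.4f}")
print(f"f(x)=2x : E[X] = {mean_cont:.4f}, VAR[X] = {var_cont:.6f}")
```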

Example: Discrete Expected Value

Daily sales records for a computer manufacturing firm show that it will sell 0, 1, or 2 mainframe computer systems with probabilities as listed.

Number of sales (x)    0      1      2
Probability f(x)       0.7    0.2    0.1

A) Find the expected value and standard deviation of daily sales.

Expected value of daily sales:
E[x] = $\sum_{i=1}^{3} x_i f(x_i)$ = (0)(0.7) + (1)(0.2) + (2)(0.1) = 0.4 mainframe computers

Standard deviation ($\sigma_x$) of daily sales:
$\sigma_x^2 = E[x^2] - E[x]^2$
$E[x^2] = \sum_{i=1}^{3} x_i^2 f(x_i) = (0^2)(0.7) + (1^2)(0.2) + (2^2)(0.1) = 0.6$
$\sigma_x^2 = 0.6 - (0.4)^2 = 0.44$
$\sigma_x = (0.44)^{1/2} = 0.6633$

B) The firm's daily fixed cost is $30,000 and its marginal cost is $200,000 (cost per unit). If a mainframe system sells for $500,000, what is the expected daily profit?

Daily profit = Revenues - costs

Fixed daily cost = $30,000; Cost per unit = $200,000; Revenue per unit = $500,000

Daily profit = (revenue per unit)(expected number sold) - fixed daily cost - (cost per unit)(expected number sold)
= (500000)(0.4) - (30000) - (200000)(0.4) = $90,000 per day
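Parts A and B of the sales example can be reproduced directly from the probability table. The sketch below is illustrative; the variable names are assumptions, and the numbers are the ones given in the example.

```python
from math import sqrt

# Probability of selling 0, 1, or 2 systems per day (from the example)
sales_pmf = {0: 0.7, 1: 0.2, 2: 0.1}

expected_sales = sum(x * p for x, p in sales_pmf.items())     # E[x]
second_moment = sum(x**2 * p for x, p in sales_pmf.items())   # E[x^2]
variance = second_moment - expected_sales**2
std_dev = sqrt(variance)

fixed_cost = 30_000
unit_cost = 200_000
unit_price = 500_000

# Expected profit = expected revenue - fixed cost - expected variable cost
expected_profit = (
    unit_price * expected_sales - fixed_cost - unit_cost * expected_sales
)

print(f"E[x] = {expected_sales:.1f}, sigma = {std_dev:.4f}")
print(f"expected daily profit = ${expected_profit:,.0f}")
```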

Example: Continuous Expected Value

The outside diameter of washers is a continuous random variable, x, distributed uniformly from 300 to 320 mm. Calculate:

A) f(x)

Let x = outside diameter
This is a uniform distribution: f(x) = c, a constant
For a pdf: $\int f(x)\,dx = 1$
$\int_{300}^{320} c\,dx = 1$, so c(320 - 300) = 1; solving for c gives c = 1/20

Therefore: f(x) = 1/20 for 300 < x < 320
f(x) = 0 elsewhere

B) E[x]

$E[x] = \int_{300}^{320} x\, f(x)\,dx = \int_{300}^{320} \frac{x}{20}\,dx = \frac{1}{40} x^2 \Big|_{300}^{320} = \frac{1}{40}(102400 - 90000) = 310$ mm

C) VAR[x]

$VAR[x] = E[x^2] - E[x]^2$

$E[x^2] = \int_{300}^{320} x^2 f(x)\,dx = \int_{300}^{320} \frac{x^2}{20}\,dx = \frac{1}{60} x^3 \Big|_{300}^{320} = 96133.33$

Therefore: $Var[x] = 96133.33 - (310)^2 = 33.33$
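The uniform-distribution results can be checked exactly by evaluating the antiderivatives with rational arithmetic. This is a minimal sketch using `fractions.Fraction`; the variable names are illustrative.

```python
from fractions import Fraction

a, b = 300, 320                    # limits of the uniform distribution (mm)

c = Fraction(1, b - a)             # constant pdf height, 1/20
mean = c * Fraction(b**2 - a**2, 2)            # integral of x * c over [a, b]
second_moment = c * Fraction(b**3 - a**3, 3)   # integral of x^2 * c over [a, b]
variance = second_moment - mean**2

print(f"c = {c}, E[x] = {mean}, E[x^2] = {float(second_moment):.2f}")
print(f"VAR[x] = {float(variance):.2f}")
```

The variance agrees with the closed form (b - a)^2 / 12 for a uniform distribution.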

Median and Mode

Median - value of the 50th percentile, i.e. F(Md) = 0.5
[Figure: cdf F(x) rising from 0 to 1.0, with Md marked where F(x) = 0.5]

Mode - value with the largest f(x)
Value of X where the derivative of f(x) equals 0
[Figure: pdf f(x) with Mo marked at its peak]

Specific Distributions

Discrete

Binomial
X = $N_n$ = number of successes in n trials
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$,   x = {0, 1, ..., n}
$F(x) = \sum_{t=0}^{x} \binom{n}{t} p^t (1-p)^{n-t}$
$E[X] = np = \mu$
$VAR[X] = npq = \sigma^2$

Example
The probability that a piece of luggage will survive the stress test is 0.65. If six bags are randomly tested:

A) What is the probability that exactly four will survive?

Given: P(luggage survives) = 0.65, P(luggage fails) = 0.35
Exactly 4 bags survive; let x = number that survive
This is binomial: p = 0.65, q = 0.35, n = 6
$P(x = 4) = P(4) = \binom{6}{4} p^4 q^2 = \binom{6}{4} (0.65)^4 (0.35)^2$ = (15)(0.1785)(0.1225) = 0.3280

B) Given that the 1st and 2nd bags survived, what is the probability that the 3rd and 4th bags will fail?

Note here that the trials are independent, and that trials 1 and 2 already occurred, so the probability of their occurrence = 1.
Let x = number that survive; p = 0.65, q = 0.35, n = 2
$P(x = 0) = P(0) = \binom{2}{0} p^0 q^2 = \binom{2}{0} (0.65)^0 (0.35)^2$ = 0.1225
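Both parts of the luggage example are single evaluations of the binomial pmf. The sketch below uses `math.comb`; the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p_survive = 0.65

# A) exactly four of six bags survive
p_exactly_four = binomial_pmf(4, 6, p_survive)

# B) bags 3 and 4 both fail: zero survivors in the two remaining trials
p_next_two_fail = binomial_pmf(0, 2, p_survive)

print(f"P(x = 4 | n = 6) = {p_exactly_four:.4f}")
print(f"P(x = 0 | n = 2) = {p_next_two_fail:.4f}")
```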

Poisson
X = N(t) = number of arrivals occurring in a given time interval
$f(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$,   x = {0, 1, ..., $\infty$}
$F(x) = \sum_{i=0}^{x} \frac{e^{-\lambda t} (\lambda t)^i}{i!}$
$E[X] = \lambda t = \mu$
$VAR[X] = \lambda t = \sigma^2$

Example
The manufacturing defect rate of a product is 0.005 defects per unit. What is the probability of zero defects occurring in 100 units?

$\lambda t$ = 0.005 DPU * 100 units = 0.5
$f(0) = \frac{e^{-0.5} (0.5)^0}{0!} = 0.60653$
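The zero-defect probability follows from one evaluation of the Poisson pmf. A minimal sketch, with `poisson_pmf` as an illustrative helper name:

```python
from math import exp, factorial


def poisson_pmf(x, expected_count):
    """Probability of exactly x arrivals when expected_count are expected."""
    return exp(-expected_count) * expected_count**x / factorial(x)


defects_per_unit = 0.005
units = 100
expected_defects = defects_per_unit * units    # rate times exposure = 0.5

p_zero_defects = poisson_pmf(0, expected_defects)
print(f"P(0 defects in {units} units) = {p_zero_defects:.5f}")
```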

Continuous

Normal
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$,   $-\infty < x < \infty$
$F(x) = \int_{-\infty}^{x} f(t)\,dt$, which is estimated numerically.

Since an infinite number of mean-variance combinations exist, a standardized variable was developed.

Standard Normal Transformation
$Z = \frac{X - \mu}{\sigma}$

Transforms all the observations of any normal random variable X to a new set of observations of a standard normal variable Z.

$E[X] = \mu$, $E[Z] = 0$
Proof: $E[Z] = \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0$

$VAR[X] = \sigma^2$, $VAR[Z] = 1$
Proof: $VAR[Z] = VAR\left[\frac{X - \mu}{\sigma}\right] = VAR\left[\frac{X}{\sigma}\right] = \frac{1}{\sigma^2} VAR[X] = \frac{\sigma^2}{\sigma^2} = 1$

Corollary: $VAR[cX] = c^2\, VAR[X]$

$X \sim N(\mu, \sigma^2)$, $Z \sim N(0, 1)$

Importance: a single table of Z probabilities can be used for all combinations of ($\mu$, $\sigma^2$).

FYI: $S^2$ is an unbiased estimate of $\sigma^2$, but $\sqrt{S^2}$ is a biased estimate of $\sigma$. Some authors espouse using a C4 index to compensate for the bias induced by taking the square root. The problem with the C4 index is that the VAR[Z] no longer equals 1 as described above.

Two Types of Normal Distribution Problems


1) 3 Knowns
Transform to Z, then find the corresponding probability

Example
Given a normal distribution with $\mu$ = 50 and $\sigma$ = 10, find the probability that X falls within its specification limits of 45 and 62.

$Pr[45 \le X \le 62] = Pr\left[\frac{45 - 50}{10} \le Z \le \frac{62 - 50}{10}\right] = Pr[-0.5 \le Z \le 1.2]$
$Pr[Z \le 1.2]$ = 0.8849
$Pr[Z < -0.5]$ = 0.3085
$Pr[-0.5 \le Z \le 1.2]$ = 0.8849 - 0.3085 = 0.5764

[Figure: X space (45, 50, 62) mapped to Z space (-0.5, 0, 1.2)]

2) Known probability
Find the corresponding Z value, then solve for 1 unknown given 2 knowns

Example
On an examination, the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades follow a normal distribution, what is the lowest possible A?

$Pr[Z < z] = 0.88$, so z = 1.175
$1.175 = \frac{X - 74}{7}$, so X = 82.225

[Figure: X space (74, 82.225) mapped to Z space (0, 1.175)]
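Both types of normal-distribution problem can be solved without a Z table using `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch of the two examples; the variable names are illustrative, and small differences from the table-based answers come from rounding in the table.

```python
from statistics import NormalDist

# Type 1: known mean, sigma, and limits -> probability
process = NormalDist(mu=50, sigma=10)
p_within_spec = process.cdf(62) - process.cdf(45)   # F(b) - F(a)

# Type 2: known probability -> value of X
grades = NormalDist(mu=74, sigma=7)
lowest_a = grades.inv_cdf(0.88)    # 12% of the class lies above this grade

print(f"Pr[45 <= X <= 62] = {p_within_spec:.4f}")
print(f"lowest A = {lowest_a:.2f}")
```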

Two Types of Sampling Normal Distribution Problems

1) 4 Knowns
Transform to Z, then find the corresponding probability

Example
Given a normal distribution with $\mu$ = 50, $\sigma$ = 10, and a sample size of n = 40, find the probability that $\bar{X}$ falls within its control limits of 47 and 54.

$Pr[47 \le \bar{X} \le 54] = Pr\left[\frac{47 - 50}{10/\sqrt{40}} \le Z \le \frac{54 - 50}{10/\sqrt{40}}\right] = Pr[-1.90 \le Z \le 2.53]$

$Pr[Z \le 2.53]$ = 0.9943
$Pr[Z < -1.90]$ = 0.0287
$Pr[-1.90 \le Z \le 2.53]$ = 0.9943 - 0.0287 = 0.9656

2) Known probability
Find the corresponding Z value, then solve for 1 unknown given 3 knowns

Example
A drilling operation produces holes with diameters that are approximately normally distributed. If the process mean and variance are 2.1 and 0.0225, respectively, what should be the sample size to ensure that no more than 14% of the sample means will be greater than 2.15?

This is normally distributed, where $\mu$ = 2.1 and $\sigma^2$ = 0.0225. We want to find n.

Given: $P(\bar{x} > 2.15) < 0.14$ or $P(\bar{x} < 2.15) > 0.86$
Transform to Z: $P(Z < Z^*) > 0.86$
Look up the Z-value in the table: $Z^* > 1.08$

Now: $Z^* = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{2.15 - 2.1}{\sqrt{0.0225}/\sqrt{n}} = \frac{(0.05)\sqrt{n}}{0.15}$

Solve for n:
$\sqrt{n} > (1.08)(0.15)/(0.05) = 3.24$
n > 10.49

Therefore: n $\ge$ 11 (Need a whole number sample.)
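The sample-size calculation for the drilling example can be sketched as below. It uses `NormalDist().inv_cdf` in place of the Z table, so the critical value is slightly more precise than the tabulated 1.08; the variable names are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu = 2.1             # process mean
variance = 0.0225    # process variance
limit = 2.15         # sample means should rarely exceed this value
max_tail = 0.14      # at most 14% of sample means above the limit

sigma = sqrt(variance)
z_star = NormalDist().inv_cdf(1 - max_tail)    # roughly 1.08

# From z_star = (limit - mu) / (sigma / sqrt(n)), solve for n and round up.
n_min = ceil((z_star * sigma / (limit - mu)) ** 2)


def tail_probability(n):
    """Pr[sample mean > limit] for a sample of size n."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(limit)


print(f"z* = {z_star:.3f}, minimum n = {n_min}")
print(f"tail at n = {n_min}: {tail_probability(n_min):.4f}")
```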


Assumptions of the standard normal distribution


1) X is normally distributed
2) $\mu$ is known with certainty
3) $\sigma^2$ is known with certainty
4) observations ($x_i$) are independently and identically distributed (i.i.d.)

When the population variance is known the Z distribution is used.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

When the population variance is unknown, there is uncertainty in the estimate of $\sigma^2$. Therefore, a wider distribution was developed to account for this uncertainty.

t Distribution:
$t = \frac{\bar{X} - \mu}{\sqrt{S_p^2/n}}$, where $S_p^2$ = pooled variance

Probabilities and percentage points can be obtained from a t table.
E[t] = 0; VAR[t] = $\nu/(\nu - 2)$ for $\nu > 2$ degrees of freedom, which is greater than 1 and approaches 1 as $\nu$ grows.

Example
The outside diameter of washers follows a normal distribution with a mean of 1.20 inches. A sample of 9 washers results in a sample standard deviation of 0.03. Calculate the probability that a sample mean will lie between 1.18140 and 1.22306.

This is normally distributed, where $\mu$ = 1.20, s = 0.03, and n = 9, so the t distribution has $\nu$ = n - 1 = 8 degrees of freedom.
$P(1.18140 < \bar{X} < 1.22306) = P(t_1 < t < t_2)$
Use the transformation:
$t_1 = \frac{1.18140 - 1.20}{0.03/\sqrt{9}} = -1.86$
$t_2 = \frac{1.22306 - 1.20}{0.03/\sqrt{9}} = 2.306$

Therefore: P(-1.86 < t < 2.306) = P(t < 2.306) - P(t < -1.86) = 0.975 - 0.05 = 0.925 (Look up values.)

2.1 Types of Inferences


- Gather some knowledge concerning the population using data

Considerations:
1) Are the samples representative of the population? (Sampling)
2) How do we make inferences about the population parameters?
3) How reliable are these inferences?

Sampling - In order to obtain valid inferences of the population, we must obtain samples that are representative of the population.

Random Sample
- observations are made independently (x1, x2, ..., xn) and randomly
- each value (xi) came from distributions having the same pdf {f(x)}
- i.i.d.: independently and identically distributed

Importance
- Joint probability equals the product of the marginal probabilities.
- COV[X1,X2] = 0, so the variance of the sum equals the sum of the variances
- Rational sample

Hypothesis Testing

Make a hypothesis (assumption) about the population parameter of interest

H0: Null hypothesis                               ex. H0: $\mu = 4$
HA: Alternative hypothesis (complement of H0)     ex. HA: $\mu \ne 4$

[Figure: reference distribution centered at $\mu_0$ with a rejection region of area $\alpha/2$ in each tail]

Two Conclusions:
1) Reject H0
2) Cannot reject H0 - Can never "accept" because we don't know what the true parameter really is; however, we can conclude that it is not some value.

Hypothesis Testing of the Mean

Test Statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Critical Values (define rejection regions): $Z_{crit} = Z_{\alpha/2}$ and $Z_{1-\alpha/2}$

Compute test statistic ($Z_{calc}$) - where does the observed value fall with respect to the assumed reference distribution?

Rejection Criterion: Given a mean of $\mu_0$, $(1-\alpha)$ of the values will fall between $-Z_{crit}$ and $Z_{crit}$. If the calculated statistic ($Z_{calc}$) falls in the rejection regions, we conclude at the $(1-\alpha)$ confidence level that this sample did not come from a population with mean $\mu_0$.

Possible Situations:

                     H0 is True          H0 is False
Cannot Reject H0     Correct Decision    Type II Error
Reject H0            Type I Error        Correct Decision

Type I Error - "Wrongful rejection" - rejection of the null hypothesis when it is true
Pr[Type I Error] = $\alpha$
Type II Error - "Wrongful acceptance" - "acceptance" of the null hypothesis when it is false
Pr[Type II Error] = $\beta$
Pr[Rejection] = $\alpha$ when the null hypothesis is true
Pr[Rejection] = $1 - \beta$ = "power" when the null hypothesis is false

Hypothesis Testing of the Variance

Test Statistic: $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$

Example: Hypothesis Test Using a Z Distribution

For a random sample of 50 measurements on the breaking strength of cotton threads, the mean breaking strength was found to be 210 grams and the standard deviation 18 grams.

A) The manufacturer claims that the population mean is 215 grams. State the hypothesis and solve for $\alpha$ = 0.10.

The claim is that $\mu$ = 215 g. This is a two-tailed test.
Null hypothesis: H0: $\mu = 215$
Alternative hypothesis: HA: $\mu \ne 215$

$Z_{critical} = Z_{0.10/2} = Z_{0.05} = \pm 1.645$

$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 215}{18/\sqrt{50}} = -1.96$

Compare Zcalc to Zcritical: |-1.96| = 1.96 is not less than 1.645

Therefore: Reject H0 that $\mu$ = 215 for $\alpha$ = 0.10. Manufacturer's claim is invalid.

C) Is there evidence that the population mean of breaking strength exceeds 218 grams? State the hypothesis and solve for $\alpha$ = 0.05.

Using $\alpha$ = 0.05, check if the population mean is > 218 g. This is a one-tailed test.
Null hypothesis: H0: $\mu \le 218$
Alternative hypothesis: HA: $\mu > 218$

$Z_{critical} = Z_{\alpha} = Z_{0.05} = 1.645$

$Z_{calc} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{210 - 218}{18/\sqrt{50}} = -3.143$

Compare Zcalc to Zcritical: -3.143 is not greater than 1.645

Therefore: The null hypothesis, H0: $\mu \le 218$, cannot be rejected at $\alpha$ = 0.05.
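The two Z tests on the cotton-thread data can be sketched as follows. As in the example, the sample standard deviation is used in place of the population value (reasonable for n = 50), and critical values come from `statistics.NormalDist` rather than a table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_std = 50, 210.0, 18.0
std_error = sample_std / sqrt(n)
standard_normal = NormalDist()

# A) Two-tailed test of H0: mu = 215 at alpha = 0.10
alpha_a = 0.10
z_calc_a = (sample_mean - 215) / std_error
z_crit_a = standard_normal.inv_cdf(1 - alpha_a / 2)
reject_a = abs(z_calc_a) > z_crit_a

# C) Upper-tailed test of H0: mu <= 218 at alpha = 0.05
alpha_c = 0.05
z_calc_c = (sample_mean - 218) / std_error
z_crit_c = standard_normal.inv_cdf(1 - alpha_c)
reject_c = z_calc_c > z_crit_c

print(f"A) Zcalc = {z_calc_a:.3f}, Zcrit = +/-{z_crit_a:.3f}, reject H0: {reject_a}")
print(f"C) Zcalc = {z_calc_c:.3f}, Zcrit = {z_crit_c:.3f}, reject H0: {reject_c}")
```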

Example: Hypothesis Test Using a t Distribution

An auto company states that its new compact car has an average fuel economy (miles per gallon) greater than or equal to 55 mpg on the highway. Eight cars were randomly selected and driven. The results of the study were: 57, 52, 50, 49, 53, 51, 47, and 55. State the hypothesis and solve for $\alpha$ = 0.05.

The claim is that the average mpg $\ge$ 55 mpg. This is a one-tailed test with $\alpha$ = 0.05 and $\nu$ = (n-1) = (8-1) = 7 (degrees of freedom).

Null hypothesis: H0: $\mu \ge 55$
Alternative hypothesis: HA: $\mu < 55$

Since the true variance is unknown, a t distribution will be used.

$\bar{X}$ = (57 + 52 + 50 + 49 + 53 + 51 + 47 + 55) / 8 = 51.75 mpg

$S_p^2 = \frac{\sum_{i=1}^{n} (x_i - 51.75)^2}{n-1} = \frac{73.5}{7} = 10.5$, so s = 3.24037

$s_{\bar{X}} = \sqrt{S_p^2/n} = \frac{s}{\sqrt{n}} = \frac{3.24037}{\sqrt{8}} = 1.14564$

$t_{critical} = -t_{\nu,\,1-\alpha} = -t_{7,\,0.95} = -1.895$ (from tables)

$t_{calc} = \frac{\bar{X} - \mu}{\sqrt{S_p^2/n}} = \frac{51.75 - 55}{1.14564} = -2.8368$

Compare tcalc to tcritical: -2.8368 < -1.895

Therefore: Reject the null hypothesis, H0: $\mu \ge 55$, at $\alpha$ = 0.05. Manufacturer's claim is invalid.
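The t statistic for the fuel-economy data can be recomputed with the standard library. Python's `statistics` module has no t distribution, so this sketch takes the critical value -1.895 from the t table quoted in the example rather than computing it; the names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

mpg = [57, 52, 50, 49, 53, 51, 47, 55]
claimed_mean = 55

n = len(mpg)
sample_mean = mean(mpg)
sample_variance = variance(mpg)          # divides by n - 1
std_error = sqrt(sample_variance / n)

t_calc = (sample_mean - claimed_mean) / std_error

# Lower-tail critical value for 7 degrees of freedom at alpha = 0.05,
# copied from the t table cited in the text (not computed here).
T_CRITICAL = -1.895
reject_h0 = t_calc < T_CRITICAL

print(f"mean = {sample_mean}, S^2 = {sample_variance}, std error = {std_error:.5f}")
print(f"t_calc = {t_calc:.4f}, reject H0: {reject_h0}")
```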

Example: Hypothesis Test Using a $\chi^2$ Distribution

The same auto company as in the previous example claims that the true variance of fuel economy (mpg) is less than or equal to 5. Using the same data, state the hypothesis and solve for $\alpha$ = 0.01.

The company claims that $\sigma^2$ of car fuel economy $\le$ 5. This is a one-tailed test with $\alpha$ = 0.01.

Null hypothesis: H0: $\sigma^2 \le 5$
Alternative hypothesis: HA: $\sigma^2 > 5$

From exercise 2-7, $S_p^2$ = 10.5 and $\nu$ = 7

$\chi^2_{critical} = \chi^2_{\nu,\,\alpha} = \chi^2_{7,\,.01} = 18.475$ (from tables)

$\chi^2_{calc} = \frac{(n-1) S_p^2}{\sigma^2} = \frac{(7)(10.5)}{5} = 14.7$

Compare $\chi^2_{calc}$ to $\chi^2_{critical}$: 14.7 is not greater than 18.475

Therefore: The null hypothesis, H0: $\sigma^2 \le 5$, cannot be rejected at $\alpha$ = 0.01. The data do not contradict the manufacturer's claim.


Decision Making Using Conditional Probabilities


Example - Number of people in a small town

                          Male (M)    Female ($\bar{M}$)    Total
Employed (E)                 50             25                75
Unemployed ($\bar{E}$)       10             15                25
Total                        60             40               100

Marginal Probabilities

Consider only one distribution
Pr[Employed] = 75 / 100 = 0.75
Pr[Unemployed] = 25 / 100 = 0.25
Pr[Male] = 60 / 100 = 0.6
Pr[Female] = 40 / 100 = 0.4

Joint Probabilities

Consider more than one distribution
Pr[A, B] = Probability that event A occurred and event B occurred
Pr[Employed, Male] = 50 / 100 = 0.50
Pr[Employed, Female] = 25 / 100 = 0.25
Pr[Unemployed, Male] = 10 / 100 = 0.10
Pr[Unemployed, Female] = 15 / 100 = 0.15

Conditional Probabilities

Pr[A | B] = Probability that event A will occur given that event B has already occurred
Pr[Employed | Male] = 50 / 60 = 0.833
Pr[Unemployed | Male] = 10 / 60 = 0.167
Pr[Employed | Female] = 25 / 40 = 0.625
Pr[Unemployed | Female] = 15 / 40 = 0.375

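Bayes' Theorem

[Tree diagram: first branch on sex, Pr[M] = .6 and Pr[$\bar{M}$] = .4; second branch on employment, Pr[E | M] = .833, Pr[$\bar{E}$ | M] = .167, Pr[E | $\bar{M}$] = .625, Pr[$\bar{E}$ | $\bar{M}$] = .375]

Pr[M] * Pr[E | M] = Pr[M,E]
0.6 * 0.833 = 0.5

Equivalent Representation

[Tree diagram: first branch on employment, Pr[E] = .75 and Pr[$\bar{E}$] = .25; second branch on sex]

Pr[E] * Pr[M | E] = Pr[M,E]
0.75 * Pr[M | E] = 0.5

Bayes' Theorem Relationship

Pr[E] * Pr[M | E] = Pr[M] * Pr[E | M]

$Pr[M | E] = \frac{Pr[M] \cdot Pr[E | M]}{Pr[E]} = \frac{.6 \times .833}{.75} = 0.667$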
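The marginal, joint, and conditional probabilities, and the Bayes' theorem result, can all be derived from the four counts in the employment table. The sketch below is illustrative; the dictionary layout and helper names are assumptions, not part of the slides.

```python
# Counts from the employment table in the text
counts = {
    ("employed", "male"): 50,
    ("employed", "female"): 25,
    ("unemployed", "male"): 10,
    ("unemployed", "female"): 15,
}
total = sum(counts.values())


def marginal(position, value):
    """Pr[value], summing the joint counts over the other attribute."""
    return sum(c for key, c in counts.items() if key[position] == value) / total


def joint(status, sex):
    """Pr[status, sex] from the table cell."""
    return counts[(status, sex)] / total


p_m = marginal(1, "male")                        # Pr[M]
p_e = marginal(0, "employed")                    # Pr[E]
p_e_given_m = joint("employed", "male") / p_m    # Pr[E | M]

# Bayes' theorem: Pr[M | E] = Pr[M] * Pr[E | M] / Pr[E]
p_m_given_e = p_m * p_e_given_m / p_e

print(f"Pr[M] = {p_m}, Pr[E] = {p_e}, Pr[E|M] = {p_e_given_m:.3f}")
print(f"Pr[M|E] = {p_m_given_e:.3f}")
```

The result matches reading the table directly: 50 of the 75 employed people are male.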
