
Managing Uncertainty I: Probability and Discrete Distributions

ctl.mit.edu

Zippy Bright
Zippy Bright manufactures electric toothbrushes that are sold
through large retail outlets. Zippy Bright is concerned with
how variable the sales are at different stores. They requested
and received a year of weekly sales data on their premier
product, the XP219, for three stores from one of their
retailers, Sellco.
What can we say about the weekly sales in store A?
Week  Unit Sales   Week  Unit Sales   Week  Unit Sales   Week  Unit Sales   Week  Unit Sales
1     1            11    3            21    2            31    1            41    2
2     5            12    2            22    4            32    2            42    3
3     3            13    3            23    4            33    3            43    3
4     2            14    4            24    3            34    4            44    3
5     3            15    2            25    4            35    5            45    4
6     3            16    1            26    1            36    5            46    1
7     3            17    3            27    2            37    1            47    2
8     2            18    4            28    3            38    5            48    3
9     5            19    4            29    4            39    5            49    4
10    2            20    3            30    4            40    1            50    2

Zippy Bright: Graphing It Out!

Zippy Bright - Distributions

A histogram for the weekly sales.
A graphical representation of the distribution using mutually exclusive and collectively exhaustive bins or intervals. It shows the relative probability of each interval.

[Figure: Weekly Sales at Store A (x-axis: Sales per Week; y-axis: Number of Weeks)]

The Probability Mass Function
Gives the probability of each value of a discrete random variable.
Probabilities sum to 100% (or 1.00).

[Figure: PMF of Weekly Sales at Store A (x-axis: Sales per Week; 1: 14%, 2: 22%, 3: 30%, 4: 22%, 5: 12%)]
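To make the PMF concrete, here is a minimal Python sketch (an illustration, not part of the original lesson) that rebuilds it from the 50 weeks of Store A data: count how many weeks each sales value occurred and divide by the number of weeks. The names `weekly_sales` and `empirical_pmf` are hypothetical, and storing exact `Fraction` values is an assumed design choice so that the probabilities sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction

# Weekly unit sales of the XP219 at Store A, weeks 1-50 (hypothetical variable name)
weekly_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,  # weeks 1-10
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,  # weeks 11-20
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,  # weeks 21-30
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,  # weeks 31-40
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,  # weeks 41-50
]


def empirical_pmf(data):
    """Map each observed value to (number of occurrences) / (number of observations)."""
    counts = Counter(data)
    n = len(data)
    return {value: Fraction(count, n) for value, count in sorted(counts.items())}


pmf = empirical_pmf(weekly_sales)
for value, p in pmf.items():
    print(f"Sales = {value}: {float(p):.0%}")
print("Sum of probabilities:", sum(pmf.values()))  # exactly 1
```

Running it should print the same 14%, 22%, 30%, 22%, 12% split shown in the PMF chart.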

Cumulative Distribution of Sales

Probability Table
Value  Probability  Cumulative Probability
1      14%          14%
2      22%          36%
3      30%          66%
4      22%          88%
5      12%          100%

[Figure: Cumulative distribution of weekly sales at Store A (x-axis: Sales per Week; y-axis: cumulative probability, stepping up 14%, 36%, 66%, 88%, 100%)]
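The cumulative column is a running total of the probability column. A minimal Python sketch of that idea, using `itertools.accumulate` from the standard library; the variable names are hypothetical and the probabilities are copied from the Probability Table as exact fractions:

```python
from fractions import Fraction
from itertools import accumulate

# PMF of weekly sales at Store A, taken from the Probability Table
values = [1, 2, 3, 4, 5]
probs = [Fraction(14, 100), Fraction(22, 100), Fraction(30, 100),
         Fraction(22, 100), Fraction(12, 100)]

# P(Sales <= x) is the running total of the PMF up to and including x
cdf = dict(zip(values, accumulate(probs)))

for x, cum in cdf.items():
    print(f"P(Sales <= {x}) = {float(cum):.0%}")
```

The last cumulative value is exactly 1, which is a quick sanity check that the PMF is complete.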

Basic Probability Laws 1 & 2

Basic Probability
Probability Theory

Mathematical framework for analyzing random events or experiments.

Experiments are processes whose outcomes we cannot predict with certainty, e.g., weekly sales!

Notation

P(A) = probability that event A occurs

Events:
A = Sales = 5 units
B = Sales ≥ 4 units
C = Sales are an odd number
D = Sales ≤ 2 units

P(A) = 0.12
P(B) = P(4) + P(5) = 0.34
P(C) = P(1) + P(3) + P(5) = .14 + .30 + .12 = 0.56

P(Ā) = probability of the complement of A = probability that some event other than A occurs
P(Ā) = 1 − P(A) = 0.88
P(B̄) = 1 − P(B) = 0.66
P(C̄) = 1 − P(C) = 1 − 0.56 = 0.44

∅ = Null or empty set
∪ = Union of sets (OR)
∩ = Intersection of sets (AND)

P(B ∪ C) = P[(Sales ≥ 4) OR (Sales = 1, 3, 5)] = P[Sales = 1, 3, 4, or 5] = 0.78
P(B ∩ C) = P[(Sales ≥ 4) AND (Sales = 1, 3, 5)] = P[Sales = 5] = 0.12
P(A ∩ D) = P[(Sales = 5) AND (Sales ≤ 2)] = P(∅) = 0
P(A ∪ Ā) = P[(Sales = 5) OR (Sales ≠ 5)] = 1.00
Value  Probability  Cumulative Probability
1      .14          .14
2      .22          .36
3      .30          .66
4      .22          .88
5      .12          1.00
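One way to check this notation is to model each event as a set of sales values, so that union, intersection, and complement become Python's set operators `|`, `&`, and set difference. This is an illustrative sketch only: `prob`, `complement`, and `sample_space` are hypothetical helper names, and the PMF values are copied from the probability table.

```python
from fractions import Fraction

# PMF of weekly sales at Store A (exact fractions avoid floating-point noise)
pmf = {1: Fraction(14, 100), 2: Fraction(22, 100), 3: Fraction(30, 100),
       4: Fraction(22, 100), 5: Fraction(12, 100)}
sample_space = set(pmf)


def prob(event):
    """P(event): add up the PMF over the sales values contained in the event."""
    return sum((pmf[x] for x in event), Fraction(0))


def complement(event):
    """All outcomes in the sample space that are NOT in the event."""
    return sample_space - event


# The four events from the slide, written as sets of sales values
A = {5}                                  # Sales = 5 units
B = {x for x in sample_space if x >= 4}  # Sales >= 4 units
C = {x for x in sample_space if x % 2}   # Sales are an odd number
D = {x for x in sample_space if x <= 2}  # Sales <= 2 units

print(float(prob(complement(A))))        # P(not A) = 0.88
print(float(prob(B | C)))                # union (OR) = 0.78
print(float(prob(B & C)))                # intersection (AND) = 0.12
print(float(prob(A & D)))                # empty set = 0.0
print(float(prob(A | complement(A))))    # = 1.0
```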

Four Laws of Probability


1. Probability of any event is between 0 and 1

0 ≤ P(A) ≤ 1

P(Sales > 6) = 0
P(Sales are prime numbers) = P(2, 3, 5) = .22 + .30 + .12 = 0.64
P(Sales < 6) = 1
P(Sales < 1) = 0

2. If A and B are mutually exclusive events, then

P(A or B) = P(A ∪ B) = P(A) + P(B)

P(A ∪ D) = P[(Sales = 5) OR (Sales = 1 or 2)] = P(Sales = 5) + P(Sales = 1 or 2) = .12 + .36 = 0.48
P(B ∪ C) = P[(Sales ≥ 4) OR (Sales = 1, 3, 5)] = P[Sales = 1, 3, 4, or 5] = 0.78
But P(Sales ≥ 4) + P(Sales = 1, 3, 5) = .34 + .56 = 0.90. Why? B and C are not mutually exclusive: Sales = 5 belongs to both events, so adding them counts P(B ∩ C) = 0.12 twice (0.90 − 0.12 = 0.78).
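The sketch below checks Laws 1 and 2 on the Store A PMF and shows where the 0.90 comes from. It assumes events are represented as sets of sales values, with hypothetical helpers `prob`, `is_prime`, and `mutually_exclusive`. It also relies on the general identity P(B ∪ C) = P(B) + P(C) − P(B ∩ C), a standard result that the slide does not state.

```python
from fractions import Fraction

pmf = {1: Fraction(14, 100), 2: Fraction(22, 100), 3: Fraction(30, 100),
       4: Fraction(22, 100), 5: Fraction(12, 100)}


def prob(event):
    """P(event) for an event given as a set of sales values."""
    return sum((pmf[x] for x in event if x in pmf), Fraction(0))


def is_prime(n):
    """True for integers greater than 1 whose only divisors are 1 and n."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive when they share no outcomes."""
    return not (e1 & e2)


A, B, C, D = {5}, {4, 5}, {1, 3, 5}, {1, 2}

# Law 1: every event probability lies between 0 and 1
p_prime = prob({x for x in pmf if is_prime(x)})  # Sales in {2, 3, 5}
p_above_6 = prob({x for x in pmf if x > 6})      # impossible event
print(float(p_prime), float(p_above_6))          # 0.64 0.0

# Law 2 only applies when the events are mutually exclusive
print(mutually_exclusive(A, D), mutually_exclusive(B, C))  # True False
p_a_or_d = prob(A | D)               # disjoint: equals 0.12 + 0.36
naive_b_plus_c = prob(B) + prob(C)   # 0.90, counts Sales = 5 twice
p_b_or_c = prob(B | C)               # correct value
print(float(p_a_or_d), float(naive_b_plus_c), float(p_b_or_c))  # 0.48 0.9 0.78
```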

Basic Probability Laws 3 & 4


Conditional Probability

P(A|B) = probability that event A occurs, GIVEN THAT event B has occurred
e.g., P(D|C) = P[(Sales ≤ 2) given that (Sales = 1, 3, or 5)]

3. If A and B are any two events (with P(B) > 0), then

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A \cap B)}{P(B)}$$

P(D|C) = P[(Sales ≤ 2) given that (Sales = 1, 3, or 5)] = P(Sales = 1) / P(Sales = 1, 3, or 5) = .14 / .56 = 0.25
P(A|B) = P[(Sales = 5) given that (Sales ≥ 4)] = P(Sales = 5) / P(Sales ≥ 4) = .12 / .34 = 0.35
P(B|A) = P[(Sales ≥ 4) given that (Sales = 5)] = P(Sales = 5) / P(Sales = 5) = .12 / .12 = 1.00

$$P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\,P(B)$$
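Law 3 translates directly into a small function. In this hypothetical sketch, `conditional(event, given)` divides P(event ∩ given) by P(given) and refuses to condition on a zero-probability event, since the ratio is undefined there. Exact fractions are an assumed choice so the multiplication-rule check compares equal values rather than rounded floats.

```python
from fractions import Fraction

pmf = {1: Fraction(14, 100), 2: Fraction(22, 100), 3: Fraction(30, 100),
       4: Fraction(22, 100), 5: Fraction(12, 100)}


def prob(event):
    """P(event) for an event given as a set of sales values."""
    return sum((pmf[x] for x in event), Fraction(0))


def conditional(event, given):
    """P(event | given) = P(event ∩ given) / P(given); requires P(given) > 0."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("cannot condition on an event with zero probability")
    return prob(event & given) / p_given


A, B, C, D = {5}, {4, 5}, {1, 3, 5}, {1, 2}

print(conditional(D, C))  # 1/4, i.e. 0.25
print(conditional(A, B))  # 6/17, about 0.35
print(conditional(B, A))  # 1

# Multiplication rule: P(A ∩ B) = P(A | B) P(B)
print(conditional(A, B) * prob(B) == prob(A & B))  # True
```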


Independence

A and B are independent if knowing that B occurred does not influence the probability of A occurring.

4. If A and B are independent events, then

$$P(A \mid B) = P(A)$$
$$P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\,P(B) = P(A)\,P(B)$$

Are events C and A independent? Let's test it!
If P(C|A) = P(C) (that is, the probability that sales are odd given that we sold 5 units equals the overall probability that sales are odd), then A and C are independent events.
P(C|A) = P(C and A) / P(A) = P[(Sales = 1, 3, or 5) and (Sales = 5)] / P(Sales = 5)
= P(Sales = 5) / P(Sales = 5) = 1.00
Since this is not equal to P(C) = 0.56, these are not independent events.
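Law 4 can also be tested by comparing P(A ∩ C) with P(A)·P(C), which is equivalent to the P(C|A) = P(C) test whenever P(A) > 0. The helper names below are hypothetical, and exact fractions are an assumed choice: with floating-point numbers an equality test like this could fail purely because of rounding.

```python
from fractions import Fraction

pmf = {1: Fraction(14, 100), 2: Fraction(22, 100), 3: Fraction(30, 100),
       4: Fraction(22, 100), 5: Fraction(12, 100)}
sample_space = set(pmf)


def prob(event):
    """P(event) for an event given as a set of sales values."""
    return sum((pmf[x] for x in event), Fraction(0))


def conditional(event, given):
    """P(event | given) = P(event ∩ given) / P(given); requires P(given) > 0."""
    return prob(event & given) / prob(given)


def independent(e1, e2):
    """Law 4 check: independent exactly when P(e1 ∩ e2) == P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


A, C = {5}, {1, 3, 5}

print(conditional(C, A), prob(C))    # 1 vs 14/25 (= 0.56): knowing A changes P(C)
print(independent(A, C))             # False
# Trivial independent pair: the whole sample space tells us nothing new about A
print(independent(A, sample_space))  # True
```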

Characterizing Uncertainty


Characterizing a Distribution
Several ways to characterize a distribution:

Central Tendency: what is a typical or most likely value?

Spread: how much do the observations differ?


Central Tendency Metrics

Mode: value that appears most frequently
Median: value in the middle of a distribution, i.e., the value separating the lower half from the higher half
Mean: sum of values divided by the total number of observations (average), or the sum of the values multiplied by their probabilities (expected value)

Mode = 3
Median = 3
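These three metrics are available in Python's standard `statistics` module. A minimal sketch applied to the Store A data (the list name `weekly_sales` is hypothetical):

```python
import statistics

# Weekly unit sales at Store A, weeks 1-50
weekly_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,
]

print(sum(weekly_sales), len(weekly_sales))  # 148 50
print(statistics.mode(weekly_sales))         # 3
print(statistics.median(weekly_sales))       # 3.0
print(statistics.mean(weekly_sales))         # 2.96
```

With an even number of observations, `statistics.median` averages the two middle values (here the 25th and 26th sorted weeks, both equal to 3).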

Central Tendency - Mean


Sum sales and divide by number of observations (weeks)

Sum = 148 units sold


N = 50 weeks
Mean = Average = 148/50 = 2.96 units/week

Expected Value

Notation
X = discrete random variable
xᵢ = possible values of X, i.e., x₁, x₂, x₃, …, xₙ
pᵢ = corresponding probabilities, i.e., p₁, p₂, p₃, …, pₙ
where P(X = xᵢ) = pᵢ and the probabilities sum to 1, i.e., p₁ + p₂ + p₃ + … + pₙ = 1

The expected value of X, E[X], is equal to:

$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$$

xᵢ   pᵢ    pᵢxᵢ
1    .14   .14
2    .22   .44
3    .30   .90
4    .22   .88
5    .12   .60
           μ = 2.96
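The expected-value table is a probability-weighted sum, which the following sketch reproduces. `expected_value` is a hypothetical helper, and checking that the probabilities sum to 1 is an added safeguard rather than something the slide requires.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5]
probs = [Fraction(14, 100), Fraction(22, 100), Fraction(30, 100),
         Fraction(22, 100), Fraction(12, 100)]


def expected_value(xs, ps):
    """E[X] = sum over i of p_i * x_i; the p_i must form a valid PMF."""
    if sum(ps) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(xs, ps))


mu = expected_value(values, probs)
print(float(mu))                # 2.96
print(mu == Fraction(148, 50))  # True: same as the sample average 148 / 50
```

The match with 148/50 is expected here because the PMF was built from the same 50 weeks of data.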

Spread Metrics
Range: maximum value minus minimum value
Inner Quartiles (interquartile range): the 75th percentile value minus the 25th percentile value
Variance: expectation of the squared deviation around the mean

$$\operatorname{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$

[Figure: Weekly sales at Store A marked with Min, 25th Percentile, 75th Percentile, and Max]
Range = 5 − 1 = 4
Inner Quartile = 4 − 2 = 2
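Range and inner quartiles can be computed from the raw data with the standard library. This is a sketch, not the lesson's method: percentile conventions differ between tools, and `statistics.quantiles` is used with its default `method='exclusive'`, which for this data gives 2 and 4, matching the slide. Variable names are hypothetical.

```python
import statistics

# Weekly unit sales at Store A, weeks 1-50
weekly_sales = [
    1, 5, 3, 2, 3, 3, 3, 2, 5, 2,
    3, 2, 3, 4, 2, 1, 3, 4, 4, 3,
    2, 4, 4, 3, 4, 1, 2, 3, 4, 4,
    1, 2, 3, 4, 5, 5, 1, 5, 5, 1,
    2, 3, 3, 3, 4, 1, 2, 3, 4, 2,
]

data_range = max(weekly_sales) - min(weekly_sales)
q1, q2, q3 = statistics.quantiles(weekly_sales, n=4)  # 25th, 50th, 75th percentiles

print(data_range)  # 4
print(q1, q3)      # 2.0 4.0
print(q3 - q1)     # 2.0, the inner quartile spread
```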

Spread - Variance
Variance: expectation of the squared deviation around the mean
Also called the second moment around the mean

$$\operatorname{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$

xᵢ   pᵢ    pᵢxᵢ   xᵢ − μ   (xᵢ − μ)²   pᵢ(xᵢ − μ)²
1    .14   .14    −1.96    3.84        0.5376
2    .22   .44    −0.96    0.92        0.2024
3    .30   .90     0.04    0.0016      0.00048
4    .22   .88     1.04    1.08        0.2376
5    .12   .60     2.04    4.16        0.4992
           μ = 2.96                    σ² = 1.48

Standard Deviation: square root of the variance
- In the same units as the mean!

σ = √σ² ≈ 1.215 units/week

Coefficient of Variation: ratio of standard deviation to the mean
- Standardized (unitless) measure of variability

CV = σ/μ = 1.215 / 2.96 = 0.411
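The variance table, standard deviation, and CV can be reproduced in a few lines. This sketch (variable names hypothetical) keeps every intermediate value unrounded, so it gives σ² = 1.4784 and σ ≈ 1.216; the slide's 1.48 and 1.215 come from rounding the squared deviations first, and the CV still rounds to 0.411.

```python
import math
from fractions import Fraction

values = [1, 2, 3, 4, 5]
probs = [Fraction(14, 100), Fraction(22, 100), Fraction(30, 100),
         Fraction(22, 100), Fraction(12, 100)]

mu = sum(p * x for x, p in zip(values, probs))                    # E[X] = 2.96
variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # sigma^2
sigma = math.sqrt(float(variance))                                # same units as the mean
cv = sigma / float(mu)                                            # unitless

print(float(variance))  # 1.4784
print(round(sigma, 3))  # 1.216
print(round(cv, 3))     # 0.411
```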

Zippy Bright Summary Statistics

Minimum = 1
25th Pct = 2
μ = Mean = 2.96
50th Pct = Median = 3
Mode = 3
75th Pct = 4
Maximum = 5

Range = 4
Inner Quartile = 2
σ² = Variance = 1.48
σ = Standard Deviation = 1.215
CV = Coefficient of Variation = 0.411

Discrete Probability Distributions


Probability Distributions
Where do they come from?

Empirical: based on actual data
Theoretical: based on a mathematical form

Which is better?

It depends on what you are trying to accomplish
Empirical distributions follow past history
Theoretical distributions can allow for more robust modeling
Typically, we look for the theoretical distribution that fits the data

Discrete Theoretical Distributions

Discrete Uniform Distribution
N possible values
Each value has equal probability, i.e., pᵢ = 1/N
Ex: Rolling a die

Poisson Distribution
Probability of seeing x events within a certain time period
Example: Random arrivals to a customer service desk

[Figure: PMFs of Theoretical Distributions: Uniform [1,6] and Poisson (mean = 1.5) (x-axis: Random Variable X; y-axis: Probability of X)]

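Discrete Uniform Distribution

Notation: U(a, b)
a = minimum
b = maximum
n = number of values = b − a + 1

Metrics
Mean = (a + b) / 2
Median = (a + b) / 2
Mode: N/A (every value is equally likely)
Variance = ((b − a + 1)² − 1) / 12

Probability Mass Function

$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

[Figure: PMF Rolling One Die]

xᵢ   1     2     3     4     5     6
pᵢ   1/6   1/6   1/6   1/6   1/6   1/6

$$\mu_X = \tfrac{1}{6}\cdot 1 + \tfrac{1}{6}\cdot 2 + \tfrac{1}{6}\cdot 3 + \tfrac{1}{6}\cdot 4 + \tfrac{1}{6}\cdot 5 + \tfrac{1}{6}\cdot 6 = 3.5 = \frac{6 + 1}{2}$$
$$\sigma_X^2 = \tfrac{1}{6}(1 - 3.5)^2 + \tfrac{1}{6}(2 - 3.5)^2 + \tfrac{1}{6}(3 - 3.5)^2 + \tfrac{1}{6}(4 - 3.5)^2 + \tfrac{1}{6}(5 - 3.5)^2 + \tfrac{1}{6}(6 - 3.5)^2 = 2.917 = \frac{(6 - 1 + 1)^2 - 1}{12}$$
$$\sigma_X = \sqrt{2.917} = 1.708$$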
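As a check on the closed-form metrics, the sketch below computes the mean and variance of U(a, b) term by term from the PMF and compares them with (a + b)/2 and ((b − a + 1)² − 1)/12. `discrete_uniform_stats` is a hypothetical helper; exact fractions make the comparison exact.

```python
import math
from fractions import Fraction


def discrete_uniform_stats(a, b):
    """Mean and variance of U(a, b), computed term by term from p = 1/n."""
    xs = range(a, b + 1)
    n = len(xs)  # n = b - a + 1
    p = Fraction(1, n)
    mean = sum(p * x for x in xs)
    var = sum(p * (x - mean) ** 2 for x in xs)
    return mean, var


mean, var = discrete_uniform_stats(1, 6)  # one fair die
print(mean, var)                          # 7/2 35/12
print(round(float(var), 3), round(math.sqrt(float(var)), 3))  # 2.917 1.708

# Compare with the closed-form metrics from the slide
print(mean == Fraction(1 + 6, 2), var == Fraction((6 - 1 + 1) ** 2 - 1, 12))  # True True
```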

Poisson Distribution


Poisson Distribution
Widely used to model arrivals, slow-moving inventory, etc.
Discrete distribution that cannot take negative values
Notation: P(λ)
λ = mean = variance

Probability Mass Function

$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Recall:
e = Euler's number ≈ 2.71828…
λ = distribution parameter (mean)
x! = factorial of x, e.g., 5! = 5·4·3·2·1 = 120, and 0! = 1

Suppose λ = 0.5:
$P[X=0] = \frac{e^{-0.5}\,0.5^{0}}{0!} = \frac{(0.607)(1)}{1} = 0.61$
$P[X=1] = \frac{e^{-0.5}\,0.5^{1}}{1!} = \frac{(0.607)(0.5)}{1} = 0.30$
$P[X=2] = \frac{e^{-0.5}\,0.5^{2}}{2!} = \frac{(0.607)(0.25)}{2} = 0.08$
$P[X=3] = \frac{e^{-0.5}\,0.5^{3}}{3!} = \frac{(0.607)(0.125)}{6} = 0.01$
$P[X=4] = \frac{e^{-0.5}\,0.5^{4}}{4!} = \frac{(0.607)(0.0625)}{24} \approx 0.002$
$P[X=5] = \frac{e^{-0.5}\,0.5^{5}}{5!} = \frac{(0.607)(0.0312)}{120} \approx 0.0002$

[Figure: PMF of P(0.5): x = 0: 61%, 1: 30%, 2: 8%, 3: 1%, 4: 0.2%, 5: 0.02%]
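The Poisson PMF is a one-line formula in Python using `math.exp` and `math.factorial`. In this sketch `poisson_pmf` is a hypothetical name, and returning 0 for negative or non-integer x mirrors the "otherwise" branch of the PMF.

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x! for x = 0, 1, 2, ...; 0 otherwise."""
    if x < 0 or x != int(x):
        return 0.0
    x = int(x)
    return math.exp(-lam) * lam ** x / math.factorial(x)


for x in range(6):
    print(f"P[X={x}] = {poisson_pmf(x, 0.5):.4f}")
```

Rounded to two decimals, the first four printed values reproduce 0.61, 0.30, 0.08, and 0.01.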

Poisson Distribution for different λ values

Note:
As λ increases, the distribution becomes more symmetric and bell-shaped
The value of X is always an integer ≥ 0
The value of λ does not need to be an integer

[Figure: Poisson PMFs for λ = 0.75, 2, 5, and 10 (x-axis: x from 0 to 20; y-axis: probability)]
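To illustrate both λ = mean = variance and the shape note, the sketch below sums the PMF numerically for the four λ values in the chart. It adds skewness, a standard asymmetry measure that the lesson does not define; for a Poisson distribution it equals 1/√λ, so it shrinks toward 0 (symmetry) as λ grows. Truncating the sums at x = 100 is an assumption that is harmless for these λ values, and the helper names are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x! for integer x >= 0."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_moments(lam, x_max=100):
    """Mean, variance, and skewness of P(lam), summing the PMF over x = 0..x_max."""
    xs = range(x_max + 1)
    ps = [poisson_pmf(x, lam) for x in xs]
    mean = sum(p * x for x, p in zip(xs, ps))
    var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))
    skew = sum(p * (x - mean) ** 3 for x, p in zip(xs, ps)) / var ** 1.5
    return mean, var, skew


for lam in (0.75, 2.0, 5.0, 10.0):
    mean, var, skew = poisson_moments(lam)
    print(f"lambda={lam}: mean={mean:.3f}, variance={var:.3f}, skewness={skew:.3f}")
```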

Poisson Distribution
You are running the customer complaint center for Zippy Bright. Customer complaint calls come in ~P(2.2) per minute.

Probability Mass Function

$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Cumulative Distribution Function

$$P[X \le x] = \sum_{k=0}^{x} \frac{e^{-\lambda}\lambda^{k}}{k!}$$

1. What is the probability that no calls will come in over the next minute?
$P[X=0] = \frac{e^{-2.2}\,2.2^{0}}{0!} = \frac{(0.1108)(1)}{1} = 0.11$ or 11%

2. What is the probability that 2 or fewer calls will come in over the next minute?
$P[X=0] = \frac{e^{-2.2}\,2.2^{0}}{0!} = \frac{(0.1108)(1)}{1} = 0.11$ or 11%
$P[X=1] = \frac{e^{-2.2}\,2.2^{1}}{1!} = \frac{(0.1108)(2.2)}{1} = 0.24$ or 24%
$P[X=2] = \frac{e^{-2.2}\,2.2^{2}}{2!} = \frac{(0.1108)(4.84)}{2} = 0.27$ or 27%
$P[X \le 2] \approx 62\%$

3. What is the probability that at least 1 call will come in over the next minute?
$P[X > 0] = 1 - P[X=0] = 1 - 0.11 = 0.89$ or 89%

Spreadsheet        Function                             Prob 1                      Prob 2
Microsoft Excel    =POISSON.DIST(x, mean, cumulative)   =POISSON.DIST(0, 2.2, 0)    =POISSON.DIST(2, 2.2, 1)
Google Sheets      =POISSON(x, mean, cumulative)        =POISSON(0, 2.2, 0)         =POISSON(2, 2.2, 1)
LibreOffice Calc   =POISSON(Number; Mean; C)            =POISSON(0; 2.2; 0)         =POISSON(2; 2.2; 1)
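For readers working outside a spreadsheet, the three questions can be answered with a short Python sketch. `poisson_pmf` and `poisson_cdf` are hypothetical helpers; `poisson_cdf(x, lam)` plays the role of the cumulative spreadsheet call (e.g., `=POISSON.DIST(2, 2.2, 1)`) by summing the PMF from 0 to x.

```python
import math


def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x! for integer x >= 0."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """P[X <= x]: sum of the PMF for k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2.2  # complaint calls per minute
p_none = poisson_pmf(0, lam)          # Question 1: about 0.11
p_two_or_fewer = poisson_cdf(2, lam)  # Question 2: about 0.62
p_at_least_one = 1 - p_none           # Question 3: about 0.89

print(f"{p_none:.2%}  {p_two_or_fewer:.2%}  {p_at_least_one:.2%}")
```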

Key Points from Lesson


Key Points from Lesson (1/3)

Probability Laws

1. Probability of any event is between 0 and 1
0 ≤ P(A) ≤ 1

2. If A and B are mutually exclusive events, then
P(A or B) = P(A ∪ B) = P(A) + P(B)

3. If A and B are any two events, then
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A \cap B)}{P(B)}$$
$$P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\,P(B)$$

4. If A and B are independent events, then
$$P(A \mid B) = P(A)$$
$$P(A \text{ and } B) = P(A \cap B) = P(A \mid B)\,P(B) = P(A)\,P(B)$$

Key Points from Lesson (2/3)

Characterize a distribution:

Central Tendency
Mode: value that appears most frequently
Median: value in the middle of a distribution, separating the lower from the higher half
Mean (μ): sum of values multiplied by their probability (expected value)
$$E[X] = \bar{x} = \mu = \sum_{i=1}^{n} p_i x_i$$

Spread
Range: maximum value minus minimum value
Inner Quartiles: 75th percentile value minus the 25th percentile value
Variance (σ²): expectation of the squared deviation around the mean
$$\operatorname{Var}[X] = \sigma^2 = \sum_{i=1}^{n} p_i (x_i - \bar{x})^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
Standard Deviation (σ): square root of the variance
Coefficient of Variation (CV): standard deviation over the mean = σ/μ

Key Points from Lesson (3/3)

Theoretical Distributions

Discrete Uniform
Probability Mass Function
$$P[X = x] = f(x \mid a, b) = \begin{cases} \dfrac{1}{n} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: PMF Rolling One Die]

Poisson
Probability Mass Function
$$P[X = x] = f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
[Figure: Poisson PMF]

Questions, Comments, Suggestions?


Use the Discussion Forum!

Dexter, Brody, and Wilson, hoping that the probability of getting the treat is not zero.
caplice@mit.edu

ctl.mit.edu
