
Week 5: Probability Distributions

Unit 1: Properties of Distributions


Properties of Distributions
Introduction

▪ A probability distribution is a mathematical function that
provides the probabilities of occurrence of different
possible outcomes in an experiment.

http://statisticsbyjim.com/basics/probability-distributions/
https://en.wikipedia.org/wiki/Probability_distribution
© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2
Properties of Distributions
Types of probability distribution

1. Discrete probability distribution

▪ The set of possible outcomes is discrete

2. Continuous probability distribution

▪ The set of possible outcomes can take on values in a
continuous range



Properties of Distributions
Discrete probability functions

[Figure: The probability mass function (pmf) of counts from two dice. x-axis: S, from 2 to 12; y-axis: p(S), with ticks at 1/36, 1/18, 1/12, 1/9, 5/36, and 1/6.]
https://en.wikipedia.org/wiki/Probability_distribution
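
The pmf of the two-dice total can be rebuilt by counting outcomes. The following is a minimal sketch that assumes two fair six-sided dice; the function name `two_dice_pmf` is illustrative and does not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Return {total: probability} for the sum S of two fair six-sided dice."""
    # Enumerate all 36 equally likely (first die, second die) outcomes.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {s: Fraction(n, 36) for s, n in sorted(counts.items())}


pmf = two_dice_pmf()
for total, prob in pmf.items():
    print(total, prob)
```

Exact fractions are used so that the probabilities can be compared directly with the tick labels in the figure, and so that they sum to exactly 1.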
© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 4
Properties of Distributions
Discrete probability example

Flip a coin two times

Number of Heads    Probability
0                  0.25
1                  0.50
2                  0.25
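
The table for two coin flips follows from the binomial formula. This sketch assumes a fair coin and independent flips; `heads_pmf` is a hypothetical helper name.

```python
from math import comb


def heads_pmf(n_flips, p_heads=0.5):
    """Return {number of heads: probability} for n_flips independent flips."""
    return {
        k: comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_flips + 1)
    }


pmf = heads_pmf(2)
for heads, prob in pmf.items():
    print(heads, prob)
```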



Properties of Distributions
Continuous probability functions

[Figure: The probability density function (pdf) of the normal distribution. x-axis from -3σ to 3σ; area per one-σ band, from the centre outwards on each side: 34.1%, 13.6%, 2.1%, 0.1%.]

https://en.wikipedia.org/wiki/Probability_distribution
Properties of Distributions
Continuous probability example 1

Refer to https://stattrek.com/probability-distributions/discrete-continuous.aspx for more information.
Properties of Distributions
Continuous probability example 2

Use http://onlinestatbook.com/2/calculators/normal_dist.html to draw a normal distribution.


Properties of Distributions
Continuous probability vs discrete probability distribution

[Figure: two panels, Discrete and Continuous. The Discrete panel shows the table below.]

Number of Heads    Probability
0                  0.25
1                  0.50
2                  0.25



Properties of Distributions
Summary

▪ A probability distribution is a mathematical function that
provides the probabilities of occurrence of different
possible outcomes in an experiment.
▪ A discrete random variable can take only a countable
number of distinct values, like 0, 1, 2, 3, 4, etc., whereas
a continuous random variable can take any value in a
continuous range.
▪ Discrete probability functions are also known as
“probability mass functions”; they assign a probability
to each individual value.
▪ Continuous probability functions are also known as
“probability density functions”; probabilities are
measured over ranges of values rather than at single
points.



Thank you.
Contact information:

open@sap.com
Follow all of SAP

www.sap.com/contactsap

© 2019 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
Week 5: Probability Distributions
Unit 2: The Normal Distribution
The Normal Distribution
Introduction

[Figure: The Normal Distribution. x-axis from 100 to 200; y-axis from 0 to 50.]

https://www.mathsisfun.com/data/standard-normal-distribution.html
https://en.wikipedia.org/wiki/Normal_distribution
The Normal Distribution
Characteristics

The characteristics of the normal distribution:

▪ mean = median = mode
▪ symmetry about the centre
▪ 50% of values less than the mean and 50% greater
than the mean

[Figure: symmetric curve with the mean, median, and mode marked at the centre and 50% of the area on each side.]
The Normal Distribution
Standard deviation

68% of values are within 1 standard deviation of the mean

95% of values are within 2 standard deviations of the mean

99.7% of values are within 3 standard deviations of the mean

For a standard deviation calculator, see:
https://www.mathsisfun.com/data/standard-deviation-calculator.html
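
The 68%, 95%, and 99.7% figures are rounded values of areas under the normal curve. As a rough check, the sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `within_k_sd` is illustrative.

```python
from statistics import NormalDist


def within_k_sd(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
```

Because the result depends only on the number of standard deviations, the standard normal distribution is enough here; no particular mean or standard deviation needs to be assumed.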
The Normal Distribution
Standard normal distribution

[Figure: Standardize. A Normal Distribution (x-axis from 950 to 1070) is mapped onto The Standard Normal Distribution (x-axis from -3 to +3).]

The formula for the z-score:

z = (x − μ) / σ

z is the "z-score" (standard score)
x is the value to be standardized
μ is the mean
σ is the standard deviation

For an interactive standard normal distribution calculator, see:
https://www.mathsisfun.com/data/standard-normal-distribution-table.html
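
Standardizing a value is a single subtraction and division. In the sketch below, the mean of 1010 and standard deviation of 20 are assumptions read off the axis of the figure (950 to 1070 mapped to -3 to +3); they are not stated in the slides.

```python
def z_score(x, mean, sd):
    """Return the z-score (standard score) of x: (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Assumed parameters, inferred from the figure axis rather than given in the text.
MEAN = 1010
SD = 20

print(z_score(1050, MEAN, SD))  # 2.0: two standard deviations above the mean
print(z_score(950, MEAN, SD))   # -3.0: three standard deviations below the mean
```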
The Normal Distribution
Standard normal distribution example

[Figure: Standard Normal Distribution. Area per half-standard-deviation band, from the centre outwards on each side: 19.1%, 15.0%, 9.2%, 4.4%, 1.7%, 0.5%, 0.1%.]


The Normal Distribution
The empirical rule

99.7% of data is within 3 standard deviations of the mean (x̄ − 3s to x̄ + 3s)
95% within 2 standard deviations (x̄ − 2s to x̄ + 2s)
68% within 1 standard deviation (x̄ − s to x̄ + s)

[Figure: normal pdf with the x-axis marked from -3σ to 3σ (x̄ − 3s to x̄ + 3s); area per one-σ band, from the centre outwards on each side: 34.1%, 13.6%, 2.1%, 0.1%.]

z = (x − μ) / σ
The Normal Distribution
Rules of thumb for detecting outliers

Possible outliers: |z| > 2
Outliers: |z| > 3

[Figure: z axis from Z = -3 to Z = 3. Values near the centre are labelled "Not unusual", values further out on either side "Moderately unusual", and values in the far tails "Outliers".]
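
The two rules of thumb can be applied mechanically once z-scores are available. This is a minimal sketch; the measurements, the mean of 50, and the standard deviation of 10 are hypothetical values chosen for illustration.

```python
def classify_z(z):
    """Label a z-score using the |z| > 2 and |z| > 3 rules of thumb."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "possible outlier"
    return "not unusual"


def z_scores(values, mean, sd):
    """Standardize each value with z = (x - mean) / sd."""
    return [(v - mean) / sd for v in values]


# Hypothetical measurements with an assumed known mean and standard deviation.
values = [48, 52, 71, 50, 85]
labels = [classify_z(z) for z in z_scores(values, mean=50, sd=10)]
print(labels)
```

These thresholds are conventions, not a test: in a large normal sample a few values beyond |z| = 2 are expected by chance.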



The Normal Distribution
Central limit theorem

[Figure: a population distribution next to the sampling distribution of the mean.]

https://en.wikipedia.org/wiki/Central_limit_theorem
https://machinelearningmastery.com/a-gentle-introduction-to-the-central-limit-theorem-for-machine-learning/
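
The central limit theorem can be explored with a small simulation. The sketch below assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1), a sample size of 30, and 2000 repetitions; all of these choices are illustrative and not taken from the slides.

```python
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Draw n_samples samples from an exponential population and return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(n_samples=2000, sample_size=30, rng=rng)

# The sample means should centre on the population mean (1.0), with a spread
# close to the population sd divided by sqrt(sample_size), about 0.18.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is far from normal.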
The Normal Distribution
Summary

▪ The normal distribution is a very commonly encountered continuous probability
distribution.
▪ The characteristics of the normal distribution are:
− mean = median = mode
− symmetry about the centre
− 50% of values less than the mean and 50% greater than the mean
▪ For normally distributed data, approximately:
− 68% of values are within 1 standard deviation of the mean
− 95% of values are within 2 standard deviations of the mean
− 99.7% of values are within 3 standard deviations of the mean
▪ The empirical rule states that for a normal distribution, nearly all of the data
(about 99.7%) will fall within three standard deviations of the mean.
▪ The central limit theorem (CLT) establishes that when independent random
variables are added, their properly normalized sum tends towards a normal
distribution even if the original variables themselves are not normally distributed.
Week 5: Probability Distributions
Unit 3: Kurtosis and Skewness
Kurtosis and Skewness
Introduction to kurtosis

[Figure: three curves labelled Positive Kurtosis, Normal Distribution, and Negative Kurtosis.]

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/statistics-definitions/kurtosis-leptokurtic-platykurtic/
Kurtosis and Skewness
Kurtosis

▪ Data sets with high, positive kurtosis tend to have heavy tails, or outliers.
▪ Data sets with low kurtosis tend to have light tails, or lack of outliers.

[Figure: two histograms, one for each case; x-axis from 1 to 11, y-axis from 0 to 35.]

▪ This distribution has positive kurtosis (heavier tails compared to the normal
distribution)
▪ This distribution has low kurtosis (no tails)

https://en.wikipedia.org/wiki/Kurtosis
https://www.spcforexcel.com/knowledge/basic-statistics/are-skewness-and-kurtosis-useful-statistics
Kurtosis and Skewness
Excess kurtosis

[Figure: probability density functions of seven distributions with different excess kurtosis. x-axis from -5 to 5; y-axis from 0 to 0.8.]

Key:
Red, kurt 3, Laplace (D)ouble exponential distribution;
Orange, kurt 2, hyperbolic (S)ecant distribution;
Green, kurt 1.2, (L)ogistic distribution;
Black, kurt 0, (N)ormal distribution;
Cyan, kurt −0.593762…, raised (C)osine distribution;
Blue, kurt −1, (W)igner semicircle distribution;
Magenta, kurt −1.2, (U)niform distribution.

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/statistics-definitions/kurtosis-leptokurtic-platykurtic/
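
Excess kurtosis is the fourth standardized moment minus 3, so that the normal distribution scores 0. The sketch below uses the simple population (moment) formula without any small-sample correction; the two data sets are made up for illustration.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3, where mk is the k-th central moment."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2**2 - 3


flat = [1, 2, 3, 4, 5]            # evenly spread values, light tails
heavy = [0] * 8 + [-10, 10]       # mostly identical values plus two extremes

print(excess_kurtosis(flat))   # negative: lighter tails than the normal distribution
print(excess_kurtosis(heavy))  # positive: heavier tails than the normal distribution
```

Statistical packages often apply a bias correction, so their output for small samples may differ from this formula.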
Kurtosis and Skewness
Kurtosis in financial markets

▪ Real estate (with a kurtosis of 8.75) and high yield
US bonds (8.63) are high risk investments.
▪ Investment grade US bonds (1.06) and small cap
US stocks (1.08) would be considered safer
investments.

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/statistics-definitions/kurtosis-leptokurtic-platykurtic/
Kurtosis and Skewness
Introduction to skewness

[Figure: three density curves. x-axis: random variable; y-axis: value of function.]

Left (negative) skew: mean is less than mode
Normal: mean equals mode
Right (positive) skew: mean exceeds mode
https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
https://whatis.techtarget.com/definition/skewness
Kurtosis and Skewness
Mean and median

Skewed and Symmetric Data

[Figure: frequency histograms of data sets A, B, and C over the values 1 to 6.]

Data set A: Mean = 2.64 Median = 2.00
Data set B: Mean = 4.36 Median = 5.00
Data set C: Mean = 3.00 Median = 3.00

von Hippel, Paul T. (2005). "Mean, Median, and Skew: Correcting a Textbook Rule". Journal of Statistics Education. 13 (2).
https://en.wikipedia.org/wiki/Skewness
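
Skewness can be computed as the third standardized moment and compared with the gap between mean and median. This sketch uses the population (moment) formula; the two data sets are invented for illustration and are not the slide's data sets A, B, and C.

```python
import statistics


def skewness(data):
    """Population (moment) skewness: m3 / m2**1.5, where mk is the k-th central moment."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2**1.5


right_skewed = [1, 1, 1, 2, 10]  # one large value pulls the mean to the right
symmetric = [1, 2, 3, 4, 5]

for name, data in (("right_skewed", right_skewed), ("symmetric", symmetric)):
    print(name, skewness(data), statistics.mean(data), statistics.median(data))
```

In these examples the mean exceeds the median when the skew is positive, but as the von Hippel reference cited on the slide points out, that rule of thumb does not hold for every data set.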
Kurtosis and Skewness
Why is skew important?
[Figure: Normal Distribution. Normal pdf centred at μ, with 68.2%, 95.4%, and 99.7% of the area within 1, 2, and 3 standard deviations; area per band, from the centre outwards on each side: 34.1%, 13.6%, 2.1%, 0.1%.]

[Figure: Data Transformations. Positively skewed residuals become a normal distribution after a log transformation; negatively skewed residuals become a normal distribution after an exponential transformation.]

https://www.sheffield.ac.uk/polopoly_fs/1.579181!/file/stcp-marshallsamuels-NormalityS.pdf
https://www.quora.com/How-does-skewness-impact-regression-model
https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
https://www.linkedin.com/pulse/question-does-skewness-variable-impact-predictive-data-mosaddar for more information
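
The effect of a log transformation on positive skew can be seen on a tiny constructed data set. In this sketch the values are built to be exactly symmetric on the log scale, which is an idealized assumption; real residuals would only become approximately symmetric.

```python
import math


def skewness(data):
    """Population (moment) skewness: m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2**1.5


# Constructed example: values that are symmetric on the log scale.
log_scale = [-2.0, -1.0, 0.0, 1.0, 2.0]
raw = [math.exp(v) for v in log_scale]      # positively skewed on the original scale
transformed = [math.log(v) for v in raw]    # log transformation applied

print("skewness before:", round(skewness(raw), 3))
print("skewness after: ", round(skewness(transformed), 3))
```

The log transformation only applies to strictly positive values; data containing zeros or negatives would need a shift or a different transformation.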
Kurtosis and Skewness
Summary

▪ Kurtosis is a measure of the "tailedness" of the probability distribution:


− Data sets with high kurtosis tend to have heavy tails, or outliers (“leptokurtic”).
− Data sets with low kurtosis tend to have light tails, or lack of outliers
(“platykurtic”).
− Distributions with zero excess kurtosis are called “mesokurtic”
(normal distribution family).
▪ Skewness is a measure of the asymmetry of a probability distribution.
− A distribution is symmetric if it looks the same to the left and right of the center
point.
− If most of the data is on the left side of the histogram but a few larger values are
on the right, the data is said to be skewed to the right (positive skew).
− If most of the data is on the right, with a few smaller values showing up on the left
side of the histogram, the data is skewed to the left (negative skew).
− If the distribution is symmetric, then the mean is equal to the median and the
distribution has zero skewness. If the distribution is both symmetric and
unimodal, then the mean = median = mode.
Week 5: Probability Distributions
Unit 4: Using the Normal Distribution to Calculate Probability
Using the Normal Distribution to Calculate Probability
Normal distribution recap

[Figure: two normal curves, one with a smaller standard deviation and one with a larger standard deviation.]



Using the Normal Distribution to Calculate Probability
Probability and the normal distribution recap

Using the Normal Distribution to Calculate Probability
Empirical rule recap

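Total area under the curve = 100% (50% on each side of the mean)
The z values are the number of std deviations from the mean

[Figure: normal curve with the x-axis marked from μ-3σ to μ+3σ; area per one-σ band, from the centre outwards on each side: 34%, 13.5%, 2.35%, 0.15%; 68% of the area lies within 1σ, 95% within 2σ, and 99.7% within 3σ.]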



Using the Normal Distribution to Calculate Probability
Empirical rule example

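Question
▪ 95% of students at school are between
1.2m and 1.8m tall.
▪ Assuming this data is normally
distributed, calculate the mean and
standard deviation.

Mean = (1.2 + 1.8) / 2 = 1.5 m

[Figure: normal curve with 95% of the area spanning 2 SD on each side of the mean, 4 SD in total.]

4 SD = 1.8 − 1.2 = 0.6 m, so 1 SD = 0.15 m

http://davidmlane.com/hyperstat/z_table.html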



Using the Normal Distribution to Calculate Probability
Find probabilities

▪ How can you use this theory in practice?


▪ To find the probability associated with a normal random variable,
use a graphing calculator, an online normal distribution calculator,
or a normal distribution table.
▪ There are lots of normal distribution calculators available online.

▪ Here are some examples for you:


https://www.mathportal.org/calculators/statistics-calculator/normal-distribution-calculator.php
https://stattrek.com/online-calculator/normal.aspx
https://www.hackmath.net/en/calculator/normal-distribution
http://davidmlane.com/hyperstat/z_table.html



Using the Normal Distribution to Calculate Probability
Example 1

Question
▪ On average, a light bulb lasts 300
days with a standard deviation of
50 days.
▪ Assuming that bulb life is normally
distributed, what is the probability
that the light bulb will last at most
365 days?

https://www.hackmath.net/en/calculator/normal-distribution
https://www.hackmath.net/en/calculator/normal-distribution?mean=300&sd=50&above=&area=below&below=365&ll=&ul=&outsideLL=&outsideUL=&draw=Calculate
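
As an alternative to an online calculator, the light bulb question can be worked with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). This is a sketch of one way to get the "area below" result; the variable names are illustrative.

```python
from statistics import NormalDist

# Bulb life in days: mean 300, standard deviation 50 (from the question).
bulb_life = NormalDist(mu=300, sigma=50)

# P(X <= 365) is the area under the curve to the left of 365.
p_at_most_365 = bulb_life.cdf(365)
print(round(p_at_most_365, 4))  # 0.9032
```

The value 365 is 1.3 standard deviations above the mean, so the same answer can be read from a z table at z = 1.3.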
Using the Normal Distribution to Calculate Probability
Example 2

Question
▪ Scores on an IQ test are normally
distributed.
▪ If the test has a mean of 110 and a
standard deviation of 20, what is the
probability that a person who takes
the test will score between 90 and
120?

https://www.hackmath.net/en/calculator/normal-distribution?mean=110&sd=20&above=&below=&area=between&ll=90&ul=120&outsideLL=&outsideUL=&draw=Calculate
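
A "between" probability is the difference of two cumulative probabilities. The sketch below works the IQ question with Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ test scores: mean 110, standard deviation 20 (from the question).
iq_scores = NormalDist(mu=110, sigma=20)

# P(90 < X < 120) = P(X < 120) - P(X < 90)
p_between = iq_scores.cdf(120) - iq_scores.cdf(90)
print(round(p_between, 4))  # 0.5328
```

In z terms this is the area between z = -1 and z = 0.5.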
Using the Normal Distribution to Calculate Probability
Example 3

Question
▪ A student achieved a score
of 900 in an exam.
▪ The mean test score was
825 with a standard
deviation of 100.
▪ Assuming that test scores
are normally distributed,
what proportion of students
achieved a higher score
than 900?

https://www.hackmath.net/en/calculator/normal-distribution?mean=825&sd=100&area=above&above=900&below=&ll=&ul=&outsideLL=&outsideUL=&draw=Calculate
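
An "above" probability is the complement of the cumulative probability. The sketch below works the exam question with Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores: mean 825, standard deviation 100 (from the question).
exam_scores = NormalDist(mu=825, sigma=100)

# P(X > 900) = 1 - P(X <= 900)
p_above_900 = 1 - exam_scores.cdf(900)
print(round(p_above_900, 4))  # 0.2266
```

A score of 900 corresponds to z = 0.75, so roughly 23% of students would be expected to score higher under the normality assumption.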
Using the Normal Distribution to Calculate Probability
Summary

▪ The normal distribution refers to a family of continuous probability
distributions.
▪ The area under the normal distribution curve can be used to calculate
probabilities for a normally distributed random variable.
▪ There are lots of normal distribution calculators available online.
Given the mean and standard deviation, the calculator can be used to
calculate the area under the normal curve (the probability):
− less than a value
− greater than a value
− between values
− outside two values

https://stattrek.com/probability-distributions/normal.aspx
https://www.mathsisfun.com/data/standard-normal-distribution.html
https://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
Week 5: Probability Distributions
Unit 5: Hypothesis Testing
Hypothesis Testing
Introduction



Hypothesis Testing
Null and alternative hypotheses

Determine whether a coin is fairly balanced:
▪ A null hypothesis H0 might be that half the
flips would result in Heads and half in Tails.
▪ The alternative hypothesis Ha might be that
the number of Heads and Tails would be
very different.

H0: P = 0.5
Ha: P ≠ 0.5
α = 0.05

https://stattrek.com/hypothesis-test/hypothesis-testing.aspx
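
One way to test H0: P = 0.5 against a two-sided alternative is an exact binomial test. The sketch below is an illustration, not the method prescribed by the slides: the observed result of 60 heads in 100 flips is hypothetical, and the two-sided p-value is taken as the total probability of all outcomes that are no more likely than the observed one.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips, p_null=0.5):
    """Sum the probabilities of all outcomes at most as likely as the observed one."""
    observed = binomial_pmf(heads, flips, p_null)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        prob
        for prob in (binomial_pmf(k, flips, p_null) for k in range(flips + 1))
        if prob <= observed * tolerance
    )


alpha = 0.05                                      # significance level
p_value = two_sided_p_value(heads=60, flips=100)  # hypothetical observation
reject_h0 = p_value < alpha

print(round(p_value, 4), "reject H0" if reject_h0 else "fail to reject H0")
```

With these hypothetical numbers the p-value comes out slightly above 0.05, so H0 is not rejected at α = 0.05 even though 60 heads may look lopsided.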
Hypothesis Testing
Testing

Hypothesis Testing

1. State the hypotheses
2. Formulate an analysis plan
3. Analyze sample data
4. Interpret results



Hypothesis Testing
Decision rules

[Figure: Region of Acceptance. An acceptance region in the centre with a rejection region in each tail.]

Reject H0 if test statistic is in these critical regions
Hypothesis Testing
One-tailed and two-tailed tests

[Figure: Two-Tailed Test. Non-rejection region in the centre; the hypothesis is rejected beyond the critical values in either tail (α/2 = 0.5% in each tail).]

[Figure: Left-Sided (One-Tailed) Test. The hypothesis is rejected below the critical value (α = 1%).]

[Figure: Right-Sided (One-Tailed) Test. The hypothesis is rejected above the critical value (α = 1%).]

http://www.stat.yale.edu/Courses/1997-98/101/sigtest.htm
https://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-significance-levels-alpha-and-p-values-in-statistics
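
For a test statistic that follows the standard normal distribution, the critical values in the figures can be obtained from the inverse cdf. This sketch assumes a z test and uses `NormalDist.inv_cdf` from Python's standard `statistics` module (Python 3.8 or later), with α = 1% as in the figures.

```python
from statistics import NormalDist

alpha = 0.01
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split evenly, alpha/2 = 0.5% in each tail.
two_tailed_critical = standard_normal.inv_cdf(1 - alpha / 2)

# One-tailed tests: all of alpha sits in a single tail.
right_tailed_critical = standard_normal.inv_cdf(1 - alpha)
left_tailed_critical = standard_normal.inv_cdf(alpha)

print("two-tailed:   reject H0 if |z| >", round(two_tailed_critical, 3))
print("right-tailed: reject H0 if z >", round(right_tailed_critical, 3))
print("left-tailed:  reject H0 if z <", round(left_tailed_critical, 3))
```

At the same α, the two-tailed critical value lies further from zero than the one-tailed value because each tail holds only half of α.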
Hypothesis Testing
Decision errors

Rows: statistician’s opinion (based on the sample data and decision rule). Columns: truth.

                   H0 is True          H0 is False
H0 Not Rejected    Correct Decision    Type II Error (β)
H0 Rejected        Type I Error (α)    Correct Decision

https://en.wikipedia.org/wiki/Power_(statistics)
Hypothesis Testing
Summary

▪ “Hypothesis testing” refers to the formal procedures used
by statisticians to reject or fail to reject statistical hypotheses.
▪ There are two types of statistical hypotheses:
1. Null hypothesis (H0) is usually the hypothesis that the
sample observations result purely from chance.
2. Alternative hypothesis (H1 or Ha) is the hypothesis that
the sample observations are influenced by some
non-random cause.
▪ An analysis plan includes decision rules for rejecting the null
hypothesis. Statisticians describe these decision rules in two
ways – with reference to a P-value or with reference to a
region of acceptance.
▪ Two types of errors can result from a hypothesis test.
− A Type I error occurs when the researcher rejects a null
hypothesis that is true.
− A Type II error occurs when the researcher fails to reject a
null hypothesis that is false.

