 Statistics - Collection of methods for
planning experiments, obtaining
data, and then organizing,
summarizing, presenting,
analyzing, interpreting, and
drawing conclusions.
 Descriptive Statistics
› Collection, organization, summarization,
and presentation of data.
 Inferential Statistics
› Generalizing from samples to populations
using probabilities. Performing hypothesis
testing, determining relationships
between variables, and making
predictions.
 Population - All subjects possessing a common
characteristic that is being studied.
 Sample - A subgroup or subset of the
population.
 Parameter - Characteristic or measure
obtained from a population.
 Statistic (not to be confused with
Statistics) - Characteristic or measure obtained
from a sample.
 Variable - Characteristic or attribute that can
assume different values.
 Qualitative Variables
Variables which assume non-numerical values.
 Quantitative Variables
Variables which assume numerical values.
 Discrete Variables
Variables which assume a finite or countable
number of possible values. Usually obtained by
counting.
 Continuous Variables
Variables which can assume any value within an
interval, so the possible values cannot be counted.
Usually obtained by measurement.
1. Nominal is the lowest level. Only
names are meaningful here.
2. Ordinal adds an order to the
names.
3. Interval adds meaningful
differences between values.
4. Ratio adds a true zero so that ratios are
meaningful.
1. Random Sampling
Sampling in which the data is collected
using chance methods or random numbers.
2. Systematic Sampling
Sampling in which data is obtained by
selecting every kth object.
3. Convenience Sampling
Sampling in which data that is readily
available is used.
4. Stratified Sampling
Sampling in which the population is
divided into groups (called strata) according
to some characteristic. Each of these strata is
then sampled using one of the other sampling
techniques.
5. Cluster Sampling
Sampling in which the population is
divided into groups (usually geographically).
Some of these groups are randomly selected,
and then all of the elements in those groups
are selected.
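The sampling techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of IDs 1 to 100, the two strata, the cluster size of 20, and the fixed seed are all hypothetical choices, not part of the original material. Convenience sampling is omitted because it involves no selection rule.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population: IDs 1..100

# 1. Random sampling: chance decides which subjects are selected.
random_sample = rng.sample(population, 10)

# 2. Systematic sampling: random starting point, then every kth object.
k = 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# 4. Stratified sampling: divide into strata, then sample inside each stratum
#    (here with simple random sampling).
strata = {
    "low": [x for x in population if x <= 50],
    "high": [x for x in population if x > 50],
}
stratified_sample = [x for group in strata.values() for x in rng.sample(group, 5)]

# 5. Cluster sampling: divide into groups, randomly pick some groups,
#    and keep every element of the chosen groups.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = [x for group in rng.sample(clusters, 2) for x in group]
```

Note the difference between the last two: stratified sampling takes a few elements from every group, while cluster sampling takes every element from a few groups.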
1. MEAN - sum of all observations in the data
set divided by the number of observations
included in the sum.
2. MEDIAN - middle value in a ranked list of
observations.
3. MODE - most frequent data value. There
may be no mode if no one value appears
more than any other. There may also be two
modes (bimodal), three modes (trimodal), or
more than three modes (multi-modal).
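As a quick check of the three measures, here is a minimal sketch using Python's standard `statistics` module. The data set is a made-up example chosen to be bimodal.

```python
import statistics

data = [2, 3, 3, 5, 7, 7, 8]  # hypothetical observations

mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # middle value of the ranked list
modes = statistics.multimode(data)  # every value tied for most frequent

print(mean, median, modes)
```

Here 3 and 7 each appear twice, so the data set is bimodal. Note that `multimode` returns every value when no value repeats, which corresponds to the "no mode" case described above.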
1. PERCENTILE – divides data into 100
regions
2. DECILE - divides data into 10 regions
3. QUARTILE - divides data into 4 regions
4. Z-score - obtained by subtracting the
mean from a data value and dividing the
difference by the standard deviation.
 Rank the data.
 Find k% (k/100) of the sample size, n.
 If this is an integer, add 0.5. If it isn't an
integer, round up.
 Find the number in this position. If your
depth ends in 0.5, then take the
midpoint between the two numbers on
either side of that position.
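The steps above can be sketched as a small Python function. The function name `percentile` and the sample data are assumptions made for illustration, and the sketch is restricted to whole-number k strictly between 0 and 100 so that the depth always points inside the data.

```python
import math

def percentile(data, k):
    """kth percentile following the rank / depth procedure described above."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ranked = sorted(data)                 # step 1: rank the data
    n = len(ranked)
    whole, remainder = divmod(k * n, 100) # step 2: k% of n, in exact integer math
    if remainder == 0:
        # Integer depth: add 0.5, i.e. take the midpoint of the values at
        # positions `whole` and `whole + 1` (counting from 1).
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Non-integer depth: round up and take the value at that position.
    position = math.ceil(k * n / 100)
    return ranked[position - 1]

scores = [3, 1, 7, 5, 9, 11, 13, 15]      # hypothetical data, n = 8
print(percentile(scores, 25))             # depth 2 -> 2.5 -> midpoint of 3 and 5
print(percentile(scores, 30))             # depth 2.4 -> rounds up to position 3
```

Other textbooks and software packages use slightly different percentile conventions, so results from library functions may not match this procedure exactly.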
1. minimum value
2. lower hinge/Q1
3. Median
4. upper hinge/Q3
5. maximum value
 A graphical representation of the five
number summary.
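A minimal sketch of the five-number summary follows. It assumes the hinges are the medians of the lower and upper halves of the ranked data, with the overall median included in both halves when n is odd. That convention and the function name are assumptions, since the slides do not say how the hinges are computed.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ranked = sorted(data)
    n = len(ranked)
    half = (n + 1) // 2          # size of each half; the halves share the median if n is odd
    lower = ranked[:half]
    upper = ranked[n - half:]
    return (
        ranked[0],
        statistics.median(lower),
        statistics.median(ranked),
        statistics.median(upper),
        ranked[-1],
    )

print(five_number_summary([1, 3, 5, 7, 9, 11, 13, 15]))
```

These five values are exactly what a boxplot draws: the box spans the hinges, the line inside marks the median, and the whiskers extend toward the minimum and maximum.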
 RANGE - difference between the
maximum and minimum values
 VARIANCE - accounts for the average
squared deviation of each observation
from the mean.
 STANDARD DEVIATION - is the positive
square root of the variance.
 INTERQUARTILE RANGE (IQR) – Q3-Q1
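The measures of variation can be computed with the standard `statistics` module, as in this sketch. The data set is hypothetical. The population versions (`pvariance`, `pstdev`) are used because the definition above describes the average squared deviation; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations

data_range = max(data) - min(data)          # maximum minus minimum
variance = statistics.pvariance(data)       # average squared deviation from the mean
std_dev = statistics.pstdev(data)           # positive square root of the variance

# Quartiles via the library's default method; other quartile conventions
# (such as hinges) can give slightly different values.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                               # interquartile range, Q3 - Q1

print(data_range, variance, std_dev, iqr)
```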
Used to compare variability between or
among different data sets, that is, data
sets for different variables, or for the
same variable measured in
different units of measurement.
 The empirical rule is only valid for bell-
shaped (normal) distributions. The following
statements are true.
 Approximately 68% of the data values fall
within one standard deviation of the mean.
 Approximately 95% of the data values fall
within two standard deviations of the mean.
 Approximately 99.7% of the data values fall
within three standard deviations of the
mean.
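The three percentages can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2), which the sketch below evaluates with the standard `math` module. The helper name is an assumption.

```python
import math

def fraction_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 2), "%")
```

The results are about 68.27%, 95.45%, and 99.73%, which the empirical rule rounds to 68%, 95%, and 99.7%.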
 Central Limit Theorem
States that as the sample size increases, the
sampling distribution of the sample means becomes
approximately normally distributed.
 Sampling Distribution of the Sample Means
Distribution obtained by using the means computed
from random samples of a specific size.
 Sampling Error
Difference which occurs between the sample statistic
and the population parameter due to the fact that the
sample isn't a perfect representation of the population.
For sample means, its typical size is measured by the
standard error of the mean, which is equal to the
standard deviation of the population divided by the
square root of the sample size.
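A small simulation can illustrate these ideas. The sketch below draws many random samples from a skewed population and compares the spread of the sample means with the population standard deviation divided by the square root of the sample size. The exponential population, the sample size of 30, the 2000 repetitions, and the seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable
sample_size = 30
num_samples = 2000

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
population_mean = 1.0
population_sd = 1.0

# Sampling distribution of the sample means: one mean per random sample.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

observed_spread = statistics.stdev(sample_means)
predicted_spread = population_sd / math.sqrt(sample_size)

print(round(statistics.fmean(sample_means), 3), "vs population mean", population_mean)
print(round(observed_spread, 3), "vs sigma / sqrt(n) =", round(predicted_spread, 3))
```

Even though the population is far from bell-shaped, the sample means cluster around the population mean with a spread close to the predicted value, and a histogram of `sample_means` would look roughly normal.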
 Standard Normal Distribution
› A normal distribution in which the
mean is 0 and the standard
deviation is 1. It is denoted by z.
 Z-score (also known as z-value)
› A standardized score in which the
mean is zero and the standard
deviation is 1. The Z score is used to
represent the standard normal
distribution.
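To make the standardization concrete, this sketch converts a hypothetical data set to z-scores by subtracting the mean and dividing by the standard deviation. The function name is an assumption, and the population standard deviation is used.

```python
import statistics

def z_scores(data):
    """Standardize each value: (value - mean) / standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, standard deviation 2
z = z_scores(data)
print(z)
```

After standardizing, the z-scores themselves have mean 0 and standard deviation 1, whatever the units of the original data.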
 Factorial
A positive integer factorial is the product of
each natural number up to and including the
integer.
 Permutation
An arrangement of objects in a specific order.
 Combination
A selection of objects without regard to order.
 Tree Diagram
A graphical device used to list all possibilities of
a sequence of events in a systematic way.
If n is a positive integer, then
$n! = n(n-1)(n-2)\cdots(3)(2)(1)$
$n! = n\,(n-1)!$

A special case is 0!
0! = 1
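The recursive form n! = n (n-1)! together with the special case 0! = 1 translates directly into code. This is a sketch for illustration; in practice Python's built-in `math.factorial` does the same job.

```python
import math

def factorial(n):
    """n! computed from the recursion n! = n * (n-1)!, with 0! = 1."""
    if n < 0:
        raise ValueError("factorial is only defined here for non-negative integers")
    if n == 0:
        return 1                      # the special case 0! = 1
    return n * factorial(n - 1)

print([factorial(n) for n in range(6)])   # 0! through 5!
print(factorial(5) == math.factorial(5))  # agrees with the standard library
```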
 Permutations using all the objects
› A permutation of n objects, arranged into
one group of size n, without repetition, and
order being important is:
$_nP_n = P(n,n) = n!$
Example: Find all permutations of the
letters "ABC"
ABC ACB BAC BCA CAB CBA = 6 = 3!
 Permutations of some of the objects
› A permutation of n objects, arranged in
groups of size r, without repetition, and order
being important is:
$_nP_r = P(n,r) = \frac{n!}{(n-r)!}$
 Example: Find all two-letter permutations
of the letters "ABC"
AB AC BA BC CA CB = 6 = $\frac{3!}{(3-2)!}$
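Both "ABC" examples can be reproduced with Python's standard library: `itertools.permutations` lists the arrangements and `math.perm` evaluates n!/(n-r)!. The variable names are illustrative only.

```python
from itertools import permutations
from math import factorial, perm

# Permutations using all the objects: P(3,3) = 3!
all_letters = ["".join(p) for p in permutations("ABC")]
print(all_letters, len(all_letters))

# Permutations of some of the objects: P(3,2) = 3! / (3-2)!
two_letters = ["".join(p) for p in permutations("ABC", 2)]
print(two_letters, len(two_letters))

# The counts agree with the formulas.
print(perm(3, 3) == factorial(3), perm(3, 2) == factorial(3) // factorial(3 - 2))
```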
 A combination of n objects, arranged in
groups of size r, without repetition, and
order not being important is:
$_nC_r = C(n,r) = \frac{n!}{(n-r)!\,r!}$
Another way to write a combination of n
things, r at a time, is using the binomial
notation: $\binom{n}{r}$
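For comparison with permutations, this sketch lists the two-letter combinations of "ABC" with `itertools.combinations` and checks the count against `math.comb`. The example letters are reused from the permutation slides for illustration.

```python
from itertools import combinations
from math import comb, factorial

# Selections of 2 letters from "ABC" where order does not matter.
pairs = ["".join(c) for c in combinations("ABC", 2)]
print(pairs, len(pairs))

# C(3,2) = 3! / ((3-2)! * 2!)
by_formula = factorial(3) // (factorial(3 - 2) * factorial(2))
print(comb(3, 2), by_formula)
```

There are 6 two-letter permutations but only 3 combinations, because AB and BA count as the same selection.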
Tree diagrams are a graphical way of
listing all the possible outcomes. The
outcomes are listed in an orderly fashion,
so listing all of the possible outcomes is
easier than just trying to make sure that
you have them all listed.
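A tree diagram can be mimicked in code by branching on every choice at each stage. The sketch below does this recursively for three coin tosses. The function name and the coin example are assumptions made for illustration.

```python
def tree_outcomes(choices, stages, path=""):
    """List every root-to-leaf path of a tree with the same choices at each stage."""
    if stages == 0:
        return [path]                       # reached a leaf: one complete outcome
    outcomes = []
    for choice in choices:                  # one branch per possible choice
        outcomes.extend(tree_outcomes(choices, stages - 1, path + choice))
    return outcomes

tosses = tree_outcomes("HT", 3)             # H = heads, T = tails
print(tosses, len(tosses))
```

Because the branches are visited in a fixed order, no outcome is skipped or listed twice, which is the point of drawing the tree.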
 A variable whose values are determined
by chance.
 EXAMPLE: Tossing a coin three times, let X
be the number of heads in the outcome
A probability function is a function
which assigns probabilities to the
values of a random variable.
1. All the probabilities must be
between 0 and 1 inclusive
2. The sum of the probabilities of the
outcomes must be 1.
A listing of all the values the random
variable can assume, together with their
corresponding probabilities, makes up a
probability distribution.
From the previous example, tossing a coin 3
times:
Random Variable X    P(x)
3                    1/8
2                    3/8
1                    3/8
0                    1/8
TOTAL                8/8 = 1
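The table can be rebuilt by enumerating the eight equally likely outcomes of three tosses and counting heads in each. This sketch assumes a fair coin and uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))             # 8 equally likely outcomes
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(x) = (number of outcomes with x heads) / (total number of outcomes)
distribution = {
    x: Fraction(count, len(outcomes))
    for x, count in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(x, p)

# The two requirements of a probability function.
print(all(0 <= p <= 1 for p in distribution.values()))
print(sum(distribution.values()) == 1)
```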
X        P(x)    x·P(x)    x²    x²·P(x)
3        1/8     3/8       9     9/8
2        3/8     6/8       4     12/8
1        3/8     3/8       1     3/8
0        1/8     0         0     0
TOTAL    1       12/8            24/8

From the above, $\mu = \sum x\,P(x) = \frac{12}{8} = 1.5$ and
$\sigma^2 = \sum x^2 P(x) - \mu^2 = \frac{24}{8} - \left(\frac{12}{8}\right)^2 = 3 - 2.25 = 0.75$
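The column totals in the table give the mean and variance of X. A minimal sketch, using exact fractions and the formulas μ = Σ x·P(x) and σ² = Σ x²·P(x) − μ²:

```python
from fractions import Fraction

# Probability distribution of X = number of heads in three tosses of a fair coin.
distribution = {
    3: Fraction(1, 8),
    2: Fraction(3, 8),
    1: Fraction(3, 8),
    0: Fraction(1, 8),
}

sum_xp = sum(x * p for x, p in distribution.items())        # total of the x·P(x) column
sum_x2p = sum(x**2 * p for x, p in distribution.items())    # total of the x²·P(x) column

mu = sum_xp                      # mean
variance = sum_x2p - mu**2       # mean of the squares minus the square of the mean

print(sum_xp, sum_x2p)           # 3/2 (= 12/8) and 3 (= 24/8)
print(float(mu), float(variance))
```

Note that μ must be squared before it is subtracted. As a cross-check, X is binomial with n = 3 and p = 1/2, so its variance is np(1 − p) = 0.75.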
 Null Hypothesis ( H0 ) - Statement of zero or no
change. If the original claim includes
equality (<=, =, or >=), it is the null
hypothesis. The null
hypothesis always includes the equal sign.
The decision is based on the null hypothesis.
 Alternative Hypothesis ( H1 or Ha ) - Statement
which is true if the null hypothesis is false.
The type of test (left, right, or two-tail) is
based on the alternative hypothesis.
1. Type I error - Rejecting the null hypothesis
when it is true (saying false when true).
Usually the more serious error.
2. Type II error - Failing to reject the null
hypothesis when it is false (saying true when
false).

Alpha - probability of committing a Type I error.
Beta - probability of committing a Type II error.
 Test statistic - Sample statistic used to decide whether
to reject or fail to reject the null hypothesis.
 Critical region - Set of all values which would cause us
to reject H0
 Critical value(s) - The value(s) which separate the
critical region from the non-critical region. The critical
values are determined independently of the sample
statistics.
 Significance level (alpha) - The probability of rejecting
the null hypothesis when it is true. alpha = 0.05 and
alpha = 0.01 are common. If no level of significance is
given, use alpha = 0.05. The level of significance is the
complement of the level of confidence in estimation.
 Decision - A statement based upon the null
hypothesis. It is either "reject the null
hypothesis" or "fail to reject the null
hypothesis". We will never accept the null
hypothesis.
 Conclusion - A statement which indicates
the level of evidence (sufficient or
insufficient), at what level of significance,
and whether the original claim is rejected
(null) or supported (alternative).
1. Population Standard Deviation Known
The sample mean then has an (approximately) normal
distribution, and you will be using the z-score
formula for sample means.
2. Population Standard Deviation Unknown
The standardized sample mean then has a Student's
t-distribution, and you will be using the t-score
formula for sample means.
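The slides name the z-score and t-score formulas for sample means without writing them out. The sketch below assumes the usual textbook forms, z = (x̄ − μ0) / (σ/√n) and t = (x̄ − μ0) / (s/√n), where μ0 is the value claimed in the null hypothesis and s is the sample standard deviation. The function names and numbers are hypothetical.

```python
import math
import statistics

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic when the population standard deviation sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """Test statistic when sigma is unknown and estimated by the sample's s."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)
    return (sample_mean - mu0) / (s / math.sqrt(n))

print(z_statistic(sample_mean=52, mu0=50, sigma=6, n=36))
print(t_statistic([48, 50, 52, 54, 56], mu0=50))
```

The t statistic is compared against a t-distribution with n − 1 degrees of freedom rather than the standard normal distribution.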
 Left Tailed Test
H1: parameter < value
Notice the inequality points to the left
Decision Rule: Reject H0 if t.s. < c.v.
 Right Tailed Test
H1: parameter > value
Notice the inequality points to the right
Decision Rule: Reject H0 if t.s. > c.v.
 Two Tailed Test
H1: parameter not equal to value
Another way to write not equal is < or >
Notice the inequality points to both sides
Decision Rule:
Reject H0 if t.s. < c.v. (left) or t.s. > c.v. (right)
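The three decision rules can be collected into one function. This sketch assumes a z test, so the critical values come from the standard normal distribution via `statistics.NormalDist`; the function name, the `tail` labels, and the default alpha = 0.05 are illustrative choices.

```python
from statistics import NormalDist

def decide(test_statistic, alpha=0.05, tail="two"):
    """Apply the left-, right-, or two-tailed decision rule for a z test."""
    standard_normal = NormalDist()              # mean 0, standard deviation 1
    if tail == "left":
        critical_value = standard_normal.inv_cdf(alpha)
        reject = test_statistic < critical_value
    elif tail == "right":
        critical_value = standard_normal.inv_cdf(1 - alpha)
        reject = test_statistic > critical_value
    elif tail == "two":
        # alpha is split evenly between the two tails.
        critical_value = standard_normal.inv_cdf(1 - alpha / 2)
        reject = test_statistic < -critical_value or test_statistic > critical_value
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return "reject H0" if reject else "fail to reject H0"

print(decide(2.0, tail="two"))     # critical values are about -1.96 and 1.96
print(decide(1.5, tail="right"))   # critical value is about 1.645
```

The function never returns "accept H0", in line with the rule that the null hypothesis is either rejected or not rejected.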
