Inferential Analysis
By
Amir Iqbal
Mean
o The mean for quantitative data is obtained by dividing
the sum of all values by the number of values in the data
set.
$$\bar{x} = \frac{\sum x}{n} = \frac{362}{8} = 45.25 \text{ years}$$
o Thus, the mean age of all eight
employees of this company is 45.25
years, or 45 years and 3 months.
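The arithmetic above can be checked in a few lines of Python. This is only a minimal sketch: the individual ages are not listed on the slide, so it starts from the sum (362) and the count (8) given above, and the variable names are illustrative.

```python
# Sum of the eight ages and the number of employees, as given above.
total_age = 362
n = 8

mean_age = total_age / n                 # 45.25

# Split the mean into whole years and the remaining months.
years = int(mean_age)                    # 45
months = round((mean_age - years) * 12)  # 3

print(f"{mean_age} years = {years} years and {months} months")
```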
Median
o The median is the value of the middle term in a data set that
has been ranked in increasing order.
The calculation of the median consists of the following two
steps:
1. Sort/Arrange the data set in increasing order
2. Find the middle term in a data set with n values. The value of this term is the
median.
o The following data give the weight lost (in pounds) by a sample
of five members of a health club at the end of two months of
membership:
10 5 19 8 3
Find the median.
o First, we rank the given data in increasing order as follows:
3 5 8 10 19
o Therefore, the median is the middle (third) term of the ranked data: 8
The median weight loss for this sample of five members of this
health club is 8 pounds.
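The two steps can be sketched in Python using the weight-loss data from the example. The variable names are illustrative; `statistics.median` is the standard-library function and is shown only as a cross-check.

```python
import statistics

weight_lost = [10, 5, 19, 8, 3]      # pounds, from the example above

# Step 1: sort the data in increasing order.
ranked = sorted(weight_lost)         # [3, 5, 8, 10, 19]

# Step 2: pick the middle term (this simple indexing assumes n is odd).
middle = len(ranked) // 2
median_by_hand = ranked[middle]

# The standard library gives the same answer and also handles even n
# by averaging the two middle terms.
median_stdlib = statistics.median(weight_lost)

print(median_by_hand, median_stdlib)
```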
o The median gives the center with half the data values to the left of
the median and half to the right of the median.
o The advantage of using the median as a measure of central tendency
is that it is less influenced by outliers and skewness.
o Consequently, the median is preferred over the mean as a measure of
central tendency for data sets that contain outliers and/or skewness.
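A small experiment illustrates why the median is less sensitive to outliers. The sketch below reuses the weight-loss data and then replaces the largest value with a made-up extreme value (190); that second data set is a hypothetical assumption, not part of the original example.

```python
import statistics

weight_lost = [10, 5, 19, 8, 3]       # pounds, from the median example
with_outlier = [10, 5, 190, 8, 3]     # hypothetical: 19 replaced by 190

mean_before = statistics.mean(weight_lost)        # 9
mean_after = statistics.mean(with_outlier)        # 43.2
median_before = statistics.median(weight_lost)    # 8
median_after = statistics.median(with_outlier)    # 8

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves a long way towards the outlier, while the median does not change at all.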
Mode
The mode is the value that occurs with the highest
frequency in a data set.
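As an illustration, the mode can be found by counting how often each value occurs. The data below are hypothetical, chosen only to show the idea; the sketch returns a list because a data set can have more than one mode.

```python
from collections import Counter

values = [22, 25, 22, 30, 25, 22]    # hypothetical data

counts = Counter(values)             # value -> frequency
highest = max(counts.values())       # the highest frequency

# Every value that reaches the highest frequency is a mode.
modes = [value for value, count in counts.items() if count == highest]

print(modes)
```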
Range
Standard Deviation
x       deviation (x - 84)
82      82 - 84 = -2
95      95 - 84 = +11
67      67 - 84 = -17
92      92 - 84 = +8
        Σ(deviation) = 0
Calculation
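The standard deviation (stdev) is the square root of the sum of squared deviations divided by n - 1:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example:
$$s = \sqrt{\frac{4 + 121 + 289 + 64}{3}} = \sqrt{\frac{478}{3}} = \sqrt{159.3} = 12.62$$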
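The same calculation can be reproduced step by step in Python with the four scores from the example. The variable names are illustrative; `statistics.stdev` (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics

scores = [82, 95, 67, 92]

mean = sum(scores) / len(scores)               # 84.0
deviations = [x - mean for x in scores]        # [-2.0, 11.0, -17.0, 8.0]
sum_squared = sum(d ** 2 for d in deviations)  # 4 + 121 + 289 + 64 = 478.0

# Sample standard deviation: divide by n - 1, then take the square root.
stdev_by_hand = math.sqrt(sum_squared / (len(scores) - 1))
stdev_stdlib = statistics.stdev(scores)

print(round(stdev_by_hand, 2), round(stdev_stdlib, 2))
```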
Inferential Statistics
Inferential statistics
Allow researchers to generalize to a population of
individuals based on information obtained from a
sample of those individuals
Assess whether the results obtained from a sample
are the same as those that would have been
calculated for the entire population
Sampling Distributions
A distribution of sample statistics
A distribution of mean scores
A distribution of the differences between two mean scores
A distribution of the ratio of two variances
Rejection of the
null hypothesis
The difference between
groups is so large it can be
attributed to something
other than chance (e.g.,
experimental treatment)
The relationship between
variables is so large it can
be attributed to something
other than chance (e.g., a
real relationship)
Tests of Significance
Statistical analyses to help decide whether to
accept or reject the null hypothesis
Alpha level
An established probability level which serves as
the criterion to determine whether to accept or
reject the null hypothesis
Common levels in education
.01
.05
.10
Incorrect decisions
Type I error - the null hypothesis is true and it is
rejected
Type II error - the null hypothesis is false and it is
accepted (that is, not rejected)
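The link between the alpha level and Type I errors can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original material: the null hypothesis is made true by construction (samples drawn from a normal population with mean 0 and SD 1), a two-tailed z test is applied at alpha = .05, and the sample size, number of trials and random seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
alpha = 0.05
critical_z = statistics.NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
n, trials = 30, 2000

rejections = 0
for _ in range(trials):
    # The null hypothesis is true: the population mean really is 0 (SD 1).
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.mean(sample) / (1 / n ** 0.5)
    if abs(z) > critical_z:
        rejections += 1     # rejecting a true null hypothesis: a Type I error

type_i_rate = rejections / trials
print(f"Type I error rate: {type_i_rate:.3f}")
```

In the long run, the proportion of wrongly rejected null hypotheses should be close to the chosen alpha level.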
Two types
Parametric
Nonparametric
Four assumptions of parametric tests
Assumptions of nonparametric tests
No assumptions about the shape of the
distribution of the dependent variable
Ordinal or categorical data
                          Parametric           Non-parametric
Assumed distribution      Normal               Any
Assumed variance          Homogeneous          Any
Typical data              Ratio or Interval    Ordinal or Nominal
Data set relationships    Independent          Any
Usual central measure     Mean                 Median
Benefits
Choosing a test
Parametric test                         Non-parametric counterpart
Pearson correlation test                Spearman correlation test
Independent-measures t-test             Mann-Whitney test
One-way, independent-measures ANOVA     Kruskal-Wallis test
Matched-pair t-test                     Wilcoxon test
(none listed)                           Friedman's test
PARAMETRIC TESTS
STATISTICAL SIGNIFICANCE
Most analysis is carried out on data from only a sample
of the population, so the question is:
How likely is it that the results reflect the situation for
the whole population?
Are the results simply due to chance, or are they truly
representative, i.e. are they statistically significant?
ANALYSIS OF VARIANCE
Another common requirement is to look for differences
between values obtained under two or more different
conditions,
e.g. a group before and after a training course, or three
groups after different training courses.
A range of tests can be applied to analyze the
variance, depending on the number of groups.
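For the three-group case, a one-way analysis of variance compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand; the three groups of scores are hypothetical, and the resulting F value would still need to be compared with a critical value from the F distribution with (k - 1, N - k) degrees of freedom.

```python
# Hypothetical scores for three groups after different training courses.
groups = {
    "course A": [70, 75, 80],
    "course B": [80, 85, 90],
    "course C": [60, 65, 70],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
k = len(groups)              # number of groups
n_total = len(all_scores)    # total number of observations

# Between-groups sum of squares: group means against the grand mean.
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# Within-groups sum of squares: each score against its own group mean.
ss_within = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}")
```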
Z Test
Uses Z distribution
Based on assumption of normal distribution
Sample size greater than 30
SD of the population known
Used for parameters
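A one-sample z test can be sketched as follows. The function name `one_sample_z_test` and all the numbers in the usage line are hypothetical; the sketch assumes a normally distributed population with a known SD and reports a two-tailed p-value.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and the two-tailed p-value."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical figures: a sample of 36 with mean 52, tested against a
# population mean of 50 with a known population SD of 6.
z, p = one_sample_z_test(sample_mean=52, pop_mean=50, pop_sd=6, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```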
F Test
Based on assumption of normal distribution
Uses F distribution
Used to compare the variances of two independent samples
Used in ANOVA to determine the model strength and significance
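The variance comparison behind the F test can be sketched as a simple ratio of two sample variances. Both samples below are hypothetical; the larger variance is placed in the numerator by convention, and the result would be compared with a critical value from the F distribution for the stated degrees of freedom.

```python
import statistics

# Two hypothetical independent samples.
sample_a = [4, 8, 6, 10, 12]
sample_b = [7, 8, 9, 8, 8]

var_a = statistics.variance(sample_a)   # sample variance (divides by n - 1)
var_b = statistics.variance(sample_b)

# Put the larger variance in the numerator so that F >= 1.
if var_a >= var_b:
    f_ratio = var_a / var_b
    df = (len(sample_a) - 1, len(sample_b) - 1)
else:
    f_ratio = var_b / var_a
    df = (len(sample_b) - 1, len(sample_a) - 1)

print(f"F{df} = {f_ratio:.1f}")
```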
T Test
Also known as Student's t-test
Uses t distribution
Applicable even if the sample size is less than 30
SD of the population is unknown
Used for hypothesis testing
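A one-sample t test can be sketched with the four scores from the standard deviation example (82, 95, 67, 92). The hypothesized population mean of 75 is an assumption made purely for illustration; the critical value 3.182 is the standard two-tailed t-table value for alpha = .05 with 3 degrees of freedom.

```python
import math
import statistics

scores = [82, 95, 67, 92]   # from the standard deviation example
mu0 = 75                    # hypothetical population mean under the null

n = len(scores)
sample_mean = statistics.mean(scores)   # 84
sample_sd = statistics.stdev(scores)    # about 12.62, estimated from the sample

# t statistic: distance of the sample mean from mu0 in standard-error units.
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
df = n - 1

critical_t = 3.182          # two-tailed, alpha = .05, df = 3 (t table)
reject_null = abs(t_stat) > critical_t

print(f"t({df}) = {t_stat:.3f}, reject null: {reject_null}")
```

With such a small sample the critical value is large, so a 9-point difference is not enough to reject the null hypothesis here.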
Compare Means
1. Means:
What it does: The Means procedure calculates
subgroup means and related univariate statistics
for dependent variables within categories of one
or more independent variables.
Optionally, you can obtain a one-way analysis of
variance.
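The idea of subgroup means can be illustrated outside any statistics package with plain Python. This is only a sketch of the concept, not the procedure itself; the records (a category and a value of the dependent variable) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (category of the independent variable, dependent value).
records = [
    ("sales", 40),
    ("sales", 50),
    ("hr", 30),
    ("hr", 34),
    ("hr", 38),
]

# Collect the dependent-variable values for each category.
by_group = defaultdict(list)
for category, value in records:
    by_group[category].append(value)

# One mean (and count) per subgroup.
group_means = {category: mean(values) for category, values in by_group.items()}
group_sizes = {category: len(values) for category, values in by_group.items()}

for category in group_means:
    print(category, group_sizes[category], group_means[category])
```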
Non-Parametric Tests
Nonparametric tests may be, and often are, more
powerful in detecting population differences
when certain assumptions are not satisfied.
Non-parametric statistical tests are used when:
The sample size is very small;
Few assumptions can be made about the data;
The data are rank ordered or nominal;
Samples are taken from several different populations.
Mann-Whitney test:
Used to compare differences between two independent
groups when the dependent variable is either ordinal or
interval/ratio, but not normally distributed.
Example: you want to know whether attitudes towards pay
discrimination, where attitudes are measured on an ordinal
scale, differ based on gender.
Your dependent variable would be "attitudes towards pay
discrimination" and your independent variable would be
"gender", which has two groups: "male" and "female".
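The Mann-Whitney U statistic can be computed directly by comparing every observation in one group with every observation in the other. The sketch below is a minimal illustration: the function name `mann_whitney_u` and the ordinal attitude ratings (1 to 5) for the two groups are hypothetical, and the resulting U would still need to be compared with a critical value from a Mann-Whitney table.

```python
def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic (the smaller of the two U values)."""
    # Count the pairs in which the value from group_a is larger;
    # ties count as half a pair.
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


# Hypothetical ordinal attitude ratings (1 = low, 5 = high) by gender.
male = [2, 3, 3, 4]
female = [3, 4, 5, 5]

u = mann_whitney_u(male, female)
print(f"U = {u}")
```

Because only the ordering of the values is used, the test suits ordinal data and does not rely on a normal distribution.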
THANKS