
Lecture 3

Inferential Statistics

Muhammad Farhan Tabassum


Department of Sports Sciences & Physical Education
Faculty of Allied Health Sciences
University of Lahore
Outline

 FUNDAMENTAL CONCEPTS
 CHI-SQUARE TEST OF INDEPENDENCE
 PEARSON CORRELATION
 T TEST
 ONE SAMPLE T TEST
 PAIRED SAMPLES T TEST
 INDEPENDENT SAMPLES T TEST
 ANOVA
 ONE-WAY ANOVA
Independent and Dependent Variables
 Independent variables:
manipulated by the experimenter
under the control of the experimenter

 Dependent variables:
not under the experimenter’s control
usually the outcome to be measured

Typically, we are interested in measuring the effects of independent variables on dependent variables.
Continuous and Discontinuous Scales

 Discontinuous (Discrete) Measurement:
variables whose values can only be whole numbers (integers)
Examples: 1, 2, 3, -1, -2, -3

 Continuous Measurement:
variables that can assume any value (real numbers)
Examples: 1.5, 2.25, 3.775, 4.0135, 5 ½, 6 ¾

Hint: If it can be broken down into decimal points, it is continuous!
Scales of Measurement

 Nominal
 Ordinal } Qualitative

 Interval
 Ratio } Quantitative

These scales form a hierarchy from least to most informative: NOMINAL, ORDINAL, INTERVAL, RATIO.

 If your measurement scale is nominal or ordinal, then you use non-parametric statistics.

 If you are using interval or ratio scales, you use parametric statistics.
Nominal Scale (Discrete)

 Simplest scale of measurement


 Variables which have no numerical value
 Variables which have categories
 Count number in each category, calculate
percentage
 Examples:
 Gender
 Race
 Marital status
 Alive or dead

Ordinal Scale
 Variables are in categories, but with an
underlying order to their values
 Rank-order categories from highest to lowest
 Intervals may not be equal
 Count number in each category, calculate
percentage
 Examples:
 Cancer stages
 1st, 2nd, 3rd place finishes in a horse race
 Emotion ratings
 Level of education
Interval Scale
 Quantitative data

 Can add & subtract values


 Assumption of “equidistance” applies to data
or numbers gathered
 The interval differences are meaningful
 Example:
 Celsius and Fahrenheit temperature scales,
 IQ scores
 GRE scores
Ratio Scale (continuous)
 a measurement scale that has equal units of
measurement and a rational zero point for the
scale
 ratio measures contain the most precise
information about each observation that is
made
 Examples:
 Body weight
 Blood pressure
 time as a unit of measure
 weight and height as units of measure
Data types

Variables

 Scale (Numerical):
Measurements, count data

 Categorical:
Appear as categories
Tick boxes on questionnaires
Data types

Variables

 Scale:
Continuous: measurements, takes any value
Discrete: counts / integers

 Categorical:
Ordinal: obvious order
Nominal: no meaningful order
Data types

What data types relate to the following questions?

Q1: What is your favourite subject?  Nominal
Maths / English / Science / Art / French

Q2: Gender: Male / Female  Binary / Nominal

Q3: I consider myself to be good at mathematics:  Ordinal
Strongly Disagree / Disagree / Not Sure / Agree / Strongly Agree

Q4: Score in a recent mock GCSE maths exam:  Scale
Score between 0% and 100%
CHI-SQUARE TEST OF
INDEPENDENCE
The Chi-Square Test of Independence determines
whether there is an association between
categorical variables (i.e., whether the variables
are independent or related).
It is a nonparametric test.
This test is also known as: Chi-Square Test of
Association. This test utilizes a contingency table
to analyze the data.
CHI-SQUARE TEST OF
INDEPENDENCE
A contingency table is an arrangement in which
data is classified according to two categorical
variables. The categories for one variable appear
in the rows, and the categories for the other
variable appear in columns. Each variable must
have two or more categories. Each cell reflects
the total count of cases for a specific pair of
categories.
CHI-SQUARE TEST OF
INDEPENDENCE
COMMON USES

 Testing for statistical independence or association between two or more categorical variables.

 The Chi-Square Test of Independence can only compare categorical variables.

 It cannot make comparisons between continuous variables or between categorical and continuous variables.

 Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables.
CHI-SQUARE TEST OF
INDEPENDENCE
DATA REQUIREMENTS

 Two categorical variables.

 Two or more categories (groups) for each variable.

 Independence of observations.
There is no relationship between the subjects in each group.

 Relatively large sample size.
Expected frequencies for each cell are at least 1.
CHI-SQUARE TEST OF
INDEPENDENCE
HYPOTHESES

The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:

H0: "[Variable 1] is independent of [Variable 2]"
H1: "[Variable 1] is not independent of [Variable 2]"

OR

H0: "[Variable 1] is not associated with [Variable 2]"
H1: "[Variable 1] is associated with [Variable 2]"
CHI-SQUARE TEST OF
INDEPENDENCE
TEST STATISTIC

The test statistic for the Chi-Square Test of Independence is denoted Χ², and is computed as:

Χ² = Σᵢ Σⱼ (oᵢⱼ − eᵢⱼ)² / eᵢⱼ

where
oᵢⱼ is the observed cell count in the ith row and jth column of the table
eᵢⱼ is the expected cell count in the ith row and jth column of the table, computed as

eᵢⱼ = (row i total × column j total) / grand total
CHI-SQUARE TEST OF
INDEPENDENCE

The quantity (oᵢⱼ − eᵢⱼ) is sometimes referred to as the residual of cell (i, j).

The calculated Χ² value is then compared to the critical value from the Χ² distribution table with degrees of freedom df = (R − 1)(C − 1) and the chosen confidence level.

If the calculated Χ² value > critical Χ² value, then we reject the null hypothesis.
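
As a cross-check outside SPSS, the same test can be sketched in Python. The example below is a minimal illustration with made-up smoking-by-gender counts (not the lecture's sample dataset); scipy.stats.chi2_contingency returns the Χ² statistic, the p-value, the degrees of freedom, and the table of expected counts eᵢⱼ.

# A minimal sketch: Chi-Square Test of Independence with SciPy,
# using made-up counts (rows: Nonsmoker, Past smoker, Current smoker;
# columns: Male, Female).
from scipy.stats import chi2_contingency

observed = [[120, 145],
            [45, 38],
            [52, 35]]

chi2, p, dof, expected = chi2_contingency(observed)

print(f"X^2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("Expected counts:", expected)

# Decision rule: reject H0 (independence) when p < alpha.
alpha = 0.05
print("Reject H0" if p < alpha else "Do not reject H0")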
CHI-SQUARE TEST OF
INDEPENDENCE
Problem Statement

In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, current smoker, or past smoker) and gender (male or female) using a Chi-Square Test of Independence (we'll use α = 0.05).
CHI-SQUARE TEST OF
INDEPENDENCE
DATA SET-UP

There are two different ways in which your data may be set up initially. The format of the data will determine how to proceed with running the Chi-Square Test of Independence. At minimum, your data should include two categorical variables (represented in columns) that will be used in the analysis. The categorical variables must include at least two groups.
CHI-SQUARE TEST OF
INDEPENDENCE
Example of a dataset structure where each row
represents a case or subject. Screenshot shows a
Data View window with cases 1-5 and 430-435 from
the sample dataset, and columns ids, Smoking and
Gender.
Cases represent subjects, and each subject
appears once in the dataset. That is, each row
represents an observation from a unique subject.
The dataset contains at least two nominal
categorical variables (string or numeric). The
categorical variables used in the test must have two
or more categories.
CHI-SQUARE TEST OF
INDEPENDENCE
Running the Test

 Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).
 Select Smoking as the row variable, and Gender as the column variable.
 Click Statistics. Check Chi-square, then click Continue.
 (Optional) Check the box for Display clustered bar charts.
 Click OK.
CHI-SQUARE TEST OF
INDEPENDENCE
KEY RESULT

 The value of the test statistic is 3.171.

 The footnote for this statistic relates to the expected cell count assumption: no cells had an expected count less than 5, so this assumption was met.

 Because the test statistic is based on a 3x2 crosstabulation table, the degrees of freedom (df) for the test statistic is df = (R − 1)(C − 1) = (3 − 1)(2 − 1) = 2.

 The corresponding p-value of the test statistic is p = 0.205.


CHI-SQUARE TEST OF
INDEPENDENCE
Decision and Conclusions

 Since the p-value is greater than our chosen significance level (α = 0.05), we do not reject the null hypothesis.

 Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.

 Based on the results, we can state the following:

 No association was found between gender and smoking behavior (Χ²(2) = 3.171, p = 0.205).
CHI-SQUARE TEST OF
INDEPENDENCE
Decision and Conclusions

 If the p-value is less than our chosen significance level (α = 0.05), we reject the null hypothesis, and conclude that there is an association between the two variables.

 Based on the results, we could then state the following:

 There was a significant association between the two variables (Χ² = 138.9, p < .001).
PEARSON CORRELATION

 The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables.

 By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ ("rho").

 The Pearson Correlation is a parametric measure.
PEARSON CORRELATION

COMMON USES
The bivariate Pearson Correlation is commonly used to
measure the following:
 Correlations among pairs of variables
 Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:
 Whether a statistically significant linear relationship
exists between two continuous variables
 The strength of a linear relationship (i.e., how close the
relationship is to being a perfectly straight line)
 The direction of a linear relationship (increasing or
decreasing)
PEARSON CORRELATION
DATA REQUIREMENTS

 Two or more continuous variables (i.e., interval or ratio


level)
 Cases that have values on both variables
 Linear relationship between the variables
 There is no relationship between the values of variables between cases; that is, the values for all variables across cases are unrelated
 Each pair of variables is bivariately normally distributed
 Random sample of data from the population
 No outliers
PEARSON CORRELATION
HYPOTHESES

The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for correlation can be expressed in the following ways.

Two-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
H1: ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")
PEARSON CORRELATION
HYPOTHESES
One-tailed significance test:
H0: ρ = 0 ("the population correlation coefficient
is 0; there is no association")
H1: ρ > 0 ("the population correlation coefficient
is greater than 0; a positive correlation could
exist")
OR
H1: ρ < 0 ("the population correlation coefficient
is less than 0; a negative correlation could
exist")
where ρ is the population correlation coefficient.
PEARSON CORRELATION
TEST STATISTIC

The sample correlation coefficient between two variables x and y is denoted r or rxy, and can be computed as:

r = cov(x, y) / ( √var(x) · √var(y) )

where cov(x, y) is the sample covariance of x and y;
var(x) is the sample variance of x;
and var(y) is the sample variance of y.
PEARSON CORRELATION

TEST STATISTIC

 Correlation can take on any value in the range [-1, 1].

 The sign of the correlation coefficient indicates the direction of the relationship, while the magnitude of the correlation (how close it is to -1 or +1) indicates the strength of the relationship.
PEARSON CORRELATION
-1 : perfectly negative linear relationship
 0 : no linear relationship
+1 : perfectly positive linear relationship

A negative correlation corresponds to a decreasing relationship, while a positive correlation corresponds to an increasing relationship.
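
The same coefficient and its two-tailed significance test can be sketched in Python; the example below uses made-up height and weight values (not the lecture's sample dataset), with scipy.stats.pearsonr returning both r and the p-value.

# A minimal sketch: bivariate Pearson correlation with SciPy,
# using made-up heights (inches) and weights (pounds).
from scipy.stats import pearsonr

height = [62.0, 64.5, 66.0, 68.2, 70.1, 71.5, 73.0]
weight = [110.0, 128.0, 135.0, 150.0, 162.0, 171.0, 185.0]

r, p = pearsonr(height, weight)  # r lies in [-1, 1]; p is two-tailed
print(f"r = {r:.3f}, p = {p:.4f}")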
PEARSON CORRELATION

Problem Statement

Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.
PEARSON CORRELATION

To run the bivariate Pearson Correlation, click Analyze > Correlate > Bivariate. Select the variables Height and Weight and move them to the Variables box.
In the Correlation Coefficients area, select Pearson.
In the Test of Significance area, select your desired significance test, two-tailed or one-tailed. We will select a two-tailed significance test in this example.
Check the box next to Flag significant correlations.
Click OK to run the bivariate Pearson Correlation. Output for the analysis will display in the Output Viewer.
PEARSON CORRELATION
KEY RESULT

 Cell A: Correlation of Height with itself (r=1), and the number of non-missing observations for height (n=408).
 Cell B: Correlation of height and weight (r=0.513), based on n=354 observations with pairwise non-missing values.
 Cell C: Correlation of height and weight (r=0.513), based on n=354 observations with pairwise non-missing values (identical to cell B, since the correlation matrix is symmetric).
 Cell D: Correlation of weight with itself (r=1), and the number of non-missing observations for weight (n=376).
PEARSON CORRELATION
KEY RESULT

The correlations in the main diagonal (cells A and D) are all equal to 1. This is because a variable is always perfectly correlated with itself. Notice, however, that the sample sizes are different in cell A (n=408) versus cell D (n=376). This is because of missing data: there are more missing observations for variable Weight than there are for variable Height.
PEARSON CORRELATION
Decision and Conclusions

 Weight and height have a statistically significant linear relationship (p < .001).

 The direction of the relationship is positive (i.e., height and weight are positively correlated), meaning that these variables tend to increase together (i.e., greater height is associated with greater weight).

 The magnitude, or strength, of the association is moderate (r = 0.513).
ONE SAMPLE T-TEST

 The One Sample t Test determines whether the sample mean is statistically different from a known or hypothesized population mean.

 The One Sample t Test is a parametric test.

 This test is also known as: Single Sample t Test.

 The variable used in this test is known as: Test variable.

 In a One Sample t Test, the test variable is compared against a "test value", which is a known or hypothesized value of the mean in the population.
ONE SAMPLE T-TEST

COMMON USES

 Statistical difference between a sample mean and a known or hypothesized value of the mean in the population.

 Statistical difference between the sample mean and the sample midpoint of the test variable.

 Statistical difference between the sample mean of the test variable and chance.

 Statistical difference between a change score and zero.
If the mean change score is not significantly different from zero, no significant change occurred.
ONE SAMPLE T-TEST

DATA REQUIREMENTS

 Test variable that is continuous (i.e., interval or ratio level)

 Scores on the test variable are independent (i.e., independence of observations)
There is no relationship between scores on the test variable
ONE SAMPLE T-TEST

DATA REQUIREMENTS

 Violation of the independence assumption will yield an inaccurate p value

 Random sample of data from the population

 Normal distribution (approximately) of the sample and population on the test variable

 No outliers
ONE SAMPLE T-TEST

HYPOTHESES

The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the one sample t test can be expressed as:

H0: µ = x̄ ("the sample mean is equal to the [proposed] population mean")
H1: µ ≠ x̄ ("the sample mean is not equal to the [proposed] population mean")

where µ is a constant proposed for the population mean and x̄ is the sample mean.
ONE SAMPLE T-TEST
TEST STATISTIC

The test statistic for a One Sample t Test is denoted t, which is calculated using the following formula:

t = (x̄ − µ) / s_x̄,  where s_x̄ = s / √n

where
µ = Proposed constant for the population mean
x̄ = Sample mean
n = Sample size (i.e., number of observations)
s = Sample standard deviation
s_x̄ = Estimated standard error of the mean
ONE SAMPLE T-TEST
TEST STATISTIC

 The calculated t value is then compared to the critical t value from the t distribution table with degrees of freedom df = n − 1 and the chosen confidence level. If the calculated t value > critical t value, then we reject the null hypothesis.
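
The one-sample computation is easy to sketch in Python as well; the example below uses made-up heights (not the lecture's dataset) tested against a hypothesized mean of 66.5, with scipy.stats.ttest_1samp returning t and the two-tailed p-value.

# A minimal sketch: one sample t test with SciPy, using made-up
# heights (inches) against a hypothesized population mean of 66.5.
from scipy.stats import ttest_1samp

heights = [64.2, 67.5, 68.1, 70.3, 66.0, 69.4, 71.2, 65.8, 68.9, 67.7]

t_stat, p_value = ttest_1samp(heights, popmean=66.5)
print(f"t = {t_stat:.3f}, df = {len(heights) - 1}, p = {p_value:.4f}")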
ONE SAMPLE T-TEST
Problem Statement
According to the CDC, the mean height of adults ages 20
and older is about 66.5 inches (69.3 inches for males, 63.8
inches for females). Let's test if the mean height of our
sample data is significantly different than 66.5 inches using
a one-sample t test. The null and alternative hypotheses of
this test will be:

H0: x̄Height = 66.5 ("the mean height of the sample is equal to 66.5")
H1: x̄Height ≠ 66.5 ("the mean height of the sample is not equal to 66.5")

where 66.5 is the CDC's estimate of average height for adults, and x̄Height is the mean height of the sample.
ONE SAMPLE T-TEST
Before the Test

In the sample data, we will use the variable Height, which is a continuous variable representing each respondent's height in inches. The heights exhibit a range of values from 55.00 to 88.41 (Analyze > Descriptive Statistics > Descriptives).

Let's create a histogram of the data to get an idea of the distribution, and to see if our hypothesized mean is near our sample mean. Click Graphs > Legacy Dialogs > Histogram. Move variable Height to the Variable box, then click OK.
ONE SAMPLE T-TEST
Before the Test
To add vertical reference lines at the mean (or another location), double-click on the plot to open the Chart Editor, then click Options > X Axis Reference Line. In the Properties window, you can enter a specific location on the x-axis for the vertical line, or you can choose to have the reference line at the mean or median of the sample data. Click Apply to make sure your new line is added to the chart. Here, we have added two reference lines: one at the sample mean (the solid black line), and the other at 66.5 (the dashed red line).
ONE SAMPLE T-TEST
Running the Test

To run the One Sample t Test, click Analyze > Compare Means > One-Sample T Test. Move the variable Height to the Test Variable(s) area. In the Test Value field, enter 66.5, which is the CDC's estimate of the average height of adults over 20. Click OK to run the One Sample t Test.
ONE SAMPLE T-TEST
Decision and Conclusions

Since p < 0.001, we reject the null hypothesis that the sample mean is equal to the hypothesized population mean, and conclude that the mean height of the sample is significantly different from the average height of the overall adult population.

t = 5.810. Note that t is calculated by dividing the mean difference by the standard error of the mean.

Based on the results, we can state the following:
• There is a significant difference in mean height between the sample and the overall adult population (p < .001).
• The average height of the sample is about 1.5 inches taller than the overall adult population average.
PAIRED SAMPLES T-TEST
 The Paired Samples t Test compares two means
that are from the same individual, object, or related
units.
 The two means typically represent two different
times (e.g., pre-test and post-test with an
intervention between the two time points) or two
different but related conditions or units (e.g., left and
right ears, twins).
 The purpose of the test is to determine whether
there is statistical evidence that the mean difference
between paired observations on a particular
outcome is significantly different from zero.
PAIRED SAMPLES T-TEST

 The Paired Samples t Test is a parametric test.

 This test is also known as: Dependent t Test, Paired t Test, Repeated Measures t Test.

 The variable used in this test is known as: Dependent variable, or test variable (continuous), measured at two different times or for two related conditions or units.
PAIRED SAMPLES T-TEST
COMMON USES

The Paired Samples t Test is commonly used to test the following:
 Statistical difference between two time points
 Statistical difference between two conditions
 Statistical difference between two measurements
 Statistical difference between a matched pair

Note: The Paired Samples t Test can only compare the means for two (and only two) related (paired) units on a continuous outcome that is normally distributed.
PAIRED SAMPLES T-TEST
COMMON USES

The Paired Samples t Test is not appropriate for analyses involving the following:
1) unpaired data;
2) comparisons between more than two units/groups;
3) a continuous outcome that is not normally distributed; and
4) an ordinal/ranked outcome.
PAIRED SAMPLES T-TEST

 To compare unpaired means between two groups on a continuous outcome that is normally distributed, choose the Independent Samples t Test.

 To compare unpaired means between more than two groups on a continuous outcome that is normally distributed, choose ANOVA.
PAIRED SAMPLES T-TEST

 To compare paired means for continuous data that are not normally distributed, choose the nonparametric Wilcoxon Signed-Ranks Test.

 To compare paired means for ranked data, also choose the nonparametric Wilcoxon Signed-Ranks Test.
PAIRED SAMPLES T-TEST
DATA REQUIREMENTS

 Dependent variable that is continuous (i.e., interval or ratio level)

 Related samples/groups (i.e., dependent observations)
The subjects in each sample, or group, are the same. This means that the subjects in the first group are also in the second group.

 Random sample of data from the population

 Normal distribution (approximately) of the difference between the paired values

 No outliers in the difference between the two related groups
PAIRED SAMPLES T-TEST
HYPOTHESES

The hypotheses can be expressed in two different ways that express the same idea and are mathematically equivalent:

H0: µ1 = µ2 ("the paired population means are equal")
H1: µ1 ≠ µ2 ("the paired population means are not equal")

OR

H0: µ1 − µ2 = 0 ("the difference between the paired population means is equal to 0")
H1: µ1 − µ2 ≠ 0 ("the difference between the paired population means is not 0")

where
µ1 is the population mean of variable 1, and
µ2 is the population mean of variable 2.
PAIRED SAMPLES T-TEST
TEST STATISTIC

The test statistic for the Paired Samples t Test, denoted t, follows the same form as the one sample t test:

t = x̄diff / s_x̄,  where s_x̄ = sdiff / √n

where
x̄diff = Sample mean of the differences
n = Sample size (i.e., number of observations)
sdiff = Sample standard deviation of the differences
s_x̄ = Estimated standard error of the mean

If the calculated t value is greater than the critical t value (with df = n − 1), then we reject the null hypothesis (and conclude that the means are significantly different).
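
A Python sketch of the paired test (made-up English and Math scores for the same students, not the lecture's dataset) uses scipy.stats.ttest_rel, which tests whether the mean of the pairwise differences is zero.

# A minimal sketch: paired samples t test with SciPy, using made-up
# English and Math placement scores for the same five students.
from scipy.stats import ttest_rel

english = [82.5, 88.0, 79.5, 91.0, 84.0]
math = [65.0, 72.5, 60.0, 78.5, 66.0]

t_stat, p_value = ttest_rel(english, math)  # tests mean(english - math) = 0
print(f"t = {t_stat:.3f}, df = {len(english) - 1}, p = {p_value:.4f}")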
PAIRED SAMPLES T-TEST

Problem Statement

The sample dataset has placement test scores (out of 100 points) for four subject areas: English, Reading, Math, and Writing. Suppose we are particularly interested in the English and Math sections, and want to determine whether English or Math had higher test scores on average. We could use a paired t test to test if there was a significant difference in the average of the two tests.
PAIRED SAMPLES T-TEST
Before the Test

Variable English has a high of 101.95 and a low of 59.83, while variable Math has a high of 93.78 and a low of 35.32 (Analyze > Descriptive Statistics > Descriptives). The mean English score is much higher than the mean Math score (82.79 versus 65.47).

Let's create a comparative boxplot of these variables to help visualize these numbers. Click Analyze > Descriptive Statistics > Explore. Add English and Math to the Dependents box; then, change the Display option to Plots. We'll also need to tell SPSS to put these two variables on the same chart. Click the Plots button, and in the Boxplots area, change the selection to Dependents Together. You can also uncheck Stem-and-leaf. Click Continue. Then click OK to run the procedure.
PAIRED SAMPLES T-TEST

We can see from the boxplot that the center of the English scores is much higher than the center of the Math scores, and that there is slightly more spread in the Math scores than in the English scores. Both variables appear to be symmetrically distributed. It's quite possible that the paired samples t test could come back significant.
PAIRED SAMPLES T-TEST
Running the Test

 Click Analyze > Compare Means > Paired-Samples T Test.

 Double-click on variable English to move it to the Variable1 slot in the Paired Variables box. Then double-click on variable Math to move it to the Variable2 slot in the Paired Variables box.

 Click OK.
PAIRED SAMPLES T-TEST

Decision and Conclusions

From the results, we can say that:

 English and Math scores were weakly and positively correlated (r = 0.243, p < 0.001).

 There was a significant average difference between English and Math scores (t(397) = 36.313, p < 0.001).

 On average, English scores were 17 points higher than Math scores.
INDEPENDENT SAMPLES T-TEST

 The Independent Samples t Test compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different.

 The Independent Samples t Test is a parametric test.
INDEPENDENT SAMPLES T-TEST

This test is also known as:
 Independent t Test,
 Independent Measures t Test,
 Independent Two-sample t Test,
 Student t Test,
 Two-Sample t Test,
 Uncorrelated Scores t Test.

The variables used in this test are known as:
 Dependent variable, or test variable
 Independent variable, or grouping variable
INDEPENDENT SAMPLES T-TEST

COMMON USES

The Independent Samples t Test is commonly used to test the following:

 Statistical differences between the means of two groups

 Statistical differences between the means of two interventions

 Statistical differences between the means of two change scores
INDEPENDENT SAMPLES T-TEST

DATA REQUIREMENTS

 Dependent variable that is continuous (i.e., interval or ratio level)
 Independent variable that is categorical (i.e., two or more groups)
 Independent samples/groups (i.e., independence of observations)
There is no relationship between the subjects in each sample
 Normal distribution (approximately) of the dependent variable for each group
 Random sample of data from the population
 No outliers
INDEPENDENT SAMPLES T-TEST

HYPOTHESES

The null hypothesis (H0) and alternative hypothesis (H1) of the Independent Samples t Test can be expressed as:

H0: µ1 = µ2 ("the two population means are equal")
H1: µ1 ≠ µ2 ("the two population means are not equal")

OR

H0: µ1 − µ2 = 0 ("the difference between the two population means is equal to 0")
H1: µ1 − µ2 ≠ 0 ("the difference between the two population means is not 0")

where µ1 and µ2 are the population means for group 1 and group 2, respectively. Notice that the second set of hypotheses can be derived from the first set by simply subtracting µ2 from both sides of each equation.
INDEPENDENT SAMPLES T-TEST

TEST STATISTIC

The test statistic for an Independent Samples t Test is denoted t. There are actually two forms of the test statistic for this test, depending on whether or not equal variances are assumed. SPSS produces both forms of the test, so both forms are described here. Note that the null and alternative hypotheses are identical for both forms of the test statistic.
INDEPENDENT SAMPLES T-TEST

TEST STATISTIC

Equal variances assumed

When the two independent samples are assumed to be drawn from populations with identical population variances (i.e., σ1² = σ2²), the test statistic t is computed as:

t = (x̄1 − x̄2) / ( sp · √(1/n1 + 1/n2) )

with pooled standard deviation

sp = √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) )
INDEPENDENT SAMPLES T-TEST

where
x̄1 = Mean of first sample
x̄2 = Mean of second sample
n1 = Sample size (i.e., number of observations) of first sample
n2 = Sample size (i.e., number of observations) of second sample
s1 = Standard deviation of first sample
s2 = Standard deviation of second sample
sp = Pooled standard deviation

The calculated t value is then compared to the critical t value from the t distribution table with degrees of freedom df = n1 + n2 − 2 and the chosen confidence level. If the calculated t value is greater than the critical t value, then we reject the null hypothesis.
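
Both SPSS forms of the statistic have Python counterparts; the sketch below uses made-up mile times (not the lecture's dataset) with scipy.stats.ttest_ind, whose equal_var flag switches between the pooled-variance form shown above and Welch's t test (SPSS's "equal variances not assumed" row).

# A minimal sketch: independent samples t test with SciPy, using
# made-up mile times (minutes) for athletes and non-athletes.
from scipy.stats import ttest_ind

athletes = [6.5, 7.0, 6.8, 7.2, 6.9, 7.4]
non_athletes = [9.1, 8.7, 9.8, 10.2, 9.5, 8.9]

# equal_var=True: pooled-variance t test (df = n1 + n2 - 2).
t_pooled, p_pooled = ttest_ind(athletes, non_athletes, equal_var=True)

# equal_var=False: Welch's t test (unequal variances).
t_welch, p_welch = ttest_ind(athletes, non_athletes, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.3f}, p = {p_welch:.4f}")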
INDEPENDENT SAMPLES T-TEST

Problem Statement

In our sample dataset, students reported their typical time to run a mile, and whether or not they were an athlete. Suppose we want to know if the average time to run a mile is different for athletes versus non-athletes. This involves testing whether the sample means for mile time among athletes and non-athletes in your sample are statistically different. You can use an Independent Samples t Test to compare the mean mile time for athletes and non-athletes.
INDEPENDENT SAMPLES T-TEST

The hypotheses for this example can be expressed as:

H0: µnon-athlete − µathlete = 0 ("the difference of the means is equal to zero")
H1: µnon-athlete − µathlete ≠ 0 ("the difference of the means is not equal to zero")

where µathlete and µnon-athlete are the population means for athletes and non-athletes, respectively.
INDEPENDENT SAMPLES T-TEST

Running the Test

To run the Independent Samples t Test:
 Click Analyze > Compare Means > Independent-Samples T Test.
 Move the variable Athlete to the Grouping Variable field, and move the variable MileMinDur to the Test Variable(s) area. Now Athlete is defined as the independent variable and MileMinDur is defined as the dependent variable.
INDEPENDENT SAMPLES T-TEST

Running the Test

 Click Define Groups, which opens a new window. Use specified values is selected by default. Since our grouping variable is numerically coded (0 = "Non-athlete", 1 = "Athlete"), type "0" in the first text box, and "1" in the second text box. This indicates that we will compare groups 0 and 1, which correspond to non-athletes and athletes, respectively. Click Continue when finished.

 Click OK to run the Independent Samples t Test. Output for the analysis will display in the Output Viewer window.
INDEPENDENT SAMPLES T-TEST

Decision and Conclusions

Since p < .001 is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that the mean mile time for athletes and non-athletes is significantly different.

Based on the results, we can state the following:

 There was a significant difference in mean mile time between non-athletes and athletes (t(315.846) = 15.047, p < .001).

 The average mile time for athletes was 2 minutes and 14 seconds faster than the average mile time for non-athletes.
ONE-WAY ANOVA

 The One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different.

 One-Way ANOVA is a parametric test.

The variables used in this test are known as:
 Dependent variable
 Independent variable (also known as the grouping variable, or factor)
This variable divides cases into two or more mutually exclusive levels, or groups. (Two events are defined to be mutually exclusive if they cannot happen at the same time.)
ONE-WAY ANOVA
COMMON USES

The One-Way ANOVA is often used to analyze data from the following types of studies: field studies, experiments, and quasi-experiments.

The One-Way ANOVA is commonly used to test the following:
 Statistical differences among the means of two or more groups
 Statistical differences among the means of two or more interventions
 Statistical differences among the means of two or more change scores

Note: Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.
ONE-WAY ANOVA
DATA REQUIREMENTS

Your data must meet the following requirements:
 Dependent variable that is continuous (i.e., interval or ratio level)
 Independent variable that is categorical (i.e., two or more groups)
 Cases that have values on both the dependent and independent variables
 Independent samples/groups (i.e., independence of observations)
ONE-WAY ANOVA
DATA REQUIREMENTS

 There is no relationship between the subjects in each sample.
 Random sample of data from the population
 Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor)
 Homogeneity of variances (i.e., variances approximately equal across groups)
 No outliers
ONE-WAY ANOVA
HYPOTHESES

The null and alternative hypotheses of one-way ANOVA can be expressed as:

H0: µ1 = µ2 = µ3 = ... = µk ("all k population means are equal")
H1: At least one µi is different ("at least one of the k population means is not equal to the others")

where µi is the population mean of the ith group (i = 1, 2, ..., k).
ONE-WAY ANOVA
TEST STATISTIC

The test statistic for a One-Way ANOVA is denoted F. For an independent variable with k groups, the F statistic evaluates whether the group means are significantly different. Because the computation of the F statistic is slightly more involved than computing the paired or independent samples t test statistics, it's extremely common for all of the F statistic components to be depicted in an ANOVA table like the following:

Source       Sum of Squares   df    Mean Square     F
Regression   SSR              dfr   MSR = SSR/dfr   F = MSR/MSE
Error        SSE              dfe   MSE = SSE/dfe
Total        SST              dfT

where
SSR = the regression sum of squares
SSE = the error sum of squares
SST = the total sum of squares (SST = SSR + SSE)
dfr = the model degrees of freedom (equal to dfr = k − 1)
dfe = the error degrees of freedom (equal to dfe = n − k)
k = the total number of groups (levels of the independent variable)
n = the total number of valid observations
dfT = the total degrees of freedom (equal to dfT = dfr + dfe = n − 1)
MSR = SSR/dfr = the regression mean square
MSE = SSE/dfe = the mean square error

Then the F statistic itself is computed as

F = MSR / MSE
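
For a quick numerical illustration (made-up sprint times, not the lecture's dataset), scipy.stats.f_oneway computes the F statistic and p-value for a one-way ANOVA directly from the group samples.

# A minimal sketch: one-way ANOVA with SciPy, using made-up sprint
# times (seconds) for nonsmokers, past smokers, and current smokers.
from scipy.stats import f_oneway

nonsmokers = [5.8, 6.1, 6.0, 5.9, 6.3, 6.2]
past_smokers = [6.4, 6.7, 6.5, 6.9, 6.6, 6.8]
current_smokers = [7.1, 7.5, 7.2, 7.8, 7.4, 7.6]

f_stat, p_value = f_oneway(nonsmokers, past_smokers, current_smokers)

k, n = 3, 18  # number of groups, total observations
print(f"F({k - 1}, {n - k}) = {f_stat:.3f}, p = {p_value:.4f}")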
ONE-WAY ANOVA

Problem Statement

In the sample dataset, the variable Sprint is the respondent's time (in seconds) to sprint a given distance, and Smoking is an indicator about whether or not the respondent smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current smoker).

Let's use ANOVA to test if there is a statistically significant difference in sprint time with respect to smoking status. Sprint time will serve as the dependent variable, and smoking status will act as the independent variable.
ONE-WAY ANOVA
Before the Test

Just like we did with the paired t test and the independent samples t test, we'll want to look at descriptive statistics and graphs to get a picture of the data before we run any inferential statistics.

The sprint times are a continuous measure of time to sprint a given distance in seconds. From the Descriptives procedure (Analyze > Descriptive Statistics > Descriptives), the times exhibit a range of 4.5 to 9.6 seconds, with a mean of 6.6 seconds (based on n=374 valid cases).
ONE-WAY ANOVA
Running the Procedure

 Click Analyze > Compare Means > One-Way ANOVA.
 Add the variable Sprint to the Dependent List box, and add the variable Smoking to the Factor box.
 Click Options. Check the box for Means plot, then click Continue.
 Click OK when finished.

The Means plot is a visual representation of what we saw in the Compare Means output. The points on the chart are the average of each group. It's much easier to see from this graph that the current smokers had the slowest mean sprint time, while the nonsmokers had the fastest mean sprint time.
ONE-WAY ANOVA

Discussion and Conclusions

We conclude that the mean sprint time is significantly different for at least one of the smoking groups (F(2, 350) = 9.209, p < 0.001).

Note that the ANOVA alone does not tell us specifically which means were different from one another. To determine that, we would need to follow up with multiple comparison (or post-hoc) tests, such as the one sketched below.
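
One common post-hoc choice is Tukey's HSD. As an illustrative sketch (made-up data, and statsmodels rather than SPSS), pairwise_tukeyhsd compares every pair of group means while controlling the family-wise error rate.

# A minimal sketch: Tukey HSD post-hoc comparisons with statsmodels,
# using made-up sprint times and smoking-group labels.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

times = [5.8, 6.1, 6.0, 5.9, 6.4, 6.7, 6.5, 6.9, 7.1, 7.5, 7.2, 7.8]
groups = ["nonsmoker"] * 4 + ["past"] * 4 + ["current"] * 4

result = pairwise_tukeyhsd(endog=times, groups=groups, alpha=0.05)
print(result.summary())  # one row per pair: mean difference, adjusted p, reject?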
