Statistics – the most important science in the whole world; for upon it depends the
practical application of every science and of every art; the one science essential to all
political and social administration, all education, all organization based on experience,
for it only gives results of our experience. – Florence Nightingale
Statistics is the branch of science that deals with the collection, presentation, organization,
analysis and interpretation of data.
A variable is a characteristic or attribute of the elements in a collection that can assume
different values for different elements.
Descriptive Statistics includes all the techniques used in organizing, summarizing and presenting
the data in hand.
Example:
1. Given the daily sales performance for a product for the previous year, we can draw a line
chart or a column (bar) chart to emphasize the upward or downward movement of the
series. Likewise, we can use descriptive statistics to calculate a quantity index per quarter
to compare sales by quarter for the previous year.
Inferential Statistics includes all the techniques used in analyzing sample data that lead to
generalizations about the population from which the sample came.
Example:
1. To examine the performance of the country’s financial system, we can use inferential
statistics to arrive at conclusions that apply to the entire economy using the data gathered
from a sample of companies or businesses in the country.
STEP 1
Identify the Problem
STEP 2
Plan the Study
STEP 3
Collect the Data
STEP 4
Explore the Data
STEP 5
Analyze data and interpret the results
STEP 6
Present the results
Figure 1
Steps in statistical inquiry
Example:
1) The researchers want to determine if there is an association between the price and
production of lumber. They would have to answer the following questions first to
obtain a precise statement of the problem:
a) What kind of lumber will be included in the study? Will all types of lumber be
included or just one specific type?
b) Will the study include the whole production of lumber or only lumber produced
for sale?
c) What price for lumber will be used, the market price or the factory price?
d) What is the scope of the study? Are all the regions of the country included or just
a specific region or province?
e) What period is covered in the study?
The statement of the research problem is usually in the form of a question, although the
researcher can also state it as a declarative statement. One way of further refining the
statement of the problem is by formulating a hypothesis.
Example:
a) In the form of a question
“What are the factors affecting the job performance of an employee?”
“What is the relationship between wheat yield and amount of fertilizer?”
“What are the short- and long-term effects of the expanded value-added tax on the economy
of the country?”
The investigators must double-check the results that contradict existing theory or the
earlier hypothesis made. They may have committed errors in data collection or
analysis. If not, they would have to propose possible explanations for these results or
suggest future statistical inquiries that could help explain the inconsistency.
6) Present the Results
After analyzing and interpreting the results, the investigators must present these
results in a clear and concise manner to the users of the research. All the time and
effort spent in conducting the inquiry will be in vain if the investigators do not
articulate what all the figures convey and how the presented information can be
useful for decision making. The presentation must also include a discussion of the
whole research process, from Step 1 to Step 5. This will help users evaluate for
themselves the reliability and credibility of the presented information.
The three ways of presenting results are textual, tabular, and graphical.
Textual presentation involves stating the results in paragraph form.
Tabular presentation involves showing the figures in rows and columns so that the
reader can easily comprehend the points made.
Graphical presentation involves placing data in graphic form to help the reader
visualize other important features of the data without having to look at too many
figures.
It is a capital mistake to theorize before one has data – Sir Arthur Conan Doyle
Measurement is the process of determining the value or label of the variable based
on what has been observed.
Example:
1) Allowance of the students (in pesos)
2) Distance travelled (in km)
3) Speed of the car (in km/hr)
4) Weight of a newborn baby (in kg)
Interval level of measurement satisfies only the first three properties of the ratio scale.
The only difference between the interval and ratio levels is the interpretation of the value
0 (zero) in their scales. The zero point in the interval scale is NOT AN ABSOLUTE ZERO. Unlike in
the ratio scale, the zero value in the interval scale has an ARBITRARY INTERPRETATION and does
not mean the absence of the property we are measuring.
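The difference between the two scales can be illustrated with temperature. The short sketch below (values chosen purely for illustration) shows why ratios are misleading on an interval scale such as Celsius, where zero is arbitrary, but meaningful on a ratio scale such as Kelvin:

```python
# Interval vs. ratio scale: ratios are meaningful only when zero is absolute.
# Celsius is an interval scale (0 °C is arbitrary); Kelvin is a ratio scale.

def celsius_to_kelvin(c):
    return c + 273.15

t1_c, t2_c = 10.0, 20.0  # two temperatures in Celsius

ratio_celsius = t2_c / t1_c  # 2.0, but 20 °C is NOT "twice as hot" as 10 °C
ratio_kelvin = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(ratio_celsius)           # 2.0 (misleading: depends on the arbitrary zero)
print(round(ratio_kelvin, 3))  # 1.035 (the physically meaningful ratio)
```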
Ordinal level of measurement satisfies only the first two properties of the ratio scale.
Example:
1) Performance rating of a salesperson measured as follows: 1 for “excellent”, 2 for
“very good”, 3 for “good”, 4 for “satisfactory”, and 5 for “poor”.
2) Faculty rank of a teacher measured as follows: 1 for Professor, 2 for Associate
Professor, 3 for Assistant Professor, and 4 for Instructor.
3) Ranking of a student in class according to academic performance as 1st, 2nd, 3rd, and
so on.
Nominal level of measurement satisfies only the first property of the ratio level.
A parametric statistical test makes assumptions about the population parameters and the
distribution from which the data came. These tests include Student's t-tests and ANOVA
tests, which assume the data come from a normal distribution.
The opposite is a nonparametric test, which makes no assumptions about the population
parameters. Nonparametric tests include the chi-square test, Fisher's exact test, and the
Mann-Whitney test.
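As a minimal sketch of this distinction, the example below runs a parametric and a nonparametric two-group comparison on the same data using SciPy (the data are made up for illustration):

```python
# Sketch: parametric vs. nonparametric comparison of two independent groups.
from scipy import stats

group_a = [12.1, 11.8, 13.0, 12.5, 11.6, 12.9, 12.2, 13.1]
group_b = [10.9, 11.2, 10.5, 11.7, 10.8, 11.0, 11.4, 10.6]

# Parametric: Student's t-test assumes both samples come from normal
# distributions with equal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Nonparametric: the Mann-Whitney U test makes no normality assumption.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"U = {u_stat:.1f}, p = {u_p:.4f}")
```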
Do not put your faith in what statistics say until you have carefully considered what
they do not say.
William W. Watt
Hypothesis Testing
A statistical hypothesis is a conjecture concerning one or more populations whose veracity can be
established using sample data. The null hypothesis, denoted Ho, is the statistical hypothesis that the
researcher doubts to be true. The alternative hypothesis, denoted Ha, is the operational statement of
the theory that the researcher believes to be true and wishes to prove; it is a contradiction of the null
hypothesis.
In hypothesis testing, the level of significance refers to the threshold at which we accept or reject
the null hypothesis. Since 100% accuracy is not possible in accepting or rejecting a null hypothesis,
we select a level of significance, usually 1% or 5%. The level of significance is the maximum
probability of committing a Type I error; that is, P(Type I error) = α. This probability is symbolized
by α (the Greek letter alpha).
After the significance level is chosen, a critical value is selected from a table for the
appropriate test statistic. The critical value determines the critical and noncritical regions. The
critical value is a value that separates the critical region from the noncritical region. The critical
or rejection region is the range of the values of the test value that indicates that there is
significant difference and that the null hypothesis (H0) should be rejected. On the contrary,
noncritical or nonrejection region is the range of the values of the test value that indicates that
the difference was probably due to chance and that null hypothesis (H0) should not be rejected.
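Critical values for a chosen significance level can be read from a table or computed from the inverse CDF of the test statistic's distribution. A minimal sketch using SciPy's standard normal distribution:

```python
# Sketch: critical values for a z-test at a given significance level,
# computed from the standard normal inverse CDF (ppf).
from scipy import stats

alpha = 0.05

# Two-tailed test: alpha is split between the two tails.
z_two_tailed = stats.norm.ppf(1 - alpha / 2)

# One-tailed (right-tail) test: all of alpha goes in one tail.
z_one_tailed = stats.norm.ppf(1 - alpha)

print(round(z_two_tailed, 3))  # 1.96
print(round(z_one_tailed, 3))  # 1.645
```

Any test value beyond the critical value falls in the rejection region.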
A Type I error occurs if one rejects the null hypothesis when it is true. In hypothesis testing, a Type I
error is denoted by alpha (α), and the region of the normal curve that shows the critical region is called
the alpha region.
A Type II error occurs if one does not reject the null hypothesis when it is false. In hypothesis testing,
a Type II error is denoted by beta (β), and the region of the normal curve that shows the acceptance
region is called the beta region.
The hypothesis testing situation can be compared to a court trial. In a court trial, there are four
possible outcomes. The defendant is either guilty or innocent, and will be convicted or acquitted.
The hypotheses are H0: The defendant is innocent versus Ha: The defendant is not innocent.

Statistical Decision   H0 true (defendant is innocent)            H0 false (defendant is not innocent)
Do not reject H0       Correct decision                           Type II error (acquitted though guilty)
Reject H0              Type I error (convicted though innocent)   Correct decision
Next, the prosecutor will present the evidence and based on this evidence, the judge decides
the verdict, innocent or guilty. If the defendant is acquitted and did not commit the crime, a
correct decision has been made by the judge. On the other hand, if the defendant is acquitted
and has committed the crime, then Type II error has been made.
If the defendant is convicted but did not commit the crime, then a Type I error has been
committed. On the contrary, if the defendant is convicted and has committed the crime, then a
correct decision has been made.
Statistics are like bikinis. What they reveal is suggestive, but what they
conceal is vital.
- Aaron Levenstein, as quoted in Nature Genetics
Assumptions for Statistical Tests
Most of the statistical tests we will perform are based on a set of assumptions. When these assumptions
are violated the results of the analysis can be misleading or completely erroneous.
Homogeneity of variances (Homoscedasticity): Data from multiple groups have the same variance
We explore in detail what it means for data to be normally distributed in Normal Distribution, but in
general it means that the graph of the data has the shape of a bell curve. Such data are symmetric
around the mean and have excess kurtosis equal to zero.
If data is not symmetric, sometimes it is useful to make a transformation whereby the transformed data
is symmetric and so can be analyzed more easily.
Some tests (e.g. ANOVA) require that the groups of data being studied have the same variance.
In Homogeneity of Variances we provide some tests for determining whether groups of data have the
same variance.
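As a quick sketch, one such check, Levene's test, can be run with SciPy; the group data below are made up for illustration:

```python
# Sketch: Levene's test for homogeneity of variances across three groups.
# A large p-value means the equal-variance assumption is tenable.
from scipy import stats

group1 = [21, 23, 20, 22, 24, 21, 23]
group2 = [25, 24, 26, 23, 25, 24, 26]
group3 = [22, 21, 23, 22, 20, 23, 21]

stat, p = stats.levene(group1, group2, group3)
print(f"W = {stat:.2f}, p = {p:.3f}")
# If p > 0.05, we do not reject the hypothesis of equal variances,
# so ANOVA's homoscedasticity assumption is reasonable for these data.
```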
Some tests (e.g. Regression) require that there be a linear correlation between the dependent and
independent variables. Generally linearity can be tested graphically using scatter diagrams or via other
techniques explored in Correlation, Regression and Multiple Regression.
We touch on the notion of independence in Definition 3 of Basic Probability Concepts. In general, data are
independent when there is no correlation between them (see Correlation). Many tests require that data
be randomly sampled, with each data element selected independently of previously selected
elements. For example, if we measure the monthly weight of 10 people over the course of 5 months,
these 50 observations are not independent, since repeated measurements from the same people are
not independent. Likewise, the IQs of 20 married couples do not constitute 40 independent
observations.
Another approach for addressing problems with assumptions is by transforming the data
(see Transformations).
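As an illustration of such a transformation, the sketch below uses simulated right-skewed data (NumPy/SciPy, fixed seed) and shows a log transform making it roughly symmetric:

```python
# Sketch: a log transformation can make right-skewed data more symmetric.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed sample

print(round(stats.skew(skewed), 2))           # large positive skewness
print(round(stats.skew(np.log(skewed)), 2))   # near 0 after the log transform
```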
D.E.A.R Method
Define
Example
Apply
Report
Is there a mean difference in SAT scores between freshmen and sophomore students?
Repeated-measures t-test, a.k.a. paired-sample t-test – one group of people tested more than once
If a group of people are given a medication for high cholesterol, does their average cholesterol
level decrease after one month?
One sample t-test – used to compare a sample mean with a known population mean or some other
meaningful, fixed value.
Does the class of 2015 have higher or lower SAT scores than the SAT scores of all students?
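The three t-test variants above can be sketched with SciPy as follows; all data and reference values are made up for illustration:

```python
# Sketch: the three common t-tests in SciPy.
from scipy import stats

# Independent-samples t-test: two different groups
# (e.g. freshmen vs. sophomore SAT scores).
freshmen = [1150, 1220, 1080, 1190, 1250, 1130]
sophomores = [1210, 1280, 1190, 1260, 1300, 1230]
t_ind, p_ind = stats.ttest_ind(freshmen, sophomores)

# Paired (repeated-measures) t-test: the same people measured twice
# (e.g. cholesterol before and after one month of medication).
before = [240, 255, 230, 248, 262]
after = [228, 240, 225, 235, 250]
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test: compare a sample mean to a fixed value
# (e.g. class of 2015 SAT scores vs. an assumed population mean of 1060).
class_2015 = [1100, 1050, 1120, 1080, 1090, 1110]
t_one, p_one = stats.ttest_1samp(class_2015, popmean=1060)

print(p_ind, p_rel, p_one)
```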
Problem: Stephen Schmidt (1994) conducted a series of experiments examining the effects of humor on
memory. He collected a set of humorous sentences and then modified each one to produce a
nonhumorous version of the same sentence. The humorous sentences were then presented to one group
of participants and the nonhumorous sentences were presented to another group. Each group was given
a test to determine how many sentences they could recall. Data similar to those obtained by Schmidt are
shown in the following table.
a. Do the data provide enough evidence to conclude that humor has a significant effect on memory?
Use a two-tailed test at the 0.05 level of significance.
b. Calculate Cohen’s d to evaluate the size of the effect.
c. Calculate the percentage of variance explained by the treatment, r2, to measure the effect size.
The participants who read humorous sentences (M = 4.25, SD = 2.33, n = 16) recalled
significantly more sentences than those who read nonhumorous sentences (M = 3.00, SD = 1.73, n
= 16). The effect size was medium. A power of 90% (0.90) indicates that if the study were conducted
10 times, it would likely produce similar (i.e., statistically significant) results 9 times.
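As a sketch, Cohen's d and r² can be computed directly from the summary statistics reported above (M = 4.25, SD = 2.33, n = 16 versus M = 3.00, SD = 1.73, n = 16), using the standard pooled-standard-deviation formulas:

```python
# Sketch: Cohen's d and r^2 from two-group summary statistics.
import math

m1, sd1, n1 = 4.25, 2.33, 16   # humorous-sentence group
m2, sd2, n2 = 3.00, 1.73, 16   # nonhumorous-sentence group

# Pooled standard deviation for two independent groups.
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

d = (m1 - m2) / sp                                   # Cohen's d
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))    # t statistic
df = n1 + n2 - 2
r2 = t**2 / (t**2 + df)          # proportion of variance explained

print(round(d, 2))   # 0.61 -> a medium effect by Cohen's conventions
print(round(r2, 2))  # 0.09
```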
State decision about the null hypothesis
EFFECT SIZE
Significance tells us that the effect was not due to chance (“real”)
Effect size is a measure of how large the effect was (“real big”)
A difference can be statistically significant but not practically impressive
REPORT
A one-way analysis of variance was conducted to evaluate the null hypothesis that there is no
difference in high school students’ level of satisfaction with school based on their family’s socioeconomic
status (N = 435). The independent variable, socioeconomic status, included three groups: Low (M = 21.36,
SD = 4.55, n = 147), Moderate (M = 22.10, SD = , n = 153), and High (M = 26.73, SD = 5.85, n = 135).
The assumption of normality was evaluated using histograms (see Figure No.) and found tenable
for all groups. The assumption of homogeneity of variances was tested and found tenable using Levene’s
test, F(2, 432) = 0.75, p = 0.48. The ANOVA was significant, F(2, 432) = 4.64, p = 0.01, η2 = 0.02. Thus, there
is significant evidence to reject the null hypothesis and conclude that there is a significant difference in
high school students’ level of satisfaction with school based on their family’s socioeconomic status.
However, the actual difference in the mean scores between groups was quite small based on Cohen’s
(1988) conventions for interpreting effect size.
Post hoc comparisons to evaluate pairwise differences among group means were conducted
using the Tukey HSD test, since equal variances were tenable. Tests revealed significant pairwise
differences between the mean scores of students who come from families with low socioeconomic status
and students who come from families with high socioeconomic status, p < 0.05. Students who come from
families with medium socioeconomic status do not significantly differ from the other two groups, p > 0.05.
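A one-way ANOVA with an eta-squared effect size can be sketched with SciPy as follows; the satisfaction scores for the three socioeconomic-status groups are made up for illustration and are not the data from the report above:

```python
# Sketch: one-way ANOVA plus eta-squared effect size for three groups.
import numpy as np
from scipy import stats

low = np.array([20, 22, 19, 23, 21, 20, 24, 22])
moderate = np.array([22, 23, 21, 24, 22, 23, 21, 25])
high = np.array([26, 27, 25, 28, 26, 27, 29, 26])

f_stat, p = stats.f_oneway(low, moderate, high)

# Eta squared = SS_between / SS_total.
all_data = np.concatenate([low, moderate, high])
grand_mean = all_data.mean()
ss_total = ((all_data - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2
                 for g in (low, moderate, high))
eta_sq = ss_between / ss_total

print(f"F = {f_stat:.2f}, p = {p:.4f}, eta^2 = {eta_sq:.2f}")
```

Pairwise post hoc comparisons could then be run with, for example, scipy.stats.tukey_hsd (SciPy 1.8+) or statsmodels’ pairwise_tukeyhsd.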
GUIDE
A one-way analysis of variance was conducted to evaluate the null hypothesis <state the null hypothesis>
(N = total number of observations). The independent variable, <identify the independent variable>,
included <number of groups> groups: <first category> (M = ______, SD = _____, n = ____), <2nd category>
(M = _____, SD =____ , n = _____), and <3rd category> (M = _____, SD = ______, n = _____).
The assumption of normality was evaluated using histograms (see Figure No.) and found tenable
(justifiable, supportable, arguable, reasonable, viable, workable, credible, acceptable) for all
groups. The assumption of homogeneity of variances was tested and found tenable using Levene’s test,
F(df1, df2) = <Levene’s statistic>, p = <sig value>. The ANOVA was significant, F(df1, df2) = <from ANOVA
table>, p = <sig value from the ANOVA table>, η2 = <compute the eta-squared effect size as the ratio
of the between-groups sum of squares to the total sum of squares>. Thus, there is significant evidence
to reject the null hypothesis and conclude that there is a significant difference between <dependent
variable and independent variable>. However, the actual difference in the mean scores between groups was quite
<small, medium or large> based on Cohen’s (1988) conventions for interpreting effect size.
Post hoc comparisons to evaluate pairwise differences among group means were conducted
using the Tukey HSD test, since equal variances were tenable. Tests revealed significant pairwise
differences between the mean scores of students who come from families with <one group versus with
the other group based on multiple comparison table of post hoc test>, p < 0.05. Students who come from
<other group/s> do not significantly differ from the <other group/s>, p > 0.05.
*tenable – able to be maintained or defended against attack or objection