
Data Analysis

Florenda F. Cabatit RN MA
Facilitator
DATA ANALYSIS

Data analysis is the process by which
information is rendered meaningful
and intelligible (Polit and Hungler,
1995).
It is the systematic organization and
synthesis of research data and the
testing of research hypotheses using
those data (2004).
Statistical Analysis
Quantitative analysis deals with
numerical analysis of information.
It is the manipulation of numeric data
through statistical procedures for the
purpose of describing phenomena or
assessing the magnitude and reliability
of relationships among them.
Statistics is the scientific method used in
quantitative analysis.
Statistics
Statistics helps to:
• Organize data
• Summarize data
• Evaluate data
• Present data in an easily understood form
Statistics
Two branches of statistics:
• Descriptive statistics - statistics used to describe and
summarize data
• Inferential statistics - statistics that permit inferences
on whether relationships observed in a sample are likely
to occur in the larger population
Considerations in the choice of appropriate statistical methods
• The purpose of the research
• The level of measurement of the variables
• The number of groups/variables involved
• The type of groups being studied
Levels of Measurement
• Nominal - the lowest level
- involves assigning numbers to classify
characteristics into categories
- numeric codes assigned in nominal
measurement do not convey quantitative
information
- the numbers are merely symbols that
represent different values
- categories must be mutually exclusive
and collectively exhaustive
Ordinal Measurement
• This involves sorting objects on the basis
of their relative standing or ranking on an
attribute.
• The numbers are not arbitrary: they signify
incremental values. They do not, however,
tell anything about how much greater one
level is than another.
Interval Measurement
• A measurement in which
an attribute of a variable
is rank ordered on a scale
that has equal distances
between points on that
scale.
Ratio Scale
• A quantitative measurement in which intervals
are equal and there is a true zero point.
• The highest level of measurement.
• All arithmetic operations are permissible with
this measurement (add, subtract, multiply, and
divide numbers on this scale).
Descriptive Statistics
Three characteristics to fully
describe a set of data:
• shape of the distribution of values
• central tendency
• variability
Review of Descriptive Statistics
• Descriptive statistics are used to present
quantitative descriptions in a manageable
form.
• This method works by reducing lots of data
into a simpler summary.
• Examples:
- 37 °C (Centigrade) as the average adult body
temperature
- SU’s quality-point system
Univariate Analysis
• This is the examination across cases of one
variable at a time.
• Frequency distributions are used to group
data.
• One may set up ranges (class intervals) that allow
cases to be grouped into categories.
• Examples include:
- Age categories
- Price categories
- Temperature categories
Distributions
Two ways to describe a univariate
distribution:
• A table
• A graph (histogram, bar chart)
Distributions (cont.)

• Distributions may also be displayed
using percentages.
• For example, one could use
percentages to describe the following:
- Percentage of people under the
poverty level
- Percentage over a certain age
- Percentage over a certain score on a
standardized test
Distributions (cont.)

A Frequency Distribution Table

Category    Percent
Under 35     9%
36-45       21%
46-55       45%
56-65       19%
66+          6%
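
A percentage distribution like the one in the table can be computed by assigning each case to a category and dividing the category counts by the number of cases. The sketch below is only an illustration: the `ages` list is invented, the function names are hypothetical, and placing an age of exactly 35 in the first category is an assumption, since the table does not say where 35 belongs.

```python
from collections import Counter

CATEGORIES = ["Under 35", "36-45", "46-55", "56-65", "66+"]

def age_category(age):
    """Assign an age to one of the table's categories."""
    # Assumption: an age of exactly 35 goes into the first category.
    if age <= 35:
        return "Under 35"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

def percent_distribution(ages):
    """Return the percentage of cases falling in each category."""
    counts = Counter(age_category(a) for a in ages)
    n = len(ages)
    return {c: 100 * counts[c] / n for c in CATEGORIES}

# Hypothetical ages, invented for illustration only.
ages = [29, 38, 41, 47, 50, 52, 55, 58, 61, 70]

for category, pct in percent_distribution(ages).items():
    print(f"{category:<10}{pct:>5.0f}%")
```

Because the categories are mutually exclusive and cover every age, the percentages always add up to 100.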
Distributions (cont.)
A Histogram

[Histogram: percent of cases (vertical axis, 0 to 45) by category (Under 35, 36-45, 46-55, 56-65, 66+)]
Central Tendency
• An estimate of the “center” of a
distribution
• Three different types of
estimates:
- Mean
- Median
- Mode
Mean
• The most commonly used method of
describing central tendency.
• One totals all the results
and then divides by the number of
units, or “n”, of the sample.
• Example: The NCM 104 Quiz mean
was determined by the sum of all the
scores divided by the number of
students taking the exam.
Median
• The median is the score found at the
exact middle of the set.
• One must list all scores in numerical
order and then locate the score in
the center of the sample.
• Example: If there are 500 scores in
the list, the median is the average of
scores #250 and #251. With 499 scores,
score #250 would be the median.
• The median is useful when there are outliers,
because extreme scores affect it far less than the mean.
Mode
• The mode is the most repeated score
in the set of results.
• Let's take the set of scores:
15, 20, 21, 20, 36, 15, 25, 15
• Again, we first line up the scores:
15, 15, 15, 20, 20, 21, 25, 36
• 15 is the most repeated score and is
therefore labeled the mode.
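
As a quick check of the three estimates of central tendency, the sketch below applies Python's standard `statistics` module to the set of scores used in the mode example. The variable names are illustrative only.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Mean: sum of all scores divided by the number of scores.
mean = statistics.mean(scores)

# Median: middle of the sorted scores; with an even number of scores,
# the two middle scores are averaged.
median = statistics.median(scores)

# Mode: the most frequently occurring score.
mode = statistics.mode(scores)

print("sorted:", sorted(scores))
print("mean:", mean)      # 20.875
print("median:", median)  # 20.0
print("mode:", mode)      # 15
```

The mean (20.875) sits above the median (20) here because the single high score of 36 pulls it upward.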
Central Tendency
• If the distribution is normal (i.e., bell-shaped),
the mean, median and mode
are all equal.
• In our analyses, we’ll use the mean.
Dispersion
• Two types of estimates:
- Range
- Standard deviation
• The standard deviation is the more
accurate and detailed estimate, because a single
outlier can greatly extend the range.
Range
• The range is the distance between the
highest and lowest scores.
• Let's take the set of
scores: 15, 20, 21, 20, 36, 15, 25, 15.
• The scores run from 15 to 36, so
21 points separate the highest
from the lowest score.
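
The range calculation, and its sensitivity to a single extreme score, can be sketched as follows. The added score of 95 is a hypothetical outlier, not part of the original data.

```python
scores = [15, 20, 21, 20, 36, 15, 25, 15]

def score_range(values):
    """Return the distance between the highest and lowest values."""
    return max(values) - min(values)

print("lowest:", min(scores), "highest:", max(scores))
print("range:", score_range(scores))  # 21

# Hypothetical outlier added for illustration: one extreme score
# is enough to stretch the range considerably.
with_outlier = scores + [95]
print("range with outlier:", score_range(with_outlier))  # 80
```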
Standard Deviation
• The standard deviation is a
value that shows the relation
that individual scores have to
the mean of the sample.
• If scores are standardized
to a normal curve,
there are several statistical
manipulations that can be
performed to analyze the data
set.
Standard Deviation (cont.)
• Assumptions may be made about
the percentage of scores as they
deviate from the mean.
• If scores are normally distributed,
one can assume that
approximately 68% of the scores in
the sample fall within one standard
deviation of the mean.
• Approximately 95% of the scores
would then fall within two standard
deviations of the mean.
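
The 68% and 95% figures follow from the shape of the normal curve. A minimal check, assuming a perfectly normal distribution and using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k standard
    deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 SD: {share_within(1):.1%}")  # about 68.3%
print(f"within 2 SD: {share_within(2):.1%}")  # about 95.4%
```

Real samples are only approximately normal, so the observed percentages will usually differ somewhat from these values.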
Standard Deviation (cont.)
• The standard deviation is the square root
of the average squared deviation from the mean:
the squared deviations of all the scores are summed,
the sum is divided by the number of scores,
and the square root of the result is taken.
• Squaring accounts for both
positive and negative deviations
from the mean, so that they do not cancel out.
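
The calculation can be traced step by step on the scores used in the earlier examples. This is a sketch, not a prescribed procedure. Dividing by the number of scores gives the population form described in the slide. Dividing by n - 1 instead gives the sample form that many textbooks and software packages use, and it is shown here only for comparison.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(scores)
mean = sum(scores) / n

# Deviations from the mean are both positive and negative and sum to zero.
deviations = [x - mean for x in scores]

# Squaring makes every deviation non-negative so they cannot cancel out.
sum_of_squares = sum(d ** 2 for d in deviations)

# Population form: divide by n, then take the square root.
population_sd = math.sqrt(sum_of_squares / n)

# Sample form (shown for comparison): divide by n - 1 instead.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print("sum of deviations:", sum(deviations))
print("population SD:", round(population_sd, 3))
print("sample SD:", round(sample_sd, 3))

# The standard library gives the same results.
assert math.isclose(population_sd, statistics.pstdev(scores))
assert math.isclose(sample_sd, statistics.stdev(scores))
```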
RESEARCH QUESTION: DESCRIBE

LEVEL            TYPE OF DESCRIPTION   STATISTICAL TOOL
NOMINAL          Distribution          Frequency Distribution, Contingency Table
                 Central Tendency      Mode
ORDINAL          Distribution          Frequency Distribution, Contingency Table, Scatterplot
                 Central Tendency      Mode, Median
RATIO/INTERVAL   Distribution          Frequency Distribution, Contingency Table, Scatterplot
                 Central Tendency      Mode, Median, Mean
                 Variability           Range, Variance, Standard Deviation
Inferential Statistics
• Based on the laws of probability
• It provides a means for drawing
conclusions about a population,
given data from a sample
• It estimates population parameters
from sample statistics
Inferential Statistics
Statistical inference consists of two
techniques:
1. Estimation of parameters
2. Hypothesis testing
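
Estimation of parameters can be illustrated with a confidence interval for a population mean. The sketch below is a hedged example rather than part of the original slides. It reuses the eight scores from the descriptive examples, treats them as a random sample, and takes the value 2.365 from a standard t table (7 degrees of freedom, 95% confidence, two-tailed).

```python
import math
import statistics

# Treated here as a random sample from a larger population (an assumption).
scores = [15, 20, 21, 20, 36, 15, 25, 15]

n = len(scores)
sample_mean = statistics.mean(scores)                    # point estimate
standard_error = statistics.stdev(scores) / math.sqrt(n)

# Tabled t value for df = n - 1 = 7 at 95% confidence (two-tailed).
T_TABLED = 2.365

margin = T_TABLED * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"point estimate of the mean: {sample_mean}")
print(f"95% interval estimate: {lower:.2f} to {upper:.2f}")
```

The point estimate is a single value, while the interval estimate expresses how much uncertainty a sample of this size leaves about the population mean.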
Hypothesis Testing
Statistical hypothesis testing provides
objective criteria for deciding whether
hypotheses are supported by empirical
evidence.
• It is a process of disproof or rejection.
• Researchers seek to reject the null
hypothesis through various statistical
tests.
• Hypothesis testing uses samples to draw
conclusions about relationships within the
population.
Type I and Type II Errors
Type I Error - researchers make a type I
error when a true null hypothesis is
rejected.

Type II Error - researchers make a type II
error when a false null hypothesis is
accepted (that is, not rejected).
Level of Significance
This refers to the risk of making a type
I error in a statistical analysis.
The value selected beforehand
signifies the risk, or the probability, of
rejecting a true null
hypothesis.
The two most frequently used
significance levels (referred to as alpha or
α) are:
.05
.01
Level of Significance
• With a .05 significance level, we are
accepting the risk that, out of 100 samples
drawn from a population, a true null
hypothesis would be rejected only 5 times.
• With a .01 level of significance, the risk of
a type I error is lower: in only 1 sample out
of 100 would we erroneously reject the
null hypothesis.
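
The "5 times out of 100" interpretation can be checked with a small simulation in which the null hypothesis is true by construction. Everything in this sketch is an assumption made for illustration: the population is normal with mean 0 and known standard deviation 1, the test is a two-tailed z test, and the function name, sample size, and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha, n_samples=2000, n=30, seed=1):
    """Share of samples in which a TRUE null hypothesis (mean = 0)
    is rejected by a two-tailed z test at the given alpha."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_samples):
        # Draw a sample from a population where the null really is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / math.sqrt(n))
        if abs(z) > z_critical:
            rejections += 1  # a Type I error
    return rejections / n_samples

rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
print(f"alpha = .05: rejected a true null in {rate_05:.1%} of samples")
print(f"alpha = .01: rejected a true null in {rate_01:.1%} of samples")
```

The observed rates fluctuate around 5% and 1% because the simulation itself uses a finite number of samples.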
Critical Region
This refers to the area in the sampling
distribution representing values that
are “improbable” if the null hypothesis
is true.
It is defined by the level of significance.
Statistical Tests
Two-tailed test - both ends
or tails of the sampling distribution are
used to determine improbable values.

One-tailed test - the critical region of
improbable values is entirely in one tail
of the distribution: the tail corresponding
to the direction of the hypothesis.
[Figure: an example of the critical regions of a two-tailed test]
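
To make the difference between the two kinds of critical region concrete, the sketch below computes the cutoff values for a test statistic that follows the standard normal distribution. That distribution is an assumption made for simplicity, and the function name is illustrative. For the one-tailed case the hypothesis is assumed to predict a higher value, so the critical region is in the upper tail.

```python
from statistics import NormalDist

def critical_values(alpha, two_tailed=True):
    """Return the cutoff(s) beyond which a standard normal test
    statistic falls in the critical region."""
    z = NormalDist()
    if two_tailed:
        # alpha is split evenly between the two tails.
        cutoff = z.inv_cdf(1 - alpha / 2)
        return (-cutoff, cutoff)
    # All of alpha is placed in the upper tail.
    return (z.inv_cdf(1 - alpha),)

low, high = critical_values(0.05, two_tailed=True)
(one_sided,) = critical_values(0.05, two_tailed=False)

print(f"two-tailed, alpha = .05: below {low:.2f} or above {high:.2f}")
print(f"one-tailed, alpha = .05: above {one_sided:.3f}")
```

At the same level of significance the one-tailed cutoff (about 1.645) is less extreme than the two-tailed cutoff (about 1.96), because the whole critical region sits in a single tail.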
Types of Statistical Tests
Parametric Tests – a class of
inferential statistical tests that
involve:
a. Assumptions about the
distribution of the variables
b. The estimation of a parameter
c. The use of interval or ratio
measures.
Statistical Tests

Non-parametric Tests - statistical
tests that do not estimate parameters;
also called distribution-free statistics.
Steps in Hypothesis Testing
1. State the alternative hypothesis
2. State the null hypothesis
3. Establish the level of significance
4. Select a one-tailed or two-tailed test
5. Compute a test statistic
6. Calculate the degrees of freedom
7. Obtain a tabled value for the statistical
test
8. Compare the test statistic with the
tabled value.
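
The eight steps can be walked through with a one-sample t test. This is only an illustrative sketch: the choice of test, the hypothesized population mean of 25, and the treatment of the eight scores as a random sample are all assumptions, and the tabled value 2.365 comes from a standard t table for 7 degrees of freedom at α = .05, two-tailed.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
hypothesized_mean = 25  # hypothetical value chosen for illustration

# Step 1: alternative hypothesis: the population mean is not 25.
# Step 2: null hypothesis: the population mean is 25.
# Step 3: establish the level of significance.
alpha = 0.05
# Step 4: the alternative is non-directional, so use a two-tailed test.

# Step 5: compute the test statistic.
n = len(scores)
standard_error = statistics.stdev(scores) / math.sqrt(n)
t_statistic = (statistics.mean(scores) - hypothesized_mean) / standard_error

# Step 6: calculate the degrees of freedom.
degrees_of_freedom = n - 1

# Step 7: tabled t value for df = 7, alpha = .05, two-tailed.
tabled_value = 2.365

# Step 8: compare the test statistic with the tabled value.
reject_null = abs(t_statistic) > tabled_value

print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}")
print("reject the null hypothesis" if reject_null
      else "fail to reject the null hypothesis")
```

Here the absolute value of t (about 1.65) is smaller than the tabled value, so the null hypothesis is not rejected at the .05 level.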
The Decision Matrix

Columns (in reality):
• Null true, alternative false: there is no real program effect; there is no difference or gain; our theory is wrong.
• Null false, alternative true: there is a real program effect; there is a difference or gain; our theory is correct.

Rows (what we conclude):

Top row: accept null, reject alternative.
We say: there is no real program effect; there is no difference or gain; our theory is wrong.
• In reality null true: 1-α, THE CONFIDENCE LEVEL. The odds of saying there is no effect or gain when in fact there is none (# of times out of 100 when there is no effect, we’ll say there is none).
• In reality null false: β, TYPE II ERROR. The odds of saying there is no effect or gain when in fact there is one (# of times out of 100 when there is an effect, we’ll say there is none).

Bottom row: reject null, accept alternative.
We say: there is a real program effect; there is a difference or gain; our theory is correct.
• In reality null true: α, TYPE I ERROR. The odds of saying there is an effect or gain when in fact there is none (# of times out of 100 when there is no effect, we’ll say there is one).
• In reality null false: 1-β, POWER. The odds of saying there is an effect or gain when in fact there is one (# of times out of 100 when there is an effect, we’ll say there is one).
Decision Matrix

If you try to increase power, you
increase the chance of winding
up in the bottom row and of
Type I error.

If you try to decrease Type I
errors, you increase the chance
of winding up in the top row and
of Type II error.
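
This trade-off can be made concrete by computing power at two levels of significance while holding everything else fixed. The sketch assumes a two-tailed z test with a known standard deviation. The standardized effect size of 0.5 and the sample size of 30 are hypothetical values chosen for illustration, and the function name is not from the slides.

```python
import math
from statistics import NormalDist

def power_two_tailed_z(alpha, effect_size, n):
    """Probability of rejecting the null in a two-tailed z test when the
    true mean differs from the null value by effect_size standard deviations."""
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)  # how far the true mean moves the z statistic
    upper_tail = 1 - z.cdf(z_critical - shift)
    lower_tail = z.cdf(-z_critical - shift)
    return upper_tail + lower_tail

for alpha in (0.05, 0.01):
    power = power_two_tailed_z(alpha, effect_size=0.5, n=30)
    beta = 1 - power  # probability of a Type II error
    print(f"alpha = {alpha}: power = {power:.3f}, beta = {beta:.3f}")
```

Lowering α from .05 to .01 reduces the risk of a Type I error but also lowers power, which raises β. With the sample size and effect size fixed, one kind of error can only be reduced at the cost of the other.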