
Tools of Measurement in Research
Major Categories of Data

2 Major Categories
1. Primary
2. Secondary Data
Major Categories of Data

1. Primary data: data collected by the researcher himself/herself.

• This is data that has never been gathered before, whether in a particular way or at a certain period of time.
2. Secondary data: data that come from other studies done by other researchers, institutions, or organizations.
• Secondary data is no less valid, but you should be well informed about how it was collected.
Sources of Primary data

• Questionnaire survey
It is distributed (or made accessible, if online) to a predetermined selection of individuals.
Individuals complete and return the questionnaire or submit it online.
• Face-to-face interview
The interviewer asks questions, usually following a guide or protocol.
The interviewer records the answers.
Sources of Primary data

• Telephone interview
The interviewer asks questions, usually following a guide or protocol.
The interviewer records the responses.
• Group techniques (interview, facilitated workshop, focus group)
This involves group discussion of a predetermined issue or topic, in person or through teleconferencing.
Group members share certain common characteristics.
A facilitator or moderator leads the group.
An assistant moderator usually records the responses.
Sources of Secondary Data

 Document review
Researchers review documents, and identify relevant information.
They keep track of the information retrieved from documents.

Source: Lusthaus and others 1999


Types of Data

1. Qualitative: descriptive data
2. Quantitative: numerical data
• Can be discrete or continuous
Types of Data

Qualitative Data
Overview:
• Deals with descriptions.
• Data can be observed but not measured.
• Colors, textures, smells, tastes, appearance, beauty, etc.
• Qualitative → Quality

Quantitative Data
Overview:
• Deals with numbers.
• Data which can be measured.
• Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.
• Quantitative → Quantity
Examples of types of Data:
Qualitative vs Quantitative
Types of Quantitative Data:
Continuous vs Discrete

Continuous
Definition: A set of data is said to be continuous if the values belonging to the set can take on ANY value within a finite or infinite interval.

Discrete
Definition: A set of data is said to be discrete if the values belonging to the set are distinct and separate (unconnected values).
Examples of Continuous vs Discrete data

Continuous
Examples:
• The height of a horse (could be any value within the range of horse heights).
• Time to complete a task (which could be measured to fractions of seconds).
• The outdoor temperature at noon (any value within the possible temperature range).
• The speed of a car on Route 3 (assuming legal speed limits).
NOTE: Continuous data usually requires a measuring device (ruler, stopwatch, thermometer, speedometer, etc.).

Discrete
Examples:
• The number of people in your class (no fractional parts of a person).
• The number of TV sets in a home (no fractional parts of a TV set).
• The number of puppies in a litter (no fractional puppies).
• The number of questions on a math test (no incomplete questions).
NOTE: Discrete data is counted in whole numbers. The description of the task is usually preceded by the words "number of".
Types of Measurement Scales

• These are ways to categorize the different types of variables.
Types of Variables

A variable is not only something that we measure, but also something that we can manipulate and something we can control.
1. Independent variable, sometimes called an experimental or predictor variable, is a variable that is manipulated in an experiment in order to observe the effect on a dependent variable.
2. Dependent variable, sometimes called an outcome variable, is the variable that is measured to see how it responds to changes in the independent variable.
Types of scales: Nominal

• Nominal scales are used for labeling variables, without any quantitative value.
• They may simply be called “names” or “labels”.
• The categories are mutually exclusive, meaning there is no overlap.
• None of them has any numerical significance.
Examples of nominal scale

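Sex
Civil status:
• single
• married
• widow/er
• separated
Age (only when recorded as named groups; exact age in years is a ratio variable)
Educational attainment (often treated as ordinal, because the levels can be ranked)
Occupation
Religion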
Examples of nominal scale

• A sub-type of nominal scale with only two categories (e.g. male/female) is called “dichotomous”.
Types of Scales: Ordinal

• Measures non-numeric concepts like satisfaction, happiness, and discomfort.
• The order of the values is what is important and significant,
• but the differences between the values are not really known.
Examples of ordinal scale
Types of Scales: Interval

• Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values.
• “Interval” itself means “space in between”.
• The problem with interval scales: they don’t have a “true zero.”
Examples of interval scale

• The classic example of an interval scale is Celsius temperature, because the difference between each value is the same.
• For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.
• However, 0 degrees Celsius does not mean “no temperature”; the zero point is arbitrary.
• Without a true zero, it is impossible to compute meaningful ratios: 20 degrees Celsius is not “twice as hot” as 10 degrees Celsius.
• Time of day is another good example of an interval scale in which the increments are known, consistent, and measurable.
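To see why ratios fail on an interval scale, here is a minimal sketch. The two temperature readings are hypothetical, and the Kelvin conversion (adding 273.15) is used only because Kelvin has a true zero.

```python
# Interval scale: differences are meaningful, ratios are not.
c_low, c_high = 10.0, 20.0            # hypothetical readings in degrees Celsius

difference = c_high - c_low           # a valid statement: 10 degrees apart
naive_ratio = c_high / c_low          # 2.0, but 20 C is NOT "twice as hot" as 10 C

# Kelvin has a true zero (absolute zero), so it behaves as a ratio scale.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low           # only slightly above 1

print(difference, naive_ratio, round(true_ratio, 3))
```

The difference is the same on both scales, but the ratio changes when the zero point moves, which is why ratios are reserved for ratio scales.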
Types of Scales: Ratio

• Ratio scales are the ultimate nirvana when it comes to measurement scales because:
• they tell us about the order,
• they tell us the exact difference between values,
• they also have an absolute zero, which allows a wide range of both descriptive and inferential statistics to be applied.
• Ratio scales provide a wealth of possibilities when it comes to statistical analysis.
• These variables can be meaningfully added, subtracted, multiplied, and divided (ratios).
• Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated.
Examples of ratio scales

 Good examples of ratio variables include height and weight.


Summary of types of measurements

• Nominal variables are used to “name,” or label, a series of values.
• Ordinal scales provide good information about the order of choices, such as in a customer satisfaction survey.
• Interval scales give us the order of values plus the ability to quantify the difference between each one.
• Ratio scales give us the ultimate: order, interval values, plus the ability to calculate ratios, since a “true zero” can be defined.
Tests of Relationships and Difference
Univariate Descriptive Statistics

A. Measures of Central Tendency


1. Mean
2. Median
3. Mode
B. Measures of Dispersion
1. Range
2. Variance
3. Standard Deviation
Univariate Descriptive Statistics

Let’s first discuss the Measures of Central Tendency:

1. Mean
2. Median
3. Mode
Measures of Central Tendency

• A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
• These are sometimes called measures of central location.
• They are also classed as summary statistics.
Measures of Central Tendency:
the MEAN

1. Mean
• The mathematical average.
• The most popular and well-known measure of central tendency.
• It can be used with both discrete and continuous data, although it is most often used with continuous data.
Measures of Central Tendency:
the MEAN

For a sample of n values x_1, x_2, ..., x_n, the sample mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

For a population of N values, the population mean is

$$\mu = \frac{\sum x}{N}$$
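As a quick illustration, the mean can be computed by hand (sum divided by count) and checked against Python's standard library. This is only a sketch; the scores are the eleven values used in the median example of this deck.

```python
from statistics import mean

# The eleven scores from the median example in this deck.
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

manual_mean = sum(scores) / len(scores)   # sum of the values divided by n
library_mean = mean(scores)               # same calculation from the standard library

print(manual_mean, library_mean)
```

The same arithmetic applies to a sample or a population; only the symbol (x̄ versus μ) and the interpretation change.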


Measures of Central Tendency:
the MEDIAN

2. Median
• The median is the middle score for a set of data that has been arranged in order of magnitude.
Measures of Central Tendency:
the MEDIAN

• In order to calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
• We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
• Our median mark is the middle mark, in this case the 6th score, 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores.
Measures of Central Tendency:
the MEDIAN

• But what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
• We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
• Only now we have to take the 5th and 6th scores in our data set and average them to get a median of 55.5.
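The two cases above (odd and even number of scores) can be sketched in a few lines. The helper name `middle_value` is made up for this illustration; `statistics.median` from the standard library is used only as a cross-check.

```python
from statistics import median

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 scores

def middle_value(values):
    """Return the median: sort, then take the middle score(s)."""
    ordered = sorted(values)              # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd count: one middle score
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value(odd_scores), middle_value(even_scores))
print(median(odd_scores), median(even_scores))
```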
Measures of Central Tendency:
the MODE

3. Mode
• The mode is the most frequent score in our data set.
• We can therefore sometimes think of the mode as the most popular option.
• It is used for categorical data where we wish to know which is the most common category.
Measures of Central Tendency:
the MODE

• On a bar chart or histogram, the mode corresponds to the highest bar.
• In the transport example, we can see that the most common form of transport, in this particular data set, is the bus.
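A minimal sketch of finding the mode follows. The list of transport answers is hypothetical (the original chart data is not available here), chosen so that the bus comes out as the most common category. The second part shows that a data set can have more than one mode, using the eleven scores from the median example.

```python
from collections import Counter
from statistics import multimode

# Hypothetical survey answers; only "bus is the most common" comes from the slide.
transport = ["bus", "car", "bus", "train", "walk", "bus", "car"]

counts = Counter(transport)                        # frequency of each category
most_common_category, frequency = counts.most_common(1)[0]

# Numeric data can have several modes: 55 and 56 both appear twice here.
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
modes = multimode(scores)

print(most_common_category, frequency, modes)
```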
Univariate Descriptive Statistics

We have discussed the Measures of Central Tendency


 Mean Median Mode

Let’s proceed to the Measures of Dispersion


1. Range
2. Variance
3. Standard Deviation
Measures of Dispersion

• If everything were the same, we would have no need of statistics.
• But people's heights, weights, ages, etc., do vary.
• We often need to measure the extent to which scores in a dataset differ from each other.
• Such a measure is called the dispersion of a distribution.
Measures of Dispersion: the RANGE

• The range is the simplest measure of dispersion.
• It can be thought of in two ways.
• As a quantity: the difference between the highest and lowest scores in a distribution.
• Example: the highest score is 100 and the lowest is 68. "The range of scores on the exam was 32."
• As an interval: the lowest and highest scores may be reported as the range.
• Example: "The range was 62 to 94," which would be written (62, 94).
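Measures of Dispersion: the
VARIANCE

The variance is the average squared deviation of the scores from their mean.

Population variance, where μ is the population mean and N is the number of scores:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Sample variance, where M is the sample mean and N is the number of scores in the sample:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

Note that M can be represented by x̄.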


Measures of Dispersion:
the STANDARD DEVIATION

• In many if not most statistical studies, a conclusion is made as to whether a particular set of data is significantly different from a control set of data.
• In order to make that conclusion, one must know the variability of the data.
• The measure of variability used is nearly always the standard deviation.
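Measures of Dispersion:
the STANDARD DEVIATION

The standard deviation is the square root of the variance. For a sample of N scores X with sample mean M:

$$s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$$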
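The three measures of dispersion can be sketched together. This example reuses the eleven scores from the median example in this deck; it assumes the usual definitions (population variance divides by N, sample variance divides by N − 1, and the standard deviation is the square root of the variance) and cross-checks the sample results against Python's `statistics` module.

```python
from math import sqrt
from statistics import stdev, variance

# The eleven scores from the median example in this deck.
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

n = len(scores)
m = sum(scores) / n                          # sample mean M

# Range as a quantity and as an interval.
range_quantity = max(scores) - min(scores)
range_interval = (min(scores), max(scores))

# Sum of squared deviations from the mean.
ss = sum((x - m) ** 2 for x in scores)

population_variance = ss / n                 # divide by N
sample_variance = ss / (n - 1)               # divide by N - 1
sample_sd = sqrt(sample_variance)            # standard deviation

# Cross-check against the standard library (both functions divide by N - 1).
assert abs(sample_variance - variance(scores)) < 1e-9
assert abs(sample_sd - stdev(scores)) < 1e-9

print(range_quantity, range_interval)
print(population_variance, sample_variance, sample_sd)
```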
• We have discussed the Univariate Descriptive Statistics.

• Let’s move on and discuss the Bivariate and Multivariate Inferential Statistical Tests.
Bivariate and Multivariate Inferential
Statistical Tests

• The purpose of these kinds of tests is to determine the empirical relationship between two or more variables.
• They show whether or not those variables are interrelated.
• This kind of data analysis is useful for testing hypotheses of association and causality.
• It helps determine how well the value of a dependent variable can be predicted from a known value of an independent variable.
Bivariate and Multivariate Inferential
Statistical Tests

1. Chi-square test
2. T-test
3. Z-test
4. ANOVA
The Chi-Square Test

• Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis.
• For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected.
• Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors?
The Chi-Square Test

• How much deviation can occur before the investigator must conclude that something other than chance is at work, causing the observed to differ from the expected?
• The chi-square test always tests what scientists call the null hypothesis (H₀), which states that there is no significant difference between the expected and observed results.
The Chi-Square Test formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Here,
O = Observed frequency
E = Expected frequency
∑ = Summation
χ² = Chi-square value
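Here is a minimal sketch of the chi-square calculation for the offspring example in this deck, using the usual form of the statistic (the sum of (O − E)² / E over all categories). The female counts are implied by the slide's numbers (20 offspring in total). The p-value line relies on a closed form that holds only for 1 degree of freedom, which is an assumption specific to this two-category case.

```python
from math import erfc, sqrt

observed = [8, 12]     # 8 males observed, so 12 females out of 20
expected = [10, 10]    # 10 males and 10 females expected

# Chi-square: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two categories give 1 degree of freedom. For exactly 1 degree of freedom,
# the p-value has the closed form erfc(sqrt(chi_square / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(chi_square, round(p_value, 3))
```

At the 0.05 level with 1 degree of freedom the critical value is 3.841, so a chi-square this small gives no reason to reject the null hypothesis: the deviation is consistent with chance.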
The Paired T-test

• A statistical technique that is used to compare two population means in the case of two samples that are correlated.
• The paired sample t-test is used in "before-after" studies, when the samples are matched pairs, or in a case-control study.
The Paired T-test

• For example, if we give training to a group of company employees and we want to know whether or not the training had any impact on their efficiency, we could use the paired sample t-test.
• We collect data from the same employees using a questionnaire, before the training and after the training.
• By using the paired sample t-test, we can test statistically whether or not the training has improved the efficiency of the employees.
• In medicine, by using the paired sample t-test, we can assess whether or not a particular medicine has a significant effect on an illness.
The Paired T-test

$$t = \frac{\bar{d}}{\sqrt{s^2 / n}}$$

Where,
d̄ (d bar) is the mean difference between the two samples,
s² is the sample variance of the differences,
n is the sample size (number of pairs), and
t is the paired sample t statistic with n − 1 degrees of freedom.
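The Paired T-test

An alternate formula for the paired sample t-test, where d is the difference within each pair:

$$t = \frac{\sum d}{\sqrt{\dfrac{n \sum d^2 - \left(\sum d\right)^2}{n - 1}}}$$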
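The sketch below computes the paired t statistic two ways and checks that they agree: from the mean and variance of the differences (d̄ divided by the square root of s²/n), and from the sums Σd and Σd². The before/after scores are hypothetical, made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical efficiency scores for the same five employees.
before = [60, 65, 70, 58, 62]
after = [68, 70, 75, 60, 66]

d = [a - b for a, b in zip(after, before)]   # difference within each pair
n = len(d)

# Form 1: mean difference over its standard error.
d_bar = mean(d)
s2 = variance(d)                             # sample variance of the differences
t_from_mean = d_bar / sqrt(s2 / n)

# Form 2: the same statistic written with sums.
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)
t_from_sums = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

print(t_from_mean, t_from_sums, "df =", n - 1)
```

The resulting t is compared with the critical value for n − 1 degrees of freedom from a t table.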


The Z-test

• A Z-test is a statistical test that compares the means of two populations.
• The Z-test assumes a normal distribution under the null hypothesis.
• The Z-test is performed on a large amount of data or on population data.
• On the other hand, for a small data set or sample data, the t-test is performed.
The Z-test

• The score determined by the Z-test is called the "Z score".

$$z = \frac{x - \bar{x}}{\sigma}$$

Where,
x = Value to be standardized
x̄ = Mean of the data
σ = Population standard deviation
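A minimal sketch of a Z score follows, assuming the usual form z = (x − x̄) / σ. The three input numbers are hypothetical. The two-sided p-value uses the standard normal distribution from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical values for illustration only.
x = 85.0            # value to be standardized
mean_value = 70.0   # mean of the data
sigma = 10.0        # population standard deviation

z = (x - mean_value) / sigma                  # how many SDs x lies from the mean

# Two-sided p-value under the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, round(p_two_sided, 4))
```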
The ANOVA

• The ANalysis Of VAriance (ANOVA) is a statistical technique for determining the existence of differences among several population means.
• The technique requires the analysis of different forms of variances, hence the name.
• But note: ANOVA is not used to show that variances are different; rather, it is used to show that means are different.

$$F = \frac{MST}{MSE}$$

Where,
F = ANOVA coefficient (F statistic)
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error
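The ANOVA

Formula for MST is given below:

$$MST = \frac{SST}{p - 1}, \qquad SST = \sum n (\bar{x}_i - \bar{x})^2$$

Where,
SST = Sum of squares due to treatment
p = Total number of populations
n = Total number of samples in a population
x̄ᵢ = Mean of the sample from population i, and x̄ = overall mean of all observations

Formula for MSE is given below:

$$MSE = \frac{SSE}{N - p}, \qquad SSE = \sum (n - 1) S^2$$

Where,
SSE = Sum of squares due to error
S = Standard deviation of the samples
N = Total number of observations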
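To make the symbols concrete, here is a sketch of a one-way ANOVA by hand. It assumes the usual definitions: SST sums n times the squared distance of each group mean from the overall mean, SSE sums (n − 1) times each group's variance, MST = SST / (p − 1), MSE = SSE / (N − p), and F = MST / MSE. The three groups are hypothetical data.

```python
from statistics import mean, variance

# Hypothetical observations from three populations (treatments).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

p = len(groups)                                  # number of populations
N = sum(len(g) for g in groups)                  # total number of observations
grand_mean = sum(sum(g) for g in groups) / N     # overall mean

# Sum of squares due to treatment: spread of the group means.
sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Sum of squares due to error: spread inside each group, (n - 1) * S^2.
sse = sum((len(g) - 1) * variance(g) for g in groups)

mst = sst / (p - 1)
mse = sse / (N - p)
f_statistic = mst / mse

print(sst, sse, mst, mse, f_statistic)
```

The F value is then compared with the critical value of the F distribution with (p − 1, N − p) degrees of freedom.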