
Introduction to Research Methods

In the Internet Era

Introduction to Biostatistics
Data Collection
Descriptive Statistics

Thomas Songer, PhD


with acknowledgment to several slides provided by
M Rahbar and Moataza Mahmoud Abdel Wahab
Key Lecture Concepts

Distinguish between different strategies for obtaining a sample from a population
Distinguish between different forms of data collection
Identify key approaches to organize and portray your data
Understand the measures of central tendency and variability in your data
Descriptive & Inferential Statistics
Descriptive Statistics deal with the enumeration, organization and graphical representation of data from a sample

Inferential Statistics deal with reaching conclusions from incomplete information, that is, generalizing from the specific sample

Inferential statistics use available information in a sample to draw inferences about the population from which the sample was selected (Rahbar)
Epidemiology is the study of disease and its treatment, control, and prevention in a population of individuals.
Whole populations may be examined, but more frequently, samples of the population are examined. Samples that are studied must be representative of the population for the results to be generalized to the total population.
(Torrence 1997)
Hypothetical Population

Sample 1: Representative? Y N

Sample 2: Representative? Y N

Sample 3: Representative? Y N

Sampling Approaches
Convenience Sampling: select the most accessible and available subjects in the target population. Inexpensive and less time-consuming, but the sample is nearly always non-representative of the target population.

Random Sampling (Simple): select subjects at random from the target population. Requires identifying everyone in the target population first. Frequently provides a representative sample.
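Here is a minimal Python sketch of simple random sampling. It assumes the whole target population can be listed first as a sampling frame. The 500 subject IDs are hypothetical. The standard library's `random.sample` draws without replacement, so each subject has the same chance of being selected.

```python
import random

# Hypothetical sampling frame: everyone in the target population must be listed first
population_ids = [f"subject_{i:03d}" for i in range(1, 501)]

rng = random.Random(2024)                   # fixed seed so the draw can be reproduced
sample = rng.sample(population_ids, k=30)   # simple random sample, without replacement

print(len(sample), len(set(sample)))        # 30 30 (thirty distinct subjects)
```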
Sampling Approaches
Systematic Sampling: identify everyone in the target population, and select every xth person as a subject.

Stratified Sampling: identify important sub-groups in your target population. Sample from these groups randomly or by convenience. Ensures that important sub-groups are included in the sample. May not be representative.

More complex sampling designs
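The two approaches can be sketched as follows. This is an illustration under stated assumptions, not a survey-grade tool. The frame of 100 people and their sex labels are hypothetical. `systematic_sample` and `stratified_sample` are helper names invented for this example. The systematic sampler starts at a random position within the first interval of k, which is a common convention. The stratified sampler draws randomly within each sub-group, although the slide also allows convenience sampling within groups.

```python
import random

def systematic_sample(frame, k, rng):
    """Select every k-th member of the frame, starting at a random offset in [0, k)."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, stratum_of, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from each sub-group."""
    strata = {}
    for member in frame:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

rng = random.Random(7)
# Hypothetical frame of 100 people, each tagged with a sex category
frame = [(i, "F" if i % 3 == 0 else "M") for i in range(100)]

every_10th = systematic_sample(frame, k=10, rng=rng)
by_sex = stratified_sample(frame, stratum_of=lambda p: p[1], per_stratum=5, rng=rng)

print(len(every_10th))                          # 10
print({s: len(m) for s, m in by_sex.items()})   # 5 drawn from each stratum
```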


Sampling Error
The discrepancy between the true population parameter and the sample statistic
Sampling error likely exists in most studies, but can be reduced by using larger sample sizes
Sampling error is approximately proportional to $1/\sqrt{n}$, so quadrupling the sample size roughly halves it
Note that larger sample sizes also require time and expense to obtain, and that large sample sizes do not eliminate sampling error
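A small simulation shows how sampling error behaves. It draws many random samples from a known population and measures how widely the sample means scatter. The population is hypothetical: 10,000 values generated with an SD near 10. `empirical_sampling_error` is an illustrative helper. The scatter of the sample means tracks sigma / sqrt(n), so each fourfold increase in n roughly halves it, but it never reaches zero.

```python
import random
import statistics

def empirical_sampling_error(population, n, replicates=2000, seed=0):
    """Draw many simple random samples of size n and return the standard
    deviation of their means (an empirical estimate of sampling error)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(replicates)]
    return statistics.stdev(means)

# Hypothetical population of 10,000 values (illustrative only, not study data)
gen = random.Random(42)
population = [gen.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)

for n in (25, 100, 400):
    observed = empirical_sampling_error(population, n)
    print(f"n={n:4d}  simulated={observed:.2f}  sigma/sqrt(n)={sigma / n ** 0.5:.2f}")
```

The two columns should agree closely. The exact simulated values depend on the random seed.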
Research Process
Research question

Hypothesis

Identify research design

Data collection

Presentation of data

Data analysis

Interpretation of data

Polgar, Thomas
Types of Data Collection

Surveys/Questionnaires
  Self-report
  Interviewer-administered
  Proxy
Direct medical examination
Direct measurement (e.g., blood draws)
Administrative records
Understanding and Presenting Data
Types of Data

1. Categorical (e.g., sex, marital status, income category)
2. Continuous (e.g., age, income, weight, height, time to achieve an outcome)
3. Discrete (e.g., number of children in a family)
4. Binary or Dichotomous (e.g., responses to any Yes/No question)
Brain Size and IQ
What types of data do these variables represent?
Gender  FSIQ  VIQ  PIQ  Weight (lb)  Height (in)  MRI Count
Female  133   132  124  118          64.5         816932
Male    140   150  124  124          72.5         1001121
Male    139   123  150  143          73.3         1038437
Male    133   129  128  172          68.8         965353
Female  137   132  134  147          65           951545
Female  99    90   110  146          69           928799
Female  138   136  131  138          64.5         991305
Female  92    90   98   175          66           854258
Male    89    93   84   134          66.3         904858
Male    133   114  147  172          68.8         955466
Female  132   129  124  118          64.5         833868
FSIQ, VIQ, PIQ = full-scale, verbal, and performance IQ
Scale of Data
1. Nominal: these data do not represent an amount or quantity (e.g., marital status, sex)

2. Ordinal: these data represent ordered categories (e.g., level of education)

3. Interval: these data are measured on a scale with equal units but an arbitrary zero point (e.g., temperature in Fahrenheit)

4. Ratio (interval-ratio): variables such as weight, which have a true zero point, so one value can be meaningfully compared with another as a ratio (e.g., 100 kg is twice 50 kg)
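A quick numeric check shows why ratios are meaningful only on a ratio scale. The two conversion helpers are written for this example, and the pound factor 2.20462 is approximate. On the Fahrenheit scale, 80 degrees looks like twice 40 degrees. The same two temperatures in Celsius give a ratio of about 6. Converting weights from kilograms to pounds, by contrast, leaves the ratio at 2.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale: the ratio depends on where the arbitrary zero sits
print(80 / 40)                   # 2.0 on the Fahrenheit scale
print(f_to_c(80) / f_to_c(40))   # about 6 for the same temperatures in Celsius

# Ratio scale: a true zero makes the ratio independent of the units
print(100 / 50)                      # 2.0 in kilograms
print(kg_to_lb(100) / kg_to_lb(50))  # 2.0 in pounds
```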
Organizing Data and Presentation

Frequency Table
Frequency Histogram
Relative Frequency Histogram
Frequency polygon
Relative Frequency polygon
Bar chart
Pie chart
Box plot
Frequency Table
Generally, the first approach to examining your data
Identifies the overall distribution of each variable
Identifies potential outliers
Investigate outliers as possible data entry errors
Investigate a sample of the other records for data entry errors
Frequency Table

A research study has been conducted examining the number of children in the families living in a community. The following data were collected from a random sample of n = 30 families from the community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
5, 8, 6, 5, 4, 2, 4, 4, 7, 6
Organize these data in a frequency table!
X = No. of Children   Count (Frequency)   Relative Frequency
0                     2                   2/30 = 0.067
1                     3                   3/30 = 0.100
2                     5                   5/30 = 0.167
3                     5                   5/30 = 0.167
4                     6                   6/30 = 0.200
5                     4                   4/30 = 0.133
6                     2                   2/30 = 0.067
7                     2                   2/30 = 0.067
8                     1                   1/30 = 0.033
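This frequency table can be reproduced with Python's `collections.Counter`. The sketch below uses the n = 30 values from the exercise.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (data from the exercise)
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
            5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

counts = Counter(children)
n = len(children)

print("X  Count  Relative freq.")
for x in sorted(counts):
    print(f"{x}  {counts[x]:5d}  {counts[x] / n:.3f}")
```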
Frequency Table
Now, construct a similar frequency table for the age of patients with heart-related problems in a clinic.

The following data were collected from a random sample of n = 30 patients who went to the emergency room of the clinic for heart-related problems.

The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.
Age Group   Frequency   Relative Frequency
32-36 yr    2           2/30 = 0.067
37-41 yr    3           3/30 = 0.100
42-46 yr    3           3/30 = 0.100
47-51 yr    3           3/30 = 0.100
52-56 yr    8           8/30 = 0.267
57-61 yr    3           3/30 = 0.100
62-66 yr    4           4/30 = 0.133
67-72 yr    4           4/30 = 0.133
Total       n = 30
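Tallying values into class intervals by hand is easy to get wrong, so a short check is worthwhile. This sketch uses the patient ages listed in the exercise and the class limits from the age-group table. Both limits of each class are treated as inclusive.

```python
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64,
        61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# Class limits from the age-group table (inclusive at both ends)
groups = [(32, 36), (37, 41), (42, 46), (47, 51), (52, 56), (57, 61), (62, 66), (67, 72)]

n = len(ages)
for low, high in groups:
    freq = sum(low <= age <= high for age in ages)   # True counts as 1
    print(f"{low}-{high} yr  {freq}  {freq}/{n} = {freq / n:.3f}")
```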
Frequency Polygon
Use to identify the distribution of your data
[Figure: frequency polygons of age in years (20- through 60-69) for females and males; y-axis: Frequency]
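A frequency polygon plots each value (or class midpoint) against its frequency and joins the points. The shape is usually closed with a zero-frequency point just beyond each end. The sketch below computes the polygon's vertices for the children-per-family data from the earlier exercise. Drawing the line is left to whatever plotting tool you use. For count data, the left closing point at -1 is only a drawing convention.

```python
from collections import Counter

children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
            5, 8, 6, 5, 4, 2, 4, 4, 7, 6]
counts = Counter(children)   # missing values count as 0

# One vertex per value, plus an empty (zero-frequency) point beyond each end
xs = range(min(children) - 1, max(children) + 2)
vertices = [(x, counts[x]) for x in xs]
print(vertices)
```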
Table 1 in a paper
Describe your study population in a frequency table

Table Title
Name of variable (Units of variable)   Frequency (n)   %   Mean (SD)
  - Categories
  - ...
Total
Measures of Central Tendency

Where is the heart of the distribution?

1. Mean
2. Median
3. Mode
Sample Mean
The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations.

For a sample of five household incomes, 6,000, 10,000, 10,000, 14,000, and 50,000, the sample mean is

$$\bar{X} = \frac{6000 + 10000 + 10000 + 14000 + 50000}{5} = \frac{90000}{5} = 18000$$
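The same calculation in Python is shown below. It writes out the sum-over-count formula and then calls the standard library's `statistics.mean` for comparison.

```python
import statistics

incomes = [6000, 10000, 10000, 14000, 50000]  # the five household incomes

mean_by_hand = sum(incomes) / len(incomes)    # sum of observations / number of observations
print(mean_by_hand)                           # 18000.0
print(statistics.mean(incomes))
```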
Median
In a list ranked from the smallest measurement to the largest, the median is the middle value (with an even number of measurements, it is the average of the two middle values).

In our example of five household incomes, we first rank the measurements:

6,000 10,000 10,000 14,000 50,000

The sample median is 10,000.
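In Python, `statistics.median` ranks the values internally. The median (10,000) is not pulled upward by the single large income of 50,000, whereas the mean (18,000) is. The sixth income of 20,000 in the last call is hypothetical and is added only to show the even-n rule.

```python
import statistics

incomes = [6000, 10000, 10000, 14000, 50000]

print(statistics.median(incomes))            # 10000, the 3rd of 5 ranked values
print(statistics.mean(incomes))              # 18000, pulled up by the 50,000 income

# Hypothetical sixth income, added only to show the even-n rule
print(statistics.median(incomes + [20000]))  # 12000.0, average of 10,000 and 14,000
```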
Mode
In nominal data: the value that occurs with the greatest frequency
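The standard library can find the mode, as sketched below. The marital-status responses are hypothetical nominal data. `statistics.multimode` is useful because a data set can have more than one most-frequent value. For the five household incomes used earlier, the mode is 10,000.

```python
import statistics

# Hypothetical nominal data: marital status of eight respondents
marital_status = ["Married", "Single", "Married", "Divorced",
                  "Married", "Single", "Widowed", "Married"]

print(statistics.mode(marital_status))                        # Married (occurs 4 times)
print(statistics.mode([6000, 10000, 10000, 14000, 50000]))    # 10000
print(statistics.multimode([1, 1, 2, 2, 3]))                  # [1, 2]: two modes
```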
Measures of non-central location

Quartiles
Quintiles
Percentiles
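Quartiles, quintiles, and percentiles are all quantiles: cut points that split the ranked data into groups of equal size. The sketch below uses Python's `statistics.quantiles` to compute them for the patient ages from the earlier exercise. Its default method (`"exclusive"`) interpolates between neighboring ranked values. Other software may use a different convention and return slightly different cut points.

```python
import statistics

ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64,
        61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

quartiles = statistics.quantiles(ages, n=4)      # 3 cut points -> 4 equal-count groups
quintiles = statistics.quantiles(ages, n=5)      # 4 cut points -> 5 groups
percentiles = statistics.quantiles(ages, n=100)  # 99 cut points

print(quartiles)          # [44.5, 53.5, 62.25]; the middle quartile is the median
print(quintiles)
print(percentiles[89])    # 90th percentile
```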
Measures of Dispersion or Variability

Range (present the highest and lowest values in a distribution; the difference between these values is the range)

Variance

Standard deviation (the square root of the variance)
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

s = standard deviation (square root of the variance)
Calculation of Variance and Standard Deviation

$$s^2 = \frac{(6000-18000)^2 + (10000-18000)^2 + (10000-18000)^2 + (14000-18000)^2 + (50000-18000)^2}{5-1} = \frac{1{,}312{,}000{,}000}{4} = 328{,}000{,}000$$

$$s = \sqrt{328{,}000{,}000} \approx 18{,}110.77$$
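The variance formula translates directly into code. This sketch first recomputes the household-income example by hand. It then checks the result with `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The range is included for completeness.

```python
import math
import statistics

incomes = [6000, 10000, 10000, 14000, 50000]
n = len(incomes)
mean = sum(incomes) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in incomes) / (n - 1)
sd = math.sqrt(variance)                  # standard deviation = square root of variance
data_range = max(incomes) - min(incomes)  # highest minus lowest value

print(variance)       # 328000000.0
print(round(sd, 2))   # 18110.77
print(data_range)     # 44000
print(statistics.variance(incomes), round(statistics.stdev(incomes), 2))
```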
Mean and Standard deviation (SD)

[Figure: three small data sets plotted side by side, each with Mean = 7; their standard deviations are SD = 0, SD = 0.63, and SD = 4.04]
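The code below reproduces the idea behind the figure. It uses three illustrative data sets, which are not the slide's actual values. All three have a mean of 7 but increasing spread, and their sample SDs come out at 0, about 0.63, and about 4.05.

```python
import statistics

# Illustrative data sets (not the values plotted on the slide), all with mean 7
datasets = {
    "A": [7, 7, 7, 7, 7, 7],     # no spread at all
    "B": [6, 7, 7, 7, 7, 8],     # small spread
    "C": [2, 3, 7, 8, 9, 13],    # large spread
}

for name, values in datasets.items():
    print(name, statistics.mean(values), round(statistics.stdev(values), 2))
```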
Empirical Rule
For a Normal distribution, approximately:

a) 68% of the measurements fall within one standard deviation of the mean

b) 95% of the measurements fall within two standard deviations of the mean

c) 99.7% of the measurements fall within three standard deviations of the mean
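The 68/95/99.7 figures are rounded areas under the Normal curve. As a check, `statistics.NormalDist` (Python 3.8+) gives the exact proportions for a standard Normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # proportion within k SDs of the mean
    print(f"within {k} SD: {share:.1%}")
```

This prints 68.3%, 95.4%, and 99.7%. The interval that holds exactly 95% of a Normal distribution is about ±1.96 SD.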
Suppose the reaction time to a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes. Approximately:

a) 68% of the subjects taking the drug will have a reaction time between 8 and 12 minutes

b) 95% of the subjects taking the drug will have a reaction time between 6 and 14 minutes

c) 99.7% of the subjects taking the drug will have a reaction time between 4 and 16 minutes
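The sketch below computes the same intervals for the drug example. It assumes, as the example states, that reaction time is exactly Normal with a mean of 10 minutes and an SD of 2 minutes.

```python
from statistics import NormalDist

reaction = NormalDist(mu=10, sigma=2)  # reaction time in minutes, as assumed in the example

for k in (1, 2, 3):
    low = reaction.mean - k * reaction.stdev
    high = reaction.mean + k * reaction.stdev
    share = reaction.cdf(high) - reaction.cdf(low)   # proportion of subjects in [low, high]
    print(f"{low:g} to {high:g} min: {share:.1%}")
```

This prints 8 to 12, 6 to 14, and 4 to 16 minutes, with proportions of 68.3%, 95.4%, and 99.7%.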
