Introduction

Introduction to biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   Categorical data
   Numerical data
4. Inferential statistics
   Confidence intervals
   Hypotheses testing
DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.
Why do we need statistics?
????
Basic parts of statistics:
Descriptive
Inferential
Terminology
Population
Sample
Variables
Variable types
Categorical (qualitative)
Numerical (quantitative)
Combined
Categorical data
Nominal
2 categories
>2 categories
Ordinal
Numerical data
Continuous
Discrete
Description of categorical data
Arranging data
Frequencies, tables
Visualization (graphical presentation)
Frequencies and contingency tables

             Total       Males       Females
Satisfied    40 (80%)    14 (77.8%)  26 (81.3%)
Unsatisfied  10 (20%)     4 (22.2%)   6 (18.7%)
Total        50 (100%)   18 (100%)   32 (100%)

Of those who were unsatisfied, 4 were males and 6 were females.
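The percentages in the satisfaction table are column percentages: each count is divided by the total of its own group. A minimal Python sketch of that calculation, using the counts from the table (the dictionary layout and the function name are illustrative, not part of the lecture):

```python
# Counts taken from the satisfaction table; the layout is illustrative.
counts = {
    "Males": {"Satisfied": 14, "Unsatisfied": 4},
    "Females": {"Satisfied": 26, "Unsatisfied": 6},
}


def column_percentages(table):
    """Return each cell as a percentage of its column (group) total."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {answer: 100 * n / total for answer, n in cells.items()}
    return result


percentages = column_percentages(counts)
for group, cells in percentages.items():
    for answer, pct in cells.items():
        print(f"{group:8s} {answer:12s} {pct:5.1f}%")
```

Dividing by the row totals instead would answer a different question (what share of the unsatisfied respondents were male), so the choice of denominator should match the comparison being made.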
Graphical presentation
Other:
- Maps
- Chernoff faces
- Star plots, etc.
Description of numerical data
Arranging data
Frequencies (relative and cumulative), graphical presentation
Measures of central tendency and variance
Assessing normality
Grouping
Sorting data
Groups (5-17 groups) according to the researcher's criteria.
Used to assess the distribution and for graphical presentation in Excel.
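Grouping and the relative and cumulative frequencies can be sketched in a few lines of Python. The amounts and the class width below are hypothetical values chosen for illustration, not data from the lecture:

```python
from collections import Counter

# Hypothetical cash amounts (litas); not data from the lecture.
amounts = [3, 8, 12, 15, 19, 21, 24, 26, 27, 33,
           35, 38, 41, 46, 52, 55, 14, 29, 31, 44]
width = 10  # hypothetical class width chosen by the researcher


def frequency_table(values, width):
    """Group values into classes [k*width, (k+1)*width) and return rows of
    (lower bound, count, relative %, cumulative %)."""
    counts = Counter((v // width) * width for v in values)
    rows, cumulative = [], 0.0
    for lower in sorted(counts):
        relative = 100 * counts[lower] / len(values)
        cumulative += relative
        rows.append((lower, counts[lower], relative, cumulative))
    return rows


table = frequency_table(amounts, width)
for lower, n, rel, cum in table:
    print(f"{lower:3d}-{lower + width - 1:3d}  n={n:2d}  {rel:5.1f}%  cum {cum:5.1f}%")
```

With this width the 20 values fall into 6 classes, which is inside the 5-17 range mentioned above. Classes that contain no observations are simply not listed in this sketch.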
Frequencies, their comparison and calculation
197 students were asked about the amount of money (litas) they had in cash at the
Graphical presentation of frequencies
Normal distributions
Most observations lie around the center.
Fewer observations lie above and below the central values, in approximately equal proportions.
Most often a Gaussian distribution.
Non-normal distributions
More observations fall in one part of the range.
Asymmetrical distribution
How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Variance (dispersion)
MEASURES OF CENTRAL
TENDENCY
Means/averages (arithmetic,
geometric, harmonic, etc.)
Mode
Median
Quartiles
MEASURES OF CENTRAL
TENDENCY
Arithmetic mean (X̄, μ)
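The arithmetic mean is the sum of the observations divided by their number. A minimal Python sketch with hypothetical values (the function name is illustrative):

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


values = [4, 8, 6, 10, 12]  # hypothetical observations
mean = arithmetic_mean(values)  # 40 / 5
print(mean)
```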
MEASURES OF CENTRAL
TENDENCY
Median (Me): the middle value, or 50th percentile (the value of the observation that divides the sorted data into two almost equal parts).
It is found as follows:
When n is odd: the median is the middle observation.
When n is even: the median is the average of the values of the two middle observations.
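The odd/even rule for the median can be sketched as follows. The input values are hypothetical, and the function is a plain illustration of the rule rather than a replacement for a statistics package:

```python
def median(values):
    """Middle observation of the sorted data (n odd), or the average of
    the two middle observations (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])       # sorted: 1, 5, 7
even_case = median([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_case, even_case)
```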
MEASURES OF CENTRAL
TENDENCY
Mode (Mo): the most common value.
There can be more than one mode.
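A short Python sketch showing that a data set can have more than one mode. It uses `statistics.multimode` from the standard library (Python 3.8 or later); the data are hypothetical:

```python
from statistics import multimode

# Hypothetical observations in which two values are equally common.
values = [1, 2, 2, 3, 3, 4]
modes = multimode(values)  # all values that share the highest frequency
print(modes)
```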
MEASURES OF CENTRAL
TENDENCY
Quartiles (Q1, Q2, Q3, Q4): the sorted sample is divided into 4 equal parts, with 25% of the observations in each of them.
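The three cut points Q1, Q2 and Q3 can be obtained with `statistics.quantiles` from the Python standard library (Python 3.8 or later). This is only a sketch with hypothetical data: several conventions for computing quartiles exist, and the library's default ("exclusive") method is an assumption here, so other software may give slightly different values.

```python
from statistics import quantiles

# Hypothetical sorted sample of 11 observations.
values = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1  # interquartile range
print(q1, q2, q3, iqr)
```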
Is a measure of central tendency enough to describe the respondents?
MEASURES OF VARIANCE
Min and max
Range
Standard deviation (SD): the square root of the variance
Variance: $V = \frac{\sum_i (x_i - \bar{X})^2}{n - 1}$
Interquartile range, IQRT (Q3 - Q1, or 75% - 25%)
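The range, the sample variance (with n - 1 in the denominator) and the standard deviation can be sketched in Python as follows. The observations are hypothetical and the function names are illustrative:

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
value_range = max(values) - min(values)
variance = sample_variance(values)  # 32 / 7
sd = math.sqrt(variance)            # SD is the square root of the variance
print(value_range, round(variance, 3), round(sd, 3))
```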
What measures are to be used for sample description?
If distribution is NORMAL
Mean
Variance (or standard deviation)
If distribution is NOT NORMAL
Median
IQRT or min/max
These measures are also used with numeric ordinal data.
X̄, Mo, Me
Mean ≈ Median ≈ Mode
SD and the empirical rule
EMPIRICAL RULE
Percentage of observations within 1, 2 and 2.5 SD of the mean, if the distribution is normal
Example
X̄ = 8
SD = 2.5
-2SD
+2SD
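For the example values (mean 8, SD 2.5) the interval from -2SD to +2SD can be computed directly, and the share of a normal distribution lying within k SD of the mean follows from the error function. A minimal sketch using only the Python standard library (the function name is illustrative):

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


mean, sd = 8, 2.5  # values from the example
low, high = mean - 2 * sd, mean + 2 * sd

print(f"mean +/- 2SD: {low} to {high}")
for k in (1, 2, 2.5):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

Under normality roughly 68% of observations fall within 1 SD, about 95% within 2 SD and about 99% within 2.5 SD, so for this example about 95% of values would be expected between 3 and 13.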
Normality assessment
Summary
Graphical
Comparison of measures of central tendency; empirical rule (mean and standard deviation)
Skewness and kurtosis (both = 0 if Gaussian, with kurtosis reported as excess kurtosis)
Kolmogorov-Smirnov test
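Skewness and excess kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based (population) formulas, which is an assumption: statistical packages often apply small-sample corrections and may report slightly different values. The data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a Gaussian gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

A positive skewness indicates a longer right tail, a negative one a longer left tail. Values far from 0 suggest that the median and interquartile range describe the sample better than the mean and SD.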
Boxplot
[Figure: boxplot with the 75th percentile, mean (*), median, 25th percentile and outliers labelled]
Boxplot example
[Figure: boxplot example; axis from 14.00 to 26.00]
Central limit theorem
Hypotheses testing
Interval in which the true value most likely lies.
The variability of samples and their measures
Samples:
X̄1, SD1; p1
X̄2, SD2; p2
X̄3, SD3; p3
X̄4, SD4; p4
Population:
μ, σ; p0
The variability of samples and confidence intervals
μ, p0
Confidence interval
Statistical definition:
If the study were carried out 100 times, giving 100 results and 100 CIs, about 95 of the 100 intervals would contain the true value and about 5 would not.
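The repeated-study reading of a confidence interval can be illustrated with a small simulation. The population mean, SD, sample size and number of repetitions below are hypothetical choices, and the seed is fixed only to make the sketch reproducible:

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that repeated runs give the same result

# Hypothetical population and study design.
TRUE_MEAN, TRUE_SD, N, STUDIES = 8.0, 2.5, 50, 1000

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this study's 95% CI contain the true mean?
    if mean - 1.96 * se <= TRUE_MEAN <= mean + 1.96 * se:
        covered += 1

coverage = covered / STUDIES
print(f"{100 * coverage:.1f}% of the intervals contain the true mean")
```

The printed share should be close to 95%. Any single interval either contains the true value or does not; the 95% describes the procedure over many repetitions.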
(general, most common calculation)
95% CI: $\bar{X} \pm 1.96 \cdot SE$, giving the interval (Xmin; Xmax)
Note: for a normal distribution, when n is large
95% CI: $p \pm 1.96 \cdot SE$, giving the interval (pmin; pmax)
Note: when p and 1 - p > 5/n
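Both intervals can be sketched in Python, taking the standard error as SD divided by the square root of n for a mean and as the square root of p(1 - p)/n for a proportion. The mean and SD come from the earlier example and the proportion from the satisfaction table (10 unsatisfied out of 50); the sample size of 100 used for the mean is a hypothetical value, and the function names are illustrative:

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """Approximate 95% CI for a mean (normal distribution, large n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_proportion(p, n, z=1.96):
    """Approximate 95% CI for a proportion (n*p and n*(1-p) above 5)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Mean 8 and SD 2.5 from the example; n = 100 is hypothetical.
x_min, x_max = ci_mean(8, 2.5, 100)
# Proportion unsatisfied: 10 of 50 respondents.
p_min, p_max = ci_proportion(10 / 50, 50)

print(f"mean: {x_min:.2f} to {x_max:.2f}")
print(f"proportion: {p_min:.3f} to {p_max:.3f}")
```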
Standard error (SE)
Numeric data ($\bar{X}$): $SE = \frac{SD}{\sqrt{N}}$
Categorical data (p): $SE = \sqrt{\frac{p(1 - p)}{N}}$
Width of confidence interval depends on:
a) Sample size;
b) Confidence level (guarantee; usually 95%, but any % can be chosen);
c) Dispersion.
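The three dependencies can be seen in the width of the interval for a mean, 2 · z · SD / √n. A minimal sketch with hypothetical sample sizes; the multiplier 2.576 for a 99% interval is the usual normal quantile:

```python
import math


def ci_width(sd, n, z=1.96):
    """Full width of the CI for a mean: 2 * z * SD / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)


sd = 2.5  # SD from the earlier example
widths = {n: ci_width(sd, n) for n in (25, 100, 400)}  # hypothetical sizes
width_99 = ci_width(sd, 100, z=2.576)  # higher confidence level
width_wide_sd = ci_width(2 * sd, 100)  # larger dispersion

for n, w in widths.items():
    print(f"n = {n:3d}: width {w:.2f}")
print(f"99% level, n = 100: width {width_99:.3f}")
print(f"SD doubled, n = 100: width {width_wide_sd:.2f}")
```

Quadrupling the sample size halves the width, while a higher confidence level or a larger SD makes the interval wider.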
Hypotheses testing
H0: $\mu_1 = \mu_2$; $p_1 = p_2$ (RR = 1, OR = 1, difference = 0)
HA: $\mu_1 \neq \mu_2$; $p_1 \neq p_2$ (two-sided or one-sided)
Hypotheses testing
Significance level (agreed: 0.05).
Test for the P value (t-test, χ², etc.).
The P value is the probability of getting the observed difference (association), or a more extreme one, if the null hypothesis is true.
OR: the P value is the probability of getting such a difference (association) due to chance alone, when the null hypothesis is true.
Statistical conventions
If P < 0.05, we say that the results can't be explained by chance alone; therefore we reject H0 and accept HA.
If P ≥ 0.05, we say that the difference found can be due to chance alone; therefore we don't reject H0.
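The decision rule can be sketched with a two-sided z test for two proportions, one of several tests that could be used for such a comparison. The counts are hypothetical, the function name is illustrative, and the normal approximation assumes reasonably large groups:

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


ALPHA = 0.05  # agreed significance level

# Hypothetical counts: 60 of 200 in group 1 versus 40 of 200 in group 2.
z, p_value = two_proportion_z_test(60, 200, 40, 200)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"z = {z:.2f}, P = {p_value:.3f}: {decision}")
```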
Tests
The choice of test depends on:
Study design,
Variable type,
Distribution,
Number of groups, etc.
Tests (probability distributions):
z test
t test (one sample, two independent samples, paired)
χ² (+ trend)
F test
Fisher exact test
Mann-Whitney
Wilcoxon and others.
Summary
The P value tells whether there is a statistically significant difference (association).
The CI gives the interval where the true value can be.
Summary
Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).
Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.
