Biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypothesis testing
DEFINITIONS
STATISTICS can mean two things:
- the numbers we get when we measure and count things (data);
- a collection of procedures for describing and analysing data.
BIOSTATISTICS is the application of statistics in the natural sciences, where biomedical problems are analysed.
Statistics has two branches:
- Descriptive
- Inferential
Terminology
Population
Sample
Variables
Variable types
- Categorical (qualitative)
  - Nominal: 2 categories or more than 2 categories
  - Ordinal
- Numerical (quantitative)
  - Continuous
  - Discrete
- Combined
Description of categorical data
- Arranging data: frequencies, tables
- Visualization (graphical presentation)
Frequencies and contingency tables
Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
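The column percentages in the table can be recomputed from the raw counts. The sketch below is only an illustration: the dictionary layout and the helper name `column_percentages` are choices made here, not part of the lecture.

```python
# Column percentages for the satisfaction-by-sex table.
# Counts are copied from the table above; storing them as a dict of
# columns is just one convenient layout for this sketch.
counts = {
    "Males":   {"Satisfied": 14, "Unsatisfied": 4},
    "Females": {"Satisfied": 26, "Unsatisfied": 6},
}

# The "Total" column is the row-wise sum of the two sex columns.
counts["Total"] = {
    row: counts["Males"][row] + counts["Females"][row]
    for row in ("Satisfied", "Unsatisfied")
}


def column_percentages(table):
    """Express every cell as a percentage of its column total."""
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        result[column] = {row: 100 * n / column_total for row, n in cells.items()}
    return result


pct = column_percentages(counts)
for column, cells in pct.items():
    print(column, {row: round(p, 2) for row, p in cells.items()})
```

For females the exact values are 81.25% and 18.75%; the table shows 81.3% and 18.7%, presumably rounded so that the column still adds up to 100%.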
Graphical presentation (figures not reproduced)
Other:
- Maps
- Chernoff faces
- Star plots, etc.
Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variance
- Assessing normality
Grouping
- Sorting the data
- Forming groups (5 to 17 groups) according to the researcher's criteria
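Grouping and the relative and cumulative frequencies can be sketched as follows. This is a minimal example under assumptions not stated in the lecture: the classes are equal-width, the number of groups is passed in by the researcher, and the ages are hypothetical illustration data.

```python
# Group numerical data into equal-width classes and tabulate absolute,
# relative and cumulative frequencies.
# The ages below are hypothetical illustration data, not from the lecture.
ages = [23, 35, 41, 29, 52, 47, 38, 61, 33, 45, 27, 58, 36, 49, 42]


def frequency_table(values, n_groups):
    """Return (lower, upper, count, relative %, cumulative %) per class.

    Classes are half-open [lower, upper), except that the maximum value is
    put into the last class so that no observation is lost.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    counts = [0] * n_groups
    for v in values:
        index = min(int((v - lo) / width), n_groups - 1)
        counts[index] += 1

    rows, cumulative = [], 0
    for i, c in enumerate(counts):
        cumulative += c
        rows.append((lo + i * width, lo + (i + 1) * width,
                     c, 100 * c / len(values), 100 * cumulative / len(values)))
    return rows


for lower, upper, count, rel, cum in frequency_table(ages, n_groups=5):
    print(f"{lower:5.1f}-{upper:5.1f}: n={count:2d}  {rel:5.1f}%  cum {cum:5.1f}%")
```

The cumulative percentage of the last class is always 100%, which is a quick sanity check on the grouping.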
Graphical presentation of frequencies
Normal distributions
Most
Asymmetrical distribution
MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles
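Each measure in this list is available in Python's standard `statistics` module. The data are hypothetical and chosen only to make the results easy to check by hand.

```python
# Measures of central tendency with Python's standard `statistics` module.
# The data are hypothetical.
import statistics as st

data = [2, 3, 3, 4, 5, 5, 5, 6, 8, 9]

arithmetic = st.mean(data)            # sum / n
geometric = st.geometric_mean(data)   # n-th root of the product (positive data only)
harmonic = st.harmonic_mean(data)     # n / sum(1/x)
mode = st.mode(data)                  # most frequent value
median = st.median(data)              # mean of the two middle values, since n is even
q1, q2, q3 = st.quantiles(data, n=4)  # quartiles; Q2 equals the median

print(arithmetic, round(geometric, 3), round(harmonic, 3), mode, median, (q1, q2, q3))
```

`st.quantiles` uses the "exclusive" method by default; other software packages use slightly different quartile definitions, so Q1 and Q3 may differ a little between programs.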
When
Is a measure of central tendency enough to describe the respondents?
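MEASURES OF VARIANCE
- Minimum and maximum
- Range (max - min)
- Standard deviation (SD): the square root of the variance, $SD = \sqrt{V}$
- Variance: $V = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Interquartile range (IQR): $Q_3 - Q_1$, i.e. the 75th minus the 25th percentile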
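The measures of variance above can be computed with the standard `statistics` module; the manual line checks that `st.variance` uses the same $n - 1$ formula. The data are the same hypothetical values as in the central-tendency sketch.

```python
import statistics as st

data = [2, 3, 3, 4, 5, 5, 5, 6, 8, 9]   # hypothetical data

minimum, maximum = min(data), max(data)
data_range = maximum - minimum
variance = st.variance(data)   # sample variance, divides by n - 1
sd = st.stdev(data)            # square root of the sample variance
q1, _, q3 = st.quantiles(data, n=4)
iqr = q3 - q1                  # interquartile range Q3 - Q1

# The same variance computed directly from the formula V = sum((x - mean)^2) / (n - 1).
mean = sum(data) / len(data)
manual_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(minimum, maximum, data_range, round(variance, 3), round(sd, 3), iqr)
```

Here the squared deviations from the mean (5) add up to 44, so $V = 44/9 \approx 4.89$ and $SD \approx 2.21$.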
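- Mean: report with the variance (or standard deviation)
- Median: report with the IQR or min/max
Compare the mean ($\bar{X}$), mode (Mo) and median (Me): if Mean ≈ Median ≈ Mode, use the SD and the empirical rule.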
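EMPIRICAL RULE
For an approximately normal distribution, about 68% of values lie within ±1 SD of the mean, about 95% within ±2 SD and about 99.7% within ±3 SD.
Example: $\bar{X} = 8$, SD = 2.5. Then $\bar{X} - 2SD = 3$ and $\bar{X} + 2SD = 13$, so about 95% of values are expected to lie between 3 and 13.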
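The example can be reproduced in a few lines. The helper names `empirical_interval` and `normal_share` are hypothetical; `statistics.NormalDist` is used to show where the 68/95/99.7 percentages come from.

```python
from statistics import NormalDist


def empirical_interval(mean, sd, k):
    """Interval mean ± k * SD."""
    return mean - k * sd, mean + k * sd


def normal_share(k):
    """Exact share of a normal distribution lying within ±k SD of the mean."""
    return NormalDist().cdf(k) - NormalDist().cdf(-k)


mean, sd = 8, 2.5   # values from the example above
for k in (1, 2, 3):
    lower, upper = empirical_interval(mean, sd, k)
    print(f"±{k} SD: {lower:g} to {upper:g}  (~{normal_share(k):.1%} if normal)")
```

The exact normal shares are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.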
Normality assessment (summary)
- Graphical methods
- Comparison of measures of central tendency; empirical rule (mean and standard deviation)
- Skewness and kurtosis (both equal 0 for a Gaussian distribution, with kurtosis taken as excess kurtosis)
- Kolmogorov-Smirnov test
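A minimal sketch of the numerical checks in this list. Skewness and kurtosis use the plain moment formulas without small-sample corrections (statistical packages often apply corrections, so their values can differ slightly). The KS part only computes the distance D against a normal curve fitted with the sample mean and SD; that choice is an assumption of this sketch, and it does not compute a P value.

```python
from statistics import NormalDist, mean, stdev


def _central_moment(values, k):
    """k-th central moment, dividing by n."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Excess kurtosis g2 = m4 / m2**2 - 3; it is 0 for a normal distribution."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


def ks_statistic(values):
    """Kolmogorov-Smirnov distance D between the empirical CDF and a normal CDF.

    The normal curve uses the sample mean and SD. With estimated parameters the
    classical KS critical values are too conservative (the Lilliefors variant
    corrects for this).
    """
    xs = sorted(values)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = reference.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(skewness([1, 1, 1, 2, 10]), excess_kurtosis([1, 2, 3, 4, 5]), ks_statistic([1, 2, 3, 4, 5]))
```

A positive skewness (as for `[1, 1, 1, 2, 10]`) indicates a long right tail; symmetric data give a skewness of 0.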
Boxplot
(Figure: box from the 25th to the 75th percentile, median, mean marked (*), and outliers.)
Boxplot example (figure not reproduced)
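The numbers behind a boxplot can be computed directly. The slide does not say how outliers are defined; this sketch assumes Tukey's common convention (values beyond 1.5 × IQR from the box), and the data are hypothetical.

```python
import statistics as st


def boxplot_summary(values, k=1.5):
    """Quartiles, median, mean and outliers as usually drawn in a boxplot.

    Outliers are values beyond Tukey's fences Q1 - k*IQR and Q3 + k*IQR;
    k = 1.5 is a common convention assumed here.
    """
    q1, median, q3 = st.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "mean": st.mean(values),
            "fences": (low_fence, high_fence), "outliers": outliers}


# Hypothetical data with one unusually large value
sample_values = [15, 17, 18, 18, 19, 20, 20, 21, 22, 26, 40]
summary = boxplot_summary(sample_values)
print(summary)
```

The single large value pulls the mean (about 21.5) above the median (20), which is why a boxplot marks both.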
Inferential statistics
- Confidence intervals
- Hypothesis testing
Confidence intervals
A confidence interval is a range that is likely to contain the true (population) value.
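(Figure: different samples give different estimates, e.g. $\bar{X}_3$, $SD_3$, $p_3$ and $\bar{X}_4$, $SD_4$, $p_4$, of the same population values $\mu$, $\sigma$, $p_0$.)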
Confidence interval
Statistical definition: if the study were carried out 100 times, giving 100 results and 100 95% confidence intervals, about 95 of those intervals would contain the true value and about 5 would not.
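The "100 studies" definition can be checked by simulation. This sketch assumes a hypothetical normal population with known SD (so the 1.96 multiplier is exact); with an estimated SD and a small sample, a t multiplier would be used instead.

```python
import random
from statistics import NormalDist, mean

random.seed(1)   # fixed seed so the run is reproducible

TRUE_MEAN, SIGMA, N = 50.0, 10.0, 30   # hypothetical population and sample size
Z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
se = SIGMA / N ** 0.5                  # standard error with known sigma


def one_study():
    """Draw one sample and report whether its 95% CI contains the true mean."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = mean(sample)
    return m - Z * se <= TRUE_MEAN <= m + Z * se


def coverage(n_studies):
    """Share of simulated studies whose CI contains the true mean."""
    return sum(one_study() for _ in range(n_studies)) / n_studies


print(coverage(100))
print(coverage(10_000))
```

With 100 studies the share fluctuates noticeably around 0.95; with many more studies it settles close to 0.95.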
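Confidence intervals
Numerical data: 95% CI = $\bar{X} \pm 1.96\,SE$, where $SE = \frac{SD}{\sqrt{n}}$; the interval runs from $\bar{X}_{min}$ to $\bar{X}_{max}$.
Categorical data (p): 95% CI = $p \pm 1.96\,SE$, where $SE = \sqrt{\frac{p(1-p)}{n}}$; the interval runs from $p_{min}$ to $p_{max}$.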
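Both formulas translate directly into code. The helper names are hypothetical, and the sample size n = 25 for the mean example is an assumption (the lecture gives only $\bar{X} = 8$ and SD = 2.5); the proportion uses the 10 unsatisfied respondents out of 50 from the contingency table.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """95% CI for a mean: mean ± z * SD / sqrt(n) (normal approximation)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def proportion_ci(p, n, z=1.96):
    """95% CI for a proportion: p ± z * sqrt(p(1 - p) / n) (Wald interval)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Mean and SD from the empirical-rule example; n = 25 is a hypothetical sample size.
print(mean_ci(8, 2.5, 25))        # SE = 0.5, so roughly 7.02 to 8.98
# Share of unsatisfied respondents from the contingency table: 10 of 50.
print(proportion_ci(10 / 50, 50))
```

The proportion interval comes out at roughly 0.089 to 0.311. For small samples, or proportions close to 0 or 1, these normal-approximation intervals become unreliable, and t-based or Wilson intervals are usually preferred.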
Hypothesis testing
$H_0$: $\mu_1 = \mu_2$; $p_1 = p_2$ (RR = 1, OR = 1, difference = 0)
$H_A$: $\mu_1 \neq \mu_2$; $p_1 \neq p_2$ (two-sided or one-sided)
Significance level $\alpha$ (conventionally 0.05).
A statistical test gives the P value (t test, $\chi^2$ test, etc.).
The P value is the probability of getting a difference (association) at least as large as the one observed if the null hypothesis is true.
In other words, the P value is the probability of getting such a difference (association) by chance alone when the null hypothesis is true.
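As an illustration of a test producing a P value, the sketch below applies Pearson's $\chi^2$ test (without continuity correction) to the satisfaction-by-sex table from earlier. It relies on the fact that with 1 degree of freedom $\chi^2 = z^2$, so the P value can be obtained from `math.erfc` without extra libraries.

```python
import math

# Observed counts from the satisfaction-by-sex table.
observed = [[14, 26],   # satisfied: males, females
            [4, 6]]     # unsatisfied: males, females


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and its P value, df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom chi2 = z**2, so P = P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


chi2, p, min_expected = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, P = {p:.2f}, smallest expected count = {min_expected}")
```

Here $\chi^2 \approx 0.087$ and P ≈ 0.77, far above 0.05, so $H_0$ (no association between sex and satisfaction) is not rejected. The smallest expected count is 3.6; under the common rule of thumb that expected counts should be at least 5, the Fisher exact test would be the safer choice for this table.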
Statistical conventions
Tests
The choice of test depends on:
- study design
- variable type and distribution
- number of groups, etc.
Common tests:
- z test
- t test (one-sample, two independent samples, paired)
- $\chi^2$ test (+ test for trend)
- F test
- Fisher exact test
- Mann-Whitney test
- Wilcoxon test, and others.
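To make one entry of this list concrete, here is a sketch of the t statistic for two independent samples, assuming equal variances (pooled SD). The helper name and the data are hypothetical. The P value would come from the t distribution with df degrees of freedom; SciPy's `scipy.stats.ttest_ind` computes both the statistic and the P value, but it is not used here to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Independent two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Hypothetical measurements in two independent groups
group_1 = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
group_2 = [4.2, 4.9, 4.4, 5.0, 4.6, 4.1]
t, df = pooled_t(group_1, group_2)
print(f"t = {t:.2f}, df = {df}")
```

If the variances clearly differ, Welch's version of the t test (no pooling) is the usual alternative.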
Inferential statistics
Summary