
Class 1: Data as Evidence

Outline:
1. Data as Evidence
2. Schools of Statistical Thinking
3. The Scientific Method
4. Biostatistics
• What is biostatistics?
• Role of biostatistics in public health
• Key ideas in statistical reasoning
• Biostatistics and the scientific method
5. The Biostatistics Paradigm
6. A Statistical Perspective on Cause
7. Example of Scientific Evidence from Studies:
Aceh Vitamin A Trial

1. Data as Evidence
• Associations between air pollution and mortality

• Benefits of clinical and community trials

• Drug approval decisions

• Descriptions of relationships between attitudes and
behaviors

• Which gene pathways regulate amyloid deposition?

2018 JHU Dept of Biostatistics 1. 1


2. Schools of Statistical Thinking
• Frequentist statistics
What is the probability of a wrong decision about the treatment
effect? What should we conclude from the observed data given a
specified null hypothesis?

• Bayesian statistics
What should we believe about the treatment effect
given the data that are observed?

• Likelihood inference
What is the evidence about the treatment effect
given the data that are observed?

3. The Scientific Method


Popper (1934) paradigm:
• Competing hypotheses about nature

• H0, H1, H2, H3, …, Hn
• Design a study and generate data

• Science is a process of eliminating hypotheses whose predictions
are inconsistent with observation.

Other paradigms
• Kuhn (1970): “Normal” science; paradigm shifts
• Big data: Science as description



3. The Scientific Method (cont’d)
• Often guided by “models”
– Simplifications of the world
– “Hypotheses”; Conceptual frameworks
– Example: Geocentric vs heliocentric models of the solar system
– Example: Dynamical systems view of physiology

Varadhan, Seplaki, Bandeen-Roche, Xue, Mech Ageing Dev, 2008

Few models are “true” but some are useful! (Box, 1976)

4. Biostatistics
• Biostatistics --
application of statistical reasoning and methods to the solution of
biological, medical and public health problems

• Biostatistics -- scientific use of quantitative
information to describe or draw inferences about
natural phenomena
– scientific -- accepted theory (ideas) and practice; ethical
standards
– quantitative information -- data reflecting variation in
populations
– inference -- to conclude or surmise from evidence



4.1 Role of Biostatistics in Science
1. Generate hypotheses: ask questions.

2. Design and conduct studies to generate evidence;
collect data.

3. Descriptive statistics: describe the distributions of
observations.

4. Statistical inference: assess strength of evidence in
favor of competing hypotheses; use data to update
beliefs and make decisions.

4.2 How Do Hypotheses Originate?

Act of creativity, informed by:

• Scientific background

• Prior observations

• Results of prior studies

• Scientific intuition



4.3 Design of a Study

• Ask a precise, testable and appropriate question

• Choose a research approach and design

• Define outcome of interest

• Define comparison groups

• Choose a population to study

4.4 Descriptive Statistics

• Also known as Exploratory Data Analysis (EDA)

• Organization and summarization of data

• Graphical display to visualize important patterns and
variation

• Hypothesis generating



4.5 Statistical Inference
• Also known as Confirmatory Data Analysis (CDA)

• Draw conclusions about a population (whole group; true
mechanism) from a sample (representative part of a
group; “trials”)

• Assess strength of evidence in support of competing
hypotheses

• Make comparisons
• Make decisions
• Make predictions

• Statistical inference uses data to surmise what is true or
likely to be true.

4.5 Statistical Inference (cont'd)


[Figure: Probability runs from the truth for a population to the observed value for a representative sample; statistical inference runs in the opposite direction.]



4.6 Key Ideas in Statistical Reasoning

1. Natural laws do not perfectly predict all phenomena --
e.g., coin tossing, disease incidence.

2. Probability models are useful tools for representing
“tendencies” in the presence of variation.

3. Variation across people, over time and space, is itself a
natural phenomenon.

4. Variation leads to uncertainty about a particular event.

5. There are important patterns (tendencies, signals, laws)
to be discovered in the midst of variation.

4.6 Key Ideas in Statistical Reasoning (cont’d)

Combination of data with models is key to human
progress… (R Fisher, 1952 IBS address)

… and even freedom (R Kass, 2017 Fisher Lecture)

[Figure: Real World (Data) and Theoretical World (Scientific models, Statistical models), leading to Conclusions]



4.7 Biostatistics and the Scientific Method

• Competing hypotheses:
H0 - coin has two tails (0 heads)
H1 - coin has one head and one tail
H2 - coin has two heads

• Study and data:
Toss the coin fairly
Data: Heads

• Data as evidence:
– “Heads” is impossible under H0; it is ruled out
– “Heads” is twice as likely under H2 (probability 1) as under H1
(probability 1/2); data favor H2 over H1
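
The comparison of hypotheses above can be sketched in a few lines of Python. This is only an illustration: the dictionary `likelihood` and the helper `likelihood_ratio` are names made up here, and the probabilities are the chance of "Heads" on one fair toss of each hypothesized coin.

```python
from fractions import Fraction

# Probability of observing "Heads" on one fair toss under each hypothesis
# (H0: two tails, H1: one head and one tail, H2: two heads).
likelihood = {
    "H0": Fraction(0, 2),
    "H1": Fraction(1, 2),
    "H2": Fraction(2, 2),
}


def likelihood_ratio(a, b):
    """Likelihood of hypothesis a relative to hypothesis b for the observed data."""
    return likelihood[a] / likelihood[b]


# A hypothesis that gives the observed data probability zero is eliminated.
ruled_out = [h for h, p in likelihood.items() if p == 0]

print("Ruled out:", ruled_out)
print("H2 versus H1:", likelihood_ratio("H2", "H1"))
```

The ratio of 2 is the sense in which a single "Heads" favors H2 over H1 without ruling H1 out.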

4.8 Schools of Statistical Thinking, Revisited


• Frequentist statistics
How frequently would we observe data like ours if the coin were fair
(i.e. a particular hypothesis were true)?

• Bayesian statistics
How do the observed data alter my belief about the probability that the
coin is fair (i.e. a particular hypothesis is true)?

• Likelihood inference
“Evidence” of a fair coin given my toss is measured by the probability
of getting a heads if the coin is fair

Likelihood principle: All evidence regarding a hypothesis is contained
in the “likelihood function”--the probability model, under each possible
hypothesis, for the data actually observed.
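
To make the Bayesian question concrete, here is a minimal sketch that updates beliefs about the three coins from section 4.7 after observing one "Heads". The equal prior (1/3 each) is an assumption chosen for illustration, not something the slides specify, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# ASSUMPTION: equal prior belief in each coin (illustrative only).
prior = {"H0": Fraction(1, 3), "H1": Fraction(1, 3), "H2": Fraction(1, 3)}

# Probability of "Heads" under each hypothesis, as in section 4.7.
likelihood_heads = {"H0": Fraction(0), "H1": Fraction(1, 2), "H2": Fraction(1)}

post = posterior(prior, likelihood_heads)
for hypothesis, belief in post.items():
    print(hypothesis, belief)
```

With this prior, belief in the fair coin (H1) stays at 1/3, belief in the two-headed coin rises to 2/3, and the two-tailed coin drops to 0. A different prior would give different posterior beliefs from the same likelihood.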



5. Biostatistics Paradigm

Public health research attempts to discover simple
explanations for “how the world works”; in particular, the
inter-relationships among variables.

• explanations - hypotheses about mechanisms
• variable - a characteristic taking on different values
• simple - scientists prefer simple, rather than complex
explanations; Occam's razor; principle of parsimony
• inter-relationships - associations; causal connections
• Causal graphs as short-hand notation

5.1 Variables
• Variable - a characteristic taking on different values

• Random variable - a variable for which the values
obtained are usually thought of as arising partly as a
result of chance factors

• Response variable (Y) - the outcome measure; that
which is affected or caused; often a health measure

• Explanatory variables (X) - those which affect or cause
the response:
– Treatment (intervention) – explanatory variable that can be
controlled by the scientist.
– Risk factors - explanatory variables which influence the risk of
the outcome; scientific interest (e.g. smoking; salt intake); usually
cannot be controlled.



5.2 Kinds of Variables and Measurement Scales
• Quantitative: concept of amount; numerical
– Discrete Variables: gaps in values; e.g., number of
births, number of drinks per week

– Continuous Variables: no gaps in values; e.g., blood
pressure, age, height, time to seroconversion

Special case: time to event data in which we need to
deal with “censoring”

5.2 Kinds of Variables and Measurement Scales (cont’d)

• Qualitative: concept of attribute; categorical


– Nominal scale
• Binary or dichotomous: e.g., gender, alive or dead
• Polychotomous or polytomous: e.g., marital status

– Ordinal or ordered scale: e.g., ratings, preferences



5.3 Measurement Properties of Data
• “Variation” refers to the differences among a set of
measurements

• Where do these differences come from; why aren't all the
measurements the same?

• Natural vs. measurement variation
– Natural variation - Differences among persons
(experimental units) in the “true” values of the variable
of interest
– Measurement variation (or error) - Differences
between the measured and true values

5.4 Bias and Variance


• Bias - difference between the average (expected) value
of a measurement (variable) and the true value that it
targets

• Variance - variation among measurements about their
average or mean value, even if that mean differs from
the true targeted value

• In science and statistics, we continually balance bias and
variance.

• Mean squared error (MSE) = Variance + Bias²
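
The identity MSE = Variance + Bias² can be checked numerically. The sketch below uses invented blood-pressure readings and an invented true value of 120, and treats the listed measurements as the full set being averaged over, so the population variance (`statistics.pvariance`) is the right variance here. The function name `mse_decomposition` is hypothetical.

```python
from statistics import fmean, pvariance


def mse_decomposition(measurements, true_value):
    """Return (mse, variance, bias) for measurements aimed at true_value."""
    mean = fmean(measurements)
    bias = mean - true_value                      # average minus target
    variance = pvariance(measurements, mu=mean)   # spread about the average
    mse = fmean([(m - true_value) ** 2 for m in measurements])
    return mse, variance, bias


# Invented readings (mmHg) of a quantity whose true value is taken to be 120.
readings = [118.0, 122.0, 121.0, 123.0]
mse, variance, bias = mse_decomposition(readings, true_value=120.0)

print("bias:", bias)
print("variance:", variance)
print("MSE:", mse)
print("variance + bias^2:", variance + bias ** 2)
```

Here the readings average 121, so the bias is 1, the variance is 3.5, and both sides of the identity equal 4.5.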



6. A Statistical Perspective on Cause
• When we say “Smoking causes lung cancer”, what do
we mean?

• Webster's New Collegiate Dictionary:
– Cause: something that brings about an effect or result

• Sir Bradford Hill's criteria:
– Strength of association
– Consistency
– Specificity
– Temporality
– Dose-response
– Plausibility

6.1 Counterfactual Definition of Cause


• Xi = 1 if person i is a smoker, 0 if not

• Yi(X) = 1 if person i has lung cancer under smoking status X, 0 if not

• Causal effect of smoking on lung cancer for person i:
Yi(Xi=1) - Yi(Xi=0)

• Difference in lung cancer state if person i is a smoker
relative to if he is not

• Cannot observe both Yi(Xi=1) and Yi(Xi=0); the unobserved
one is counterfactual

• Missing data problem (Rubin, 1974)



6.1 Counterfactual Definition of Cause (cont’d)

Person   X   Y(X=0)   Y(X=1)   Y(1)-Y(0)
1        0   0        ?        ?
2        0   0        ?        ?
3        0   1        ?        ?
.        .   .        .        .
50       0   1        ?        ?
51       1   ?        0        ?
52       1   ?        1        ?
.        .   .        .        .
100      1   ?        0        ?
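
The table's question marks can be mimicked in code. In this sketch the four "people" are invented, and both potential outcomes are written down only because the data are made up; in a real study one of the two is always missing. `observed_row` is a hypothetical helper that hides the counterfactual outcome.

```python
# Invented people: (smoker X, Y(X=0), Y(X=1)). Both potential outcomes are
# known here only because the data are fabricated for illustration.
people = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 0, 1),
]


def observed_row(x, y0, y1):
    """Keep the factual outcome; replace the counterfactual one with None."""
    return (x, y0, None) if x == 0 else (x, None, y1)


# Individual causal effects, computable only with full knowledge.
true_effects = [y1 - y0 for _, y0, y1 in people]

# What a study would actually record for each person.
observed = [observed_row(*person) for person in people]

# From recorded data alone, no individual effect can be computed.
observable_effects = [
    None if (y0 is None or y1 is None) else y1 - y0
    for _, y0, y1 in observed
]

print(true_effects)
print(observed)
print(observable_effects)
```

Every entry of `observable_effects` is `None`, which is the "missing data problem" in miniature: individual effects exist but are never directly observed.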

6.1 Counterfactual Definition of Cause (cont’d)


• Use available data to “fill-in” or “impute” missing
information

• Since a person cannot be used as his/her own control,
use another person who is as similar as possible
– Compare like-to-like

• Quality of data can be assessed by how accurately we
can “impute” missing information

• How similar are comparison groups for measured and
unmeasured other factors?



6.2 Approximating Causality
• Taking into account the factors (variables) that may differ
between comparison groups which may influence the
outcome.

• Confounders - other variables which need to be taken
into account when assessing the association of
treatments or risk factors with a response or outcome.

6.3 Effect Modification


• Taking into account that there may be an interaction
between two variables on an outcome (e.g. the effect of
a variable on an outcome is modified by the value of a
second variable).

• Effect modifiers - variables which identify subgroups of
people (units) across which the relationship of a
treatment (risk factor) and outcome will differ.



7. Example: Scientific Evidence in a Study
• Aceh Vitamin A Trial

• 25,939 preschool children in 450 Indonesian villages in
northern Sumatra

• 200,000 IU vitamin A given at 1-3 months after the
baseline census and again 6-8 months later

• Consider 23,682 out of 25,939 who were visited on a
pre-designed schedule

References
1. Sommer A, Djunaedi E, Loeden A et al. Lancet 1986
2. Sommer A, Zeger S: Statistics in Medicine 1991

7.1 Data as Scientific Evidence

• Randomized Community Trial of Children in 450 Villages

            Alive at 12 mos.
Vit A       Yes       No     Total
No       11,514       74    11,588
Yes      12,048       46    12,094
Total    23,562      120    23,682


• Does Vitamin A reduce mortality?


7.2 Data as Scientific Evidence (cont'd)

• Mortality Rates per 1,000 child-years


Vitamin A   Deaths/Child-Years   Mortality Rate
No          74/11,588            6.4
Yes         46/12,094            3.8
Total       120/23,682           5.1
• Does Vitamin A reduce mortality?
• Calculate a risk ratio or “relative risk”: 3.8/6.4 = 0.59
• About a 40 percent reduction in mortality in the vitamin A group!
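
The arithmetic behind the rates and the risk ratio can be reproduced from the counts in the table. This is a small sketch; the function and variable names are invented here. Note that dividing the unrounded rates gives a ratio of about 0.596, while dividing the rounded rates (3.8/6.4) gives the 0.59 shown above; both correspond to roughly a 40 percent reduction.

```python
def rate_per_1000(deaths, child_years):
    """Mortality rate per 1,000 child-years."""
    return 1000 * deaths / child_years


# Counts taken from the trial table above.
rate_control = rate_per_1000(74, 11_588)     # no vitamin A
rate_vitamin_a = rate_per_1000(46, 12_094)   # vitamin A

# Risk ratio ("relative risk"): treated rate divided by control rate.
risk_ratio = rate_vitamin_a / rate_control
reduction_pct = 100 * (1 - risk_ratio)

print(f"Control rate:   {rate_control:.1f} per 1,000")
print(f"Vitamin A rate: {rate_vitamin_a:.1f} per 1,000")
print(f"Risk ratio:     {risk_ratio:.3f}")
print(f"Reduction:      {reduction_pct:.0f}%")
```

This calculation only summarizes the observed data; it says nothing yet about how uncertain the estimate is.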

7.3 Remaining Questions

• There appears to be an association between Vitamin A
and reduced mortality.

• Does Vitamin A cause this reduction?


– The role of randomization is to balance other
observed and unobserved factors.
– The Aceh study is only one small part of the
Indonesian population.
– If the study was performed again, would we get the
same or similar results?
– Do we believe Vit A works? Should we declare this?
– How strong is this evidence that Vitamin A works?



7.4 Inference vs. Prediction

• Inference: Estimate the association between the
outcome of mortality and treatment, and characterize the
estimate’s uncertainty

• Prediction: Best predict the outcome of mortality based
on available data of treatment and other factors, and
characterize the prediction’s accuracy

• More later to distinguish these!

