
Epidemiology

Course notes for 2012


Schulich School of Medicine and Dentistry
University of Western Ontario
Introduction to the Course
My goals are: (1) that you all get 90s, and (2) that you learn to love epidemiology as much as I do.
Unrealistic goals, perhaps, and to get there I'll need your help: if ever there is something that you don't
understand, ask; if ever there is something you're curious to learn more about, ask.
From what I understand, the notes provided on WebCT are good enough for you to pass the course. I'm
hoping that my notes will be good enough for you to ace the course. These are based on both the
objectives and the content of Dr. El-Masri's lectures. They are not based on Dr. Champion's lectures,
since I'm not in London. I've done my best to ensure that they are complete and cover the required
material.
Good luck!
Aidan Findlater
aidan@aidanfindlater.com
SSMD Medicine, Class of 2015
Notes on the Notes
When a lecturer emphasized something that wasn't in the objectives, I've marked it as NiO (Not in
Objectives) and will let you decide whether it's worth studying.
Text that looks like this paragraph summarizes the main points relating to the objective. It's the main stuff
you need to know, based on the lecture, course notes, and my own knowledge.
Smaller, indented text that looks like this is extra information that might provide a better or more complete
understanding of the objective. It's similarly based on the lecture, course notes, and my own knowledge.
[A: Italicized text in brackets is my own editorializing. This is where I'll put stuff when maybe I
disagree with the lecturer or if I'm not sure that they'd agree with me. You should be able to safely ignore
it when studying for the exam.]
Table of Contents
Intro to EBM (9 Jan 2012) 1
Define evidence-based medicine (EBM) 1
Describe the components of EBM 1
Describe the rationale for the use of EBM 2
Describe the evidence pyramid with respect to ranking of evidence sources 2
Understand basic study design principles [NiO] 2
Understand how background and foreground knowledge are used [NiO] 3
Provide an example of a proven therapy that is not used optimally 4
Provide an example of a harmful therapy that has been used in the past 4
Describe barriers to the use of EBM 4
Fundamentals of Epidemiology I (16 Jan 2012) 5
Describe the relevance of epidemiology (methods and results) to clinical methods 5
Define epidemiology (Last, 2001) and understand the components of the definition 5
Define, and apply to clinical information, the concepts of prevalence and incidence 5
Define, and apply to clinical information, the concept of risk (probability) 7
Define, and apply to clinical information, the concept of outcomes: case definition, case series, validity, and reliability 7
Define, and apply to clinical information, the concept of exposures 8
Define, and apply to clinical information, the contingency table 8
Define, and apply to clinical information, measures of association 9
Define and apply the concepts of epidemic and outbreak 11
Define and apply the concept of epidemic curves with different shapes 11
Define and apply the concept of surveillance 12
Define and apply the concept of alternative explanations for a finding 12
Fundamentals of Epidemiology II (23 Jan 2012) 15
Summary of study designs 15
Describe the case-control study design, including advantages and disadvantages 17
Describe the cohort study design, including advantages and disadvantages 17
Describe the RCT study design, including advantages and disadvantages 18
Describe the ecologic study design, including advantages and disadvantages 18
Describe the time-series study design, including advantages and disadvantages 18
Describe the cross-sectional study design, including advantages and disadvantages 18
Describe the natural experiment study design, including advantages and disadvantages 19
Describe the major sources of bias that may occur in etiologic studies in humans, and apply this information to interpretation of observational studies 19
Define causation 19
Describe approaches to determining causation 19
Describe measures of mortality [NiO] 21
Fundamentals of Biostatistics I (30 Jan 2012) 22
Define probability and odds 22
Classify different sampling approaches 22
Simple random sample (SRS) 23
Systematic sample with random start 23
Cluster sampling 24
Multistage sampling 24
Stratified sampling 25
Convenience sampling 25
Quota sampling 25
Differentiate between levels and types of measurement 25
Describe a normal distribution and compare to a skewed distribution 26
Understand the mean, median, mode, variance, standard deviation, and range, and be able to calculate the mean, median, mode, and range 27
Identify the most appropriate measurement for central tendency and dispersion for different levels of measurement
including interquartile range and standard deviation 28
Fundamentals of Biostatistics II (6 Feb 2012) 29
Distinguish between estimation and hypothesis testing 29
Interpret p-value and confidence interval 29
Define and interpret Type I error (alpha) 31
Define and interpret Type II error (beta) 31
Define and interpret power 31
Identify factors required for sample size and power calculations 31
Distinguish between a negative and an underpowered trial 31
Define and interpret statistical interaction [NiO] 32
Interpreting multiple/multivariate regression [NiO] 32
Are the Results Valid? I (13 Feb 2012) 33
Compare and contrast observational (cohort, case-control and case series) and experimental studies 33
Define study population and inclusion criteria 33
Define randomization and allocation concealment and differentiate between the two 33
Define block and stratified randomization 34
Define blinding, and recognize studies where blinding may not be possible 35
Define intention-to-treat analysis, and describe advantages 36
Understand what the CONSORT RCT reporting guidelines are 37
Are the Results Valid? II (27 Feb 2012) 39
Identify and interpret baseline data in a clinical trial 39
Identify attrition, and discuss possible effects on results of clinical trial 39
Compare and contrast efficacy and effectiveness, and internal and external validity 39
Define bias, and recognize different sources of bias in studies, including publication bias 40
RCTs: What are the Results? I (5 Mar 2012) 41
Describe the problem of multiplicity in analysis, and apply this information with respect to interpretation of subgroup
analysis, multiple and secondary outcomes 41
Differentiate between primary and secondary outcomes, and apply this information to clinical trials 41
Define composite outcome 42
Describe the valid use of composite outcomes 42
Describe rationale for using composite outcomes 42
Describe potential problems with subgroup analyses 42
Describe criteria for valid subgroup analyses 42
Interpret subgroup analyses in a clinical trial 42
Define interim analysis 43
Describe reasons for early termination of clinical trials 43
Define surrogate outcome, and recognize use of surrogate outcomes in a clinical trial as well as potential drawbacks of the use of surrogates 43
Describe study phases in clinical trials 43
Describe problems of adverse event recognition including the use of the rule of three 43
RCTs: What are the Results? II (19 Mar 2012) 45
Differentiate between dichotomous and continuous outcomes 45
When provided with information from a clinical trial, develop a 2x2 table 45
When provided with information from a clinical trial, calculate and interpret the control event rate (CER) 47
When provided with information from a clinical trial, calculate and interpret the experimental event rate (EER) 47
When provided with information from a clinical trial, calculate and interpret the relative risk (RR) 48
When provided with information from a clinical trial, calculate and interpret the absolute risk reduction (ARR) 48
When provided with information from a clinical trial, calculate and interpret the relative risk reduction (RRR) 48
When provided with information from a clinical trial, calculate and interpret the number needed to treat (NNT) 48
When provided with information from a clinical trial, calculate and interpret odds ratio (OR) 49
Provide information regarding strengths and weaknesses of NNT 49
Critically appraise an article on therapy 50
Case-Control and Cohort Studies (26 Mar 2012) 51
Describe the purpose and structure of case-control and cohort study design 51
Describe the strengths and weaknesses of cohort and case-control studies 52
Recognize and describe types of bias that may occur 54
Recognize and describe confounding 55
Define and calculate relative risk and odds ratio 56
Critically appraise a case-control study 56
Critically appraise a cohort study 56
Prognosis (2 Apr 2012) 57
Differentiate between risk and prognostic factors 57
Describe the elements of prognostic studies 57
Interpret a survival curve 58
Recognize potential sources of bias in cohort studies of prognosis 60
Diagnosis (9 Apr 2012) 61
Discuss the use of diagnostic tests clinically 61
Describe the characteristics and definitions of normal and abnormal test results 61
Develop a 2x2 diagnostic test result table when provided with data from a study of a diagnostic test 62
Define and calculate sensitivity and specificity 62
Define and calculate positive and negative predictive value 62
Define and calculate prevalence 63
Apply the role of pretest probability or prevalence in interpretation of diagnostic test results 63
Interpret likelihood ratios 64
Interpret kappa 64
Interpret a receiver operating characteristic (ROC) curve 65
Critically appraise a study on a diagnostic test 65
Screening (16 Apr 2012) 66
Define and differentiate between the three levels of prevention (primary, secondary, and tertiary) 66
Differentiate between screening and case-finding 66
Differentiate between diagnostic and screening tests 66
Describe criteria for a screening program 66
When provided with information about a screening test, calculate sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and prevalence 67
Describe the impact of prevalence of disease on the results of diagnostic or screening tests 67
Apply the impact of prevalence of disease to clinical situations 67
Define and recognize lead-time, length-time and compliance bias 67
Discuss possible adverse effects of screening programs 68
The Interpretation of Statistical Results (23 Apr 2012) 69
Describe the difference between unadjusted and adjusted results 69
Interpret statistical findings and the level of measurement of the outcome variable in linear, logistic, and survival analyses 69
Describe the importance of describing sample characteristics in epidemiologic research 71
Describe various ways of selecting which variables should be included in a multivariate analysis 72
Non-regression statistical tests 72
Meta-Analysis (30 Apr 2012) 73
Define and compare/contrast review, systematic review, and meta-analysis 73
Summarize steps required for a systematic review, including framing a specific question for review 74
Summarize steps required for a systematic review, including identifying relevant literature 74
Summarize steps required for a systematic review, including assessing the quality of the literature 74
Summarize steps required for a systematic review, including summarizing the evidence 74
Recognize the possible bias due to publication bias and describe approach to identifying publication bias using a funnel plot 74
Interpret a forest plot 76
Describe benefits and limitations of a meta-analysis 77
Define heterogeneity 77
Recognize that heterogeneity may mean a meta-analysis is not feasible/valid 78
Interpret data from a cumulative meta-analysis 78
Describe the role of a sensitivity analysis 79
Communicating Risk (7 May 2012) 80
Describe effective risk communication as the basis for informed consent 80
Define health literacy 80
Define health numeracy 80
Describe patient perception of risk and the impact of health literacy and numeracy on patient risk perception and understanding 80
Describe cognitive biases that affect risk assessment and decision-making [NiO] 81
Outline the basic dimensions of risk 81
Identify techniques that have been shown to improve patient understanding of risk, such as verification techniques and
the roles of qualitative and quantitative and graphic presentations of risk, and decision aids 82
References 83
Appendix I: The Student Guide to Research 85
Starting your project 85
Observational studies 86
Collecting the data 86
Analyzing the data 86
Writing it up 86
Intro to EBM
(9 Jan 2012)
Define evidence-based medicine (EBM)
Evidence-based medicine is "the conscientious, explicit, and judicious use of current best evidence
in making decisions about the care of individual patients" (1). EBM is not cookbook medicine.
Put another way: "the integration of best research evidence with clinical experience and patient values to
facilitate clinical decision-making" (2).
The "current best evidence" mostly comes from medical research, although there are other sources.
But remember, it's extremely important to consider patient values when making decisions. EBM uses clinical
expertise to integrate the best research evidence, patient values, the health care system and available
resources, and the clinical setting.
EBM requires self-directed life-long learning. Reading journals and attending medical conferences are
important for keeping up to date with the current research, but the research must be evaluated critically. Just
because it was peer-reviewed doesn't mean it was done well.
Describe the components of EBM
EBM is a process with four steps:
1. Asking an answerable question. Answerable questions generally follow the PICO format. Specify
the Population in which the study will be done, the Intervention or exposure that will be applied, the
Control or comparison group (if applicable), and the Outcomes that are being investigated.
2. Tracking down the best available evidence to answer the question.
3. Critically appraising the evidence for validity and interpreting the results.
4. Integrating information with clinical expertise and the individual patient.
When asking an answerable question, remember the PICO format. Compare "Is the nicotine patch effective?" to
"Is the nicotine patch better for smoking cessation than counselling alone in heavy-smoking adult Canadian
men?" The PICO wording of the research question often makes the best title for a research paper.
When tracking down the best available evidence, use guidelines published by national organizations (e.g.
the Agency for Healthcare Research and Quality), EBM-focused journals (e.g. EBM, EB Cardiovascular Medicine, etc.),
systematic reviewers (e.g. Cochrane Library), and finally the primary literature (e.g. PubMed, ProQuest, etc.).
The systematic reviews and meta-analyses from the Cochrane Library probably provide the strongest form of
evidence. [A: I agree. Cochrane is great.]
Describe the rationale for the use of EBM
"Between the health care we have and the care we could have lies not just a gap, but a chasm" (3). [A:
EBM is presumably an attempt to bridge this chasm.]
Describe the evidence pyramid with respect to ranking of evidence sources
It can be useful to classify the reliability of studies based on their study designs. There are a variety of
ways to do this, but the one discussed in class is:
[A: 0. Systematic reviews and meta-analyses]
1. Experimental studies
    1. Randomized controlled double-blinded studies
    2. Randomized controlled studies
2. Observational studies
    1. Cohort studies
    2. Case-control studies
    3. Case series
    4. Case reports
3. Basic research and expert opinion
    1. Animal research
    2. Ideas, editorials, opinions
    3. In vitro (test tube) research
[A: "Double-blind" is a loose term because it doesn't specify who was blinded. Depending on the study,
you can have triple- or quadruple-blind studies, and double-blind may not be enough.]
[A: Also, remember that the hierarchy is just a guideline! A well-designed cohort study is probably more
reliable than a poorly-designed RCT, and the same goes for cohort versus case-control.]
Understand basic study design principles [NiO]
[A: He provided an overview of the basic classification of study designs. Each is discussed at greater
length in later lectures, so I've formatted the following as optional.]
There are two types of research: experimental and observational.
If the researcher assigns the exposure (e.g. you choose who to give the medication to and who gets placebo),
then it is an experimental design. If the exposure was not determined by the researcher (e.g. smokers
choosing to smoke, you didn't assign them to it), then the research is observational.
For observational studies, was there a comparison group? If yes, then you have a cohort study or case-control
study; if no, then it's a descriptive study (generally a case series). Cohort studies compare outcomes between
exposed and unexposed groups (e.g. taking smokers and non-smokers and seeing who gets cancer). Case-control
studies compare exposures between outcome groups (e.g. taking cancer and non-cancer patients
and seeing who smoked). [A: For very good reasons that we will see later, case-control studies are considered
to be a weaker form of evidence because they are more easily biased.]
Experimental studies can be randomized (which is good) and blinded (which is good). They can also
be neither. Randomized clinical trials (RCTs) are not necessarily blinded. [A: You could also have a non-randomized
trial, but that would be very strange and suggests that you actually have an observational, not
experimental, study. Double-check who was determining the participants' exposure status.]
Understand how background and foreground knowledge are used [NiO]
[A: I didn't understand what he was trying to say in lecture. The following mixes a bunch of sources,
including the lecture. I'm not clear on whether this information is actually examinable.]
"Background" questions ask for general knowledge about a condition or thing. What causes
migraines? Theyre about getting basic information, not about decision-making.
"Foreground" questions ask for specic knowledge to inform clinical decisions or actions. In young
children with acute otitis media, is short-term antibiotic therapy as effective as long term antibiotic
therapy? They generally follow a PICO pattern.
Background questions are answered using your background knowledge: book learnin' and basic stuff
you learn in textbooks and lectures. Foreground questions are answered using your background
knowledge plus extra research and critical thinking. Compare "What is SARS?" (background) to "How
can we diagnose and treat SARS?" (foreground).
When diagnosing, foreground thinking is evidenced by starting broad and then narrowing it down with
your investigations. Background thinking is more about starting with a specific diagnosis and then trying
to prove that it's right (hypothesis-testing). Foreground thinking is much more open-minded and is
favoured, but you will always mix both.
Provide an example of a proven therapy that is not used optimally
It is estimated that 30-40% of patients (or more) do not receive care according to present evidence.
[A: HAND-WASHING. What would Semmelweis do?]
Provide an example of a harmful therapy that has been used in the past
It is estimated that 20-25% of care that is provided is not needed or is harmful.
[A: Hormone-replacement therapy could be an example. Early studies suggested it was good for you,
and people were prescribing it like Tic-Tacs. Later, better studies showed that it
was actually harmful. Now it's used more carefully.]
Describe barriers to the use of EBM
• There's a time lag between research being done and physicians becoming aware of it.
• It can be used in the wrong patients, in whom there is minimum benefit or even harm, especially when
a practice is adopted quickly.
• New information is not always accepted, acted on, or adhered to.
• It requires physicians to prescribe the treatment and patients to adhere to it.
• Changing behaviour is not easy.
• The system can make it difficult to provide the best care, like insurance not covering the most
effective treatment.
These problems may be at the individual, team, or system level.
Fundamentals of Epidemiology I
(16 Jan 2012)
Describe the relevance of epidemiology (methods and results) to clinical methods
It allows us to investigate the etiology, diagnosis, prognosis, and treatment of diseases.
E.g. Outbreak investigation, which is effectively research with an investigative approach.
Epidemiology can be descriptive or analytic. Descriptive epidemiology is about person, place, and time.
Analytic epidemiology is about studying causation.
Define epidemiology (Last, 2001) and understand the components of the definition
"The study of the distribution and determinants of health-related states or events in specified
populations, and the application of this study to the control of health problems" (4).
The components are:
distribution: can refer to people, places, or time periods
determinants: causal factors that affect health
health-related states: persist over time (e.g. depression, hypertension, quality of life)
health-related events: point-in-time occurrences (e.g. heart attack, injury, hospital admission)
specified populations: you must clearly define your study population
application and control: epidemiology is an applied science.
A clearer definition is perhaps: "The study of how disease is distributed in populations and the factors that
influence or determine this distribution" (5).
[A: Or, as I tell people, epidemiologists try to figure out what makes us healthy or unhealthy.]
Define, and apply to clinical information, the concepts of prevalence and incidence
In order to compare between different countries, years, etc. we can't just count the number of cases. In order
for the measurement to be meaningful, we must also know the size of the population. For example, consider
1,000 cases of HIV in Canada compared to 1,000 cases of HIV in the US. The prevalence in Canada will be
much higher than the prevalence in the US because the US has a much larger population.
Point prevalence measures the number of cases that exist at a specic point in time divided by the size
of the population being studied (e.g. the percentage of the class who is currently, right now, experiencing
a cold).
Period prevalence measures the number of cases that exist during a specic period in time divided by
the size of the population being studied (e.g. the percentage of the class who has a cold at any point this
week, including those who are having a cold as the week starts).
Lifetime prevalence measures the number of people who have the outcome at any point in their life
divided by the size of the population being studied (e.g. the percentage of the class who will have a cold
at any point in their life, which should be around 100%).
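The three prevalence measures differ only in the time window over which cases are counted. A quick sketch with invented numbers (a hypothetical class of 120 students):

```python
# Hypothetical class of 120 students; all counts invented for illustration.
class_size = 120

# Point prevalence: 6 students have a cold right now.
point_prevalence = 6 / class_size

# Period prevalence: those 6 existing cases, plus 9 new colds during the week.
period_prevalence = (6 + 9) / class_size

print(f"point prevalence:  {point_prevalence:.1%}")   # 5.0%
print(f"period prevalence: {period_prevalence:.1%}")  # 12.5%
```

Note that the period prevalence counts both the old cases existing as the week starts and the new cases arising during it, which is exactly why it is never smaller than the point prevalence at the start of the period.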
When you think of prevalence, think of existing cases: both old cases that already exist and new cases that
start during the study period. This means that the first time you screen a population, prevalence will be
high, because you will pick up the already-existing, old cases as well as the new ones.
When choosing among point, period, and lifetime prevalence, pick the measure that fits the disease.
For short-lived, acute diseases, we can only really use point prevalence. Period and lifetime
prevalence are usually used for more chronic diseases. Prevalence cannot be used for studying etiology
(disease causation). It is useful for monitoring diseases and measuring disease burden, and therefore helps
us understand how well we are managing diseases, especially chronic diseases. It's a good statistic for
politicians, policy-makers, and health statisticians. Comparing HIV prevalence in Canada to a developing
country may show a higher prevalence in Canada simply because the HIV patients are living longer, and are
therefore picked up in the prevalence estimate.
Cumulative incidence (or just incidence) measures the number of new cases that develop over a
specific period divided by the size of the population at risk (e.g. percent of people who get a cold at any
time this week out of those who have upper respiratory tracts). This can be displayed as a percentage.
That last part, "at risk," refers to the population who could become cases. It trips people up, even proper
researchers, but it shouldn't. It just means that, if you're studying hysterectomies, you should only ever be
dividing by the number of women, since the research is only applicable to that population. If you're studying
uterine cancer, your population at risk only includes women who don't already have uterine cancer and who
still have their uterus (i.e. have not had hysterectomies).
Incidence rate or incidence density measures the number of new cases that develop over a specific
period divided by the total person-time. Person-time is a measure of the amount of time that each
study participant contributes, since each one may enter or exit the study at different times. If not
everyone in the study is followed equally, you must use incidence rate.
For example, let's say that you are conducting a year-long study (1 Jan 2011 to 31 Dec 2011) of the risk of
death after being diagnosed with colon cancer. Our population at risk is people with a colon cancer diagnosis
and our outcome is death. Each participant may enter the study (i.e. be diagnosed) at a different time and exit
the study (i.e. either die or leave for another reason) at a different time, so we know that we need to use
incidence rate instead of cumulative incidence. In this hypothetical study, we have three participants: one
participant who lives for three months after diagnosis and then dies; another who lives for six months after
diagnosis and then dies; and a third who had colon cancer at the start of the study and lives until the end without
dying. The participants will respectively have contributed three person-months, six person-months, and twelve
person-months, for a total of 21 person-months. With two deaths, the incidence density will therefore be two
new cases (i.e. deaths) divided by 21 person-months, or 0.095 deaths per person-month.
[A: The incidence rate is not the hazard rate. Not sure what our prof was saying about that, but feel free to
ignore it for now. We'll see hazard rates again later.]
Incidence rates are often multiplied by an arbitrary factor, just to make the number easier to think about.
The above example would probably be better reported as 95.2 cases per 1,000 person-months.
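The person-time arithmetic above can be sketched in a few lines (the numbers come from the colon-cancer example; this is just a calculator check, not any standard library):

```python
# Months of follow-up contributed by each of the three participants:
# dies at 3 months, dies at 6 months, survives the full 12 months.
person_months = [3, 6, 12]
deaths = 2

total_person_time = sum(person_months)       # 21 person-months
incidence_rate = deaths / total_person_time  # deaths per person-month

# Multiply by an arbitrary factor to make the number easier to read.
per_1000 = incidence_rate * 1000

print(f"{incidence_rate:.3f} deaths per person-month")      # 0.095
print(f"{per_1000:.1f} deaths per 1,000 person-months")     # 95.2
```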
Prevalence is useful for measuring chronic health states and burden of disease, and for health care
planning and funding. It's bad for studying causation.
Incidence is useful for measuring health-related events and for studying causality, but it can be very
difficult to find new cases.
You should understand prevalence and incidence in order to understand health data that you will see. For
example, if the prevalence of HIV in Canada is 2% and in South Africa is 1%, you should ask about the incidence.
Perhaps the cumulative incidence in Canada is 0.5% and in South Africa is 2.5%. A lower prevalence despite a
higher incidence implies a shorter disease duration, so you can conclude that people with HIV in South Africa
are dying at a far greater rate than those in Canada, and that South Africa therefore has a greater HIV
problem than Canada.
Prevalence can be increased by: increased incidence, decreased mortality, increased duration of
disease, or increased case-finding (e.g. if you start a screening program).
[A: Prevalence ~= Incidence x Duration, where duration is determined by the time until either cure or death.
Everything else follows from this simple equation.]
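A quick numeric check of the Prevalence ~= Incidence x Duration relation, with invented steady-state numbers:

```python
# All numbers invented for illustration. At steady state, prevalence is
# approximately the incidence rate times the average disease duration.
incidence_rate = 0.002   # 2 new cases per 1,000 person-years
duration_years = 5       # average time from onset to cure or death

prevalence = incidence_rate * duration_years
print(f"expected prevalence: about {prevalence:.1%}")  # about 1.0%
```

This is also why a treatment that extends survival without curing the disease (longer duration) raises prevalence even if incidence is unchanged.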
With regards to increased case-finding, any country that starts a good screening program will have higher
prevalence than a country with poor screening. The higher prevalence in that case just means that the well-screened
country is catching the cases sooner than the poorly-screened country.
Define, and apply to clinical information, the concept of risk (probability)
"The probability that an event will occur, e.g., that an individual will become ill or die, within a stated
period of time or age" (4). Incidence is a measure of risk for the disease.
Define, and apply to clinical information, the concept of outcomes: case definition, case series, validity, and reliability
A case definition is "a set of diagnostic criteria that must be fulfilled in order to identify a person as a
case of a particular disease" (4). They are based on clinical signs/symptoms, laboratory tests, a
combination of both, or a scoring system.
Case definitions often exist to identify or delineate cases based on the certainty of the diagnosis. They
are usually broken down into definite/confirmed, probable, and possible/suspected cases (from
strongest to weakest definition). For each, a patient must meet specific criteria for the given level of
certainty. The first two are often used for research because they have higher signal-to-noise, while the
latter is more useful for making public health decisions, where you might want to err on the side of
caution.
Case definitions should be valid (i.e. they measure what you think they measure) and reliable (i.e. they work
the same when applied by different people [inter-rater reliability]).
In any research report, the author should describe how they defined a case. For example, a
device for detecting atrial fibrillation requires a case definition of atrial fibrillation (is it 1 second, 5 seconds,
etc.). If you do not have a standardized case definition, you can't really share your results because your
colleagues cannot interpret them. For any research, a case definition has to be provided and has to be
convincing. One useful source of case definitions for hospital complications is the CDC, which has specific
criteria for each hospital-acquired infection. Using standardized definitions makes research into nosocomial
infections more generalizable from one hospital to the next.
Define, and apply to clinical information, the concept of exposures
Exposure is any independent variable that could cause the disease. It can be physical (e.g. radiation),
chemical (e.g. tobacco smoke), biological (e.g. genetics, infection), or sociological (e.g. gender, race/
ethnicity, socioeconomic status). It can also be combinations of these, as in occupation.
Define, and apply to clinical information, the contingency table
The 2x2 table (two-by-two table) is the simplest form of investigating disease-exposure associations. To
investigate the effect of an exposure on a disease outcome, take a bunch of people who don't have the
disease and follow them. Some of the people you follow will have been exposed to whatever you're
studying and some won't have been exposed. After following them for a while, some will develop the
disease or outcome that you're interested in. You can then break the population down into a 2x2 table.
             Disease +   Disease -
Exposure +   a           b           total exposed = a + b
Exposure -   c           d           total unexposed = c + d
                                     total population = a + b + c + d
[A: Know this table. Love this table. Write it out yourself and get comfortable with it.]
The incidence of a disease is the number of people who develop the disease divided by the number of
people who were at risk of developing it at the start of the study. We're interested in seeing if the
incidence, or risk, is higher in the exposed group compared to the unexposed group. We can easily
calculate the exposure-specific risks from our 2x2 table:
incidence (or risk) of disease in exposed = a / (a + b)
incidence (or risk) of disease in unexposed = c / (c + d)
incidence (or risk) of disease in total population = (a + c) / (a + b + c + d)
Sometimes, we want to look at the odds of a disease instead of the risk of it. In a casino, 2-to-1 odds of
winning means that you will win 2 for every 1 you lose, or 2 positive outcomes for every negative
outcome. This corresponds to a 2/3 probability of winning. Similarly, the odds of disease are calculated by
dividing the number of people with the disease by the number of people without it. Again, we can easily
calculate these numbers from the 2x2 table:
odds of disease in exposed = a / b
odds of disease in unexposed = c / d
[A: Odds are odd. We hate them, but they're a lot easier to work with for certain things. Note that, if the
disease is rare, b will be much, much larger than a, and so a/(a+b) will be very close to a/b. The same also
holds for the unexposed, so that if a disease is rare, d will be much, much larger than c, and c/(c+d) will be
very close to c/d. That is to say, the odds approximate the risk, given that the disease is rare.]
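[A: If it helps to see the arithmetic, here's a quick Python sketch of these calculations, using cell counts I made up:]

```python
# Risks and odds from a 2x2 table, using made-up counts:
# a = exposed & diseased,   b = exposed & disease-free,
# c = unexposed & diseased, d = unexposed & disease-free.
a, b, c, d = 20, 80, 10, 90

risk_exposed = a / (a + b)     # incidence in the exposed: 20/100 = 0.2
risk_unexposed = c / (c + d)   # incidence in the unexposed: 10/100 = 0.1
odds_exposed = a / b           # odds of disease in the exposed: 20/80 = 0.25
odds_unexposed = c / d         # odds of disease in the unexposed: 10/90, about 0.11
```

Note that with a disease this common (20% risk in the exposed), the odds (0.25) are already noticeably different from the risk (0.2); the approximation above only kicks in when the disease is rare.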
Define, and apply to clinical information, measures of association
A measure of association is a calculation that we do to see how strongly an exposure correlates with
an outcome.
Relative risks: risk ratio and odds ratio
A risk ratio is a ratio of risks; an odds ratio is a ratio of odds. The ratios we're interested in are the risk or
odds of disease in the exposed group compared to the risk or odds of disease in the unexposed group:
risk ratio or relative risk (RR) = risk in exposed / risk in unexposed = [a / (a + b)] / [c / (c + d)]
odds ratio (OR) = odds in exposed / odds in unexposed = [a / b] / [c / d] = ad / bc
Although they mean different things, the general interpretation is the same for both measures.
Epidemiology I Course Notes
9
If the relative risk or odds ratio is less than one, it indicates that people
who are exposed are less likely to get the outcome than those who
are not exposed. The exposure is protective.
If the relative risk or odds ratio is equal to one, it indicates people
who are exposed are as likely to get the outcome as those who are
not exposed. The exposure has no effect on the outcome.
If the relative risk or odds ratio is greater than one, it indicates that people who are exposed are more
likely to get the outcome than those who are not exposed. The exposure is a risk factor.
For example, if the incidence of cancer in smokers is 20% and in non-smokers is 10%, the relative risk of
cancer for smokers versus non-smokers is 2, and therefore smokers have twice the risk of cancer as non-
smokers.
But for many studies, we can't study the whole population at risk, and therefore can't calculate risks or
risk ratios. However, we can often still calculate odds ratios. [A: We will see more of this when we get to
case-control studies.]
For example, studying HIV incidence in homeless people, we can't follow all homeless people (that would be
impossible!). Instead, we get a sample of people who develop HIV and a sample of people who do not, and
we find out how many were homeless in each outcome group. By doing this, we are fixing the number of
people in each of the disease groups and our 2x2 table no longer represents the actual population. If we pick
100 diseased participants and 100 non-diseased participants, that doesn't mean that the risk of the disease is
50%, because we're fixing how many people are in each disease category. In this case, a+b and c+d are not
the population sizes of the exposure groups. We can no longer calculate incidence from our 2x2 table.
Instead of incidence, we calculate an odds ratio given by the odds of exposure in the diseased participants
divided by the odds of exposure in the non-diseased participants. As it turns out, this gives exactly the same
result as calculating our normal OR.
odds of exposure in diseased group = a / c
odds of exposure in non-diseased (control) group = b / d
odds ratio = odds of exposure in diseased / odds of exposure in non-diseased = [a / c] / [b / d] = ad / bc
Notice how this is the same as if we were calculating the odds ratio based on the odds of disease for each
exposure group. Wow! So cool! So it doesn't matter that we don't have a representative population sample,
the odds ratio is the same as if we had.
In any case-control study where you've fixed the number of people in the disease and non-disease groups,
you cannot report a risk ratio, only an odds ratio. El-Masri and I will hunt you down if you do.
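[A: A quick numeric check, with counts I made up, that sampling a fixed fraction of controls doesn't change ad/bc:]

```python
# Hypothetical full-population 2x2 counts:
a, b, c, d = 50, 950, 10, 990
or_full = (a * d) / (b * c)    # ad/bc in the whole population

# A case-control study might keep every case but sample only 1 in 10
# of the non-diseased people, shrinking cells b and d by the same factor:
b_sampled, d_sampled = b // 10, d // 10
or_sampled = (a * d_sampled) / (b_sampled * c)
# or_full and or_sampled come out identical (about 5.2), even though the
# sampled table no longer tells you anything about incidence.
```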
However, the interpretation of the OR is slightly different than the RR. An RR of 2 means that the risk is
doubled in the exposed group compared to the unexposed group; an OR of 2 means that the odds are
doubled. [A: Never say that an OR of 2 means that the exposure doubles your risk of an outcome; it
doubles your odds of it.]

Interpreting relative risks:
OR < 1 or RR < 1 => protective
OR = 1 or RR = 1 => no effect
OR > 1 or RR > 1 => risk factor
[A: But now think about the fact that the odds approximate the risk for rare diseases (see my explanation in
the 2x2 table section, above). Because of this, the OR approximates the RR when the outcome is rare.]
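[A: And a quick demonstration of that rare-disease approximation, again with counts I made up:]

```python
# A rare outcome: a is tiny compared to b, and c is tiny compared to d.
a, b, c, d = 2, 9998, 1, 9999   # roughly 0.01-0.02% risk in each group

rr = (a / (a + b)) / (c / (c + d))   # risk ratio: 2 (0.0002 / 0.0001)
odds_ratio = (a * d) / (b * c)       # odds ratio: about 2.0002
# The OR is nearly indistinguishable from the RR because the outcome is rare.
```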
True rates and rate ratio
The rate here refers to the incidence rate. A rate ratio is therefore a ratio of incidence rates. [A: I prefer
to be more specific and call these incidence rate ratios, or IRRs.]
rate ratio or incidence rate ratio = incidence rate in exposed / incidence rate in unexposed
The general interpretation of IRRs is the same as for RRs and ORs:
IRR < 1 => protective
IRR = 1 => no effect
IRR > 1 => risk factor
But again, the specific interpretation is a little different. An IRR of 2 means that the exposure doubles the
incidence rate of the outcome, not the risk of it.
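[A: The calculation itself is nothing fancy. A sketch with made-up person-time numbers:]

```python
# Incidence rate ratio from hypothetical person-time data.
cases_exposed, person_years_exposed = 30, 1000
cases_unexposed, person_years_unexposed = 10, 1000

rate_exposed = cases_exposed / person_years_exposed        # 30 per 1,000 person-years
rate_unexposed = cases_unexposed / person_years_unexposed  # 10 per 1,000 person-years
irr = rate_exposed / rate_unexposed   # about 3: exposure triples the rate
```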
Define and apply the concepts of epidemic and outbreak
Epidemic: "The occurrence in a community or region of cases of an illness, specific health-related
behavior, or other health-related events, clearly in excess of normal expectancy" (4).
You all know it: more disease than we expect.
Outbreak: [A]n incident in which two or more individuals have the same disease, have similar
symptoms, or excrete the same pathogens; and there is a time, place, and/or person association
between these individuals. (FDA)
Basically, an epidemic just means more disease than expected, while an outbreak means that the cases of
disease are somehow related.
Define and apply the concept of epidemic curves with different shapes
A graphical plotting of the distribution of cases by time of onset (4). [A: When a case comes in, you
ask when their symptoms started and use that. Time of onset, not time of report.]
Point source: steep left-hand slope, outbreak lasts for about one incubation period. It's a one-time
event, like a picnic, and only shows one peak.
Intermittent common source: shows several, individual peaks. It's from a single source that is only
exposing people intermittently, such as a contaminated cafeteria that's only open once every two weeks.
Continuous common source: the right-hand slope is gradual if the epidemic runs its course, or sudden if control
measures are implemented. It's from a single source that is continuously exposing people, such as a
contaminated cafeteria that's open all the time, so you'll see a constant, steady supply of sick people.
Propagated (progressive source): multiple peaks. Basically, it's a bunch of point source outbreaks as
people become infected and then spread it to another group. Peaks are about one incubation period
apart [A: although a large outbreak would have them all running together into a giant mess]. The easiest
example is an STD like chlamydia.
Define and apply the concept of surveillance
The systematic and continuous collection, analysis and interpretation of data, closely integrated with
the timely and coherent dissemination of the results and assessment to those who have the right to
know so that action can be taken (6).
[A: That's a ridiculous definition. Porta is way too wordy.]
Surveillance is essentially descriptive epidemiology over a long time.
Define and apply the concept of alternative explanations for a finding
[A: Or, as I call it, Everything you thought you knew is wrong. Observational studies are always subject
to confounding, and even experimental studies can be explained away.]
Confounding and other biases
Bias is the systematic deviation of results or inferences from the truth (6).
Bias is systematic. If it's not systematic, it's not bias (it's random chance).
Confounding is a type of bias. Here's the awful definition given in the notes: "Distortion of the estimated
effect of an exposure on an outcome, caused by the presence of an extraneous factor associated both
with the exposure and the outcome, i.e. confounding caused by a variable that is a risk factor for the
outcome among nonexposed persons, and is associated with the exposure of interest, but is not an
intermediate step in the causal pathway between exposure and outcome (4).
Breathe, guys. If an exposure is associated with an outcome, you have to consider the possibility that
something else is causing both the exposure and the outcome to be positive in the same individuals (and that
the exposure isn't actually affecting anything).
My favourite example is this: lighters (those things that make flames) are associated with lung cancer. Is this
relationship causal? No! The relationship is confounded by smoking status. That is, smokers are more likely
than non-smokers to own lighters and are also more likely to get lung cancer. Easy, right?
A slightly more nuanced example is that some researchers found an association between coffee consumption
and heart disease. Gotta stop drinking coffee, right? Wrong! The results were later explained as confounding
by smoking. A smoker is more likely than a non-smoker to drink coffee. A smoker is also more likely to get
heart disease. As soon as you adjust for smoking status, the association between coffee consumption and
heart disease disappears. You don't have to give up your coffee, just your cigarettes.
Confounding is a problem in all observational research. No matter what potential confounders you adjust for,
there will always be residual confounding that you haven't accounted for. [A: Confounding is really only a
problem when talking about causation. If you're developing predictive models, then who cares if the
association is confounded or not? It's still a real association. If all you have is data on which people own
lighters, you can still develop a model that predicts who will get lung cancer, it just won't be quite as good.]
[A: My favourite form of bias is called confounding by indication, which sort of means reverse causality.
Looking at data on aspirin use and risk of heart disease, you will see an association. Does this mean that
aspirin causes heart disease? No! People with heart disease often take aspirin. In this case, their heart
problems are causing the aspirin use, not the other way around. The aspirin merely indicates the presence of
existing problems, and is probably actually protective.]
Chance
All numbers we come up with in our studies are estimates, and no matter how high your estimate, it
could always be due to chance. Some ways of assessing this possibility are p-values [A: which suck,
never use them!] and confidence intervals.
Confidence intervals give you a measure of how precise your estimate is. Think of a newspaper giving a
statistic, saying, "X plus or minus Y, 19 times out of 20." The confidence interval goes from X-Y to X+Y,
and the 19 out of 20 means it's a 95% confidence interval. If you have a small confidence interval, you
have a more precise estimate. [A: Remember, that's different from having an accurate estimate.] You can
get a small confidence interval by either having a large sample size or a strong effect.
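[A: For the curious, here's roughly how that newspaper-style "X plus or minus Y" gets computed for a simple proportion, using the usual normal approximation and survey numbers I made up:]

```python
import math

# 95% CI for a proportion (normal approximation), with made-up numbers:
# 520 "yes" answers out of 1,000 people surveyed.
successes, n = 520, 1000
p = successes / n                # the estimate, X = 0.52
se = math.sqrt(p * (1 - p) / n)  # standard error; shrinks as n grows
margin = 1.96 * se               # the "plus or minus Y", about 0.031 here
ci = (p - margin, p + margin)    # the 95% confidence interval
```

Quadruple the sample size and the margin halves, which is why big studies give tight confidence intervals.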
A p-value is about hypothesis-testing. [A: I don't like p-values or hypothesis-testing and won't discuss
them unless you guys want me to. El-Masri didn't, so hopefully that means it isn't on the test...]
Stratified analysis
When you stratify, you separate the data based on a potential confounder before doing your analysis.
Basically, you take your original 2x2 table and make two more: one of people who have the confounder
and one of people who do not. You then calculate two separate ORs or RRs, one for each of the strata.
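[A: Here's the lighter example as a stratified analysis sketch, with counts I invented so that the strata show no association (OR = 1) even though the crude table does:]

```python
# Each tuple is (a, b, c, d):
# (lighter & cancer, lighter & no cancer, no lighter & cancer, no lighter & no cancer)
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

smokers = (18, 72, 2, 8)      # invented counts for the smoker stratum
nonsmokers = (1, 49, 2, 98)   # invented counts for the non-smoker stratum

# The crude (unstratified) table just adds the two strata together:
crude = tuple(s + n for s, n in zip(smokers, nonsmokers))

crude_or = odds_ratio(*crude)           # about 4.2: looks like a risk factor!
smoker_or = odds_ratio(*smokers)        # 1.0: no association within smokers
nonsmoker_or = odds_ratio(*nonsmokers)  # 1.0: no association within non-smokers
```

The crude OR is inflated because lighter ownership and lung cancer are both more common among smokers; within each stratum of smoking status, the association vanishes.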
Fundamentals of Epidemiology II
(23 Jan 2012)
Summary of study designs
case-control
  Description: Find people with disease; find similar people without disease. Ask them about previous exposures and compare the groups' rates of the exposure.
  Advantages: Relatively fast and easy. Good for rare diseases. If done well, can approximate the results of a cohort study.
  Disadvantages: Easily biased. Only gives odds ratios.

cohort
  Description: Find people with exposure; find similar people without exposure. Follow the groups for a while and compare them for outcome.
  Advantages: Less risk of bias than case-control.
  Disadvantages: Confounding is a problem even in well-designed cohort studies. Often requires long follow-up.

RCT
  Description: Randomly split your participants, give one group the exposure and keep one group as controls. Compare the groups for outcome.
  Advantages: Causal inference is easy. No risk of confounding.
  Disadvantages: Can be expensive. Many exposures cannot be ethically randomized.

ecologic
  Description: Compares groups at an aggregate, population level.
  Advantages: Quick and easy.
  Disadvantages: Same as for cohort, with the additional risk of cross-level bias/ecologic fallacy.

time-series
  Description: Analyzing changes in a single measurement over time. [A: Silly definition.]
  Advantages: Quick and easy.
  Disadvantages: Can't test hypotheses. [A: Except you can!]

cross-sectional
  Description: Sample the population at a specific point in time.
  Advantages: Quick and easy.
  Disadvantages: Can't tell whether exposure happened before outcome, because it's all sampled at the same time.

natural experiment
  Description: Look for something where people or groups get different exposures due to circumstances outside their control. Compare them for outcomes.
  Advantages: Can be powerful if done right.
  Disadvantages: Not actually randomized, so still risks bias.
[A: I hate their definition of time-series. It's useless. Don't remember it except for the exam.]
Describe the case-control study design, including advantages and disadvantages
In a case-control study, you're finding people who have a disease (cases) and comparing them to people
who don't (controls). You're grouping by outcome, and you're comparing the groups for differences in
exposure rates. With case-control studies, you can only calculate odds ratios, never risk ratios.
For example, say you're studying how cell phone use affects risk of brain cancer. It's a rare outcome, so you
probably won't do a cohort study since you'd only pick up a small number of brain cancer cases. You do a
case-control study instead, since it's faster and works well for rare diseases. You go to the local cancer centre
and actively recruit people who have brain cancer (the cases). For every case you recruit, you find one or two
people who are similar age and sex but who don't have brain cancer (the controls). [A: I won't give details on
finding the controls because sampling controls to minimize bias is hard. Ask an epidemiologist.] Now you take
the two groups and you assess their past cell phone use, asking them things like whether they have had a cell
phone in the past, how long they've had it for, and how much they used it. If the people with brain cancer
were more likely to have used a cell phone in the past than the controls, then you would conclude that cell
phone use is associated with brain cancer.
[A: An odds ratio approximates a risk ratio when done properly. You can even do your control-group sampling
in such a way that the odds ratio approximates the incidence rate ratio. Cool, right?]
[A: And here's a bias that's huge with case-control studies: recall bias. Even if there's no effect, people with
brain cancer are more likely to report cell phone use because (a) they care more than controls and therefore
think longer and harder about their exposures, and (b) they suspect a link and are more likely to report positive
exposure status even when they had minimal exposure simply because they suspect that it was the cause of
their disease.
Imagine someone is told that they have brain cancer and then asked about cell phone use. Compare that to
someone who just had a toe amputated and is asked about cell phone use. Which is more likely to report cell
phone use? The brain cancer person.
Recall bias isn't just theoretical, either. Epidemiologists have run studies to assess it, by comparing self-
reported cell phone use to actual cell phone carrier data.]
Describe the cohort study design, including advantages and disadvantages
In a cohort study, you're finding people who have an exposure and comparing them to people who
don't. You're grouping based on exposure, and comparing the groups for differences in outcome rates.
You'll take a population that includes both exposed and unexposed people who don't have the disease at
baseline, then follow up after a while. Some of them will have developed the disease you're interested in,
and some won't. The analysis compares the rates of the outcome/disease between the exposed and
unexposed groups.
A famous example is the Framingham Heart Study, where researchers followed almost an entire town,
periodically sending questionnaires to assess a bunch of different exposures, and waited for people to develop
or die from cardiovascular disease. Then they compared those with CVD to those without it, for a whole bunch
of the exposures they were tracking. As it turned out, smoking was bad for you! Cohort studies like
Framingham provide some of the strongest evidence we can get for exposures like that, since you can't
randomize people to smoke or not smoke.
Describe the RCT study design, including advantages and disadvantages
Randomize people to exposure or no exposure. Compare for differences in outcome rates. Pretty direct
causal inference, if randomized and blinded properly, but RCTs are usually expensive and time-
consuming.
[A: Randomizing and blinding can easily go awry, as can data analysis. Ask an epidemiologist.]
Describe the ecologic study design, including advantages and disadvantages
A study in which the units of analysis are populations or groups of people, rather than individuals (4).
For example, breast cancer rates have been positively correlated with per capita fat consumption across
countries. A major limitation with ecologic studies is cross-level bias (or the ecologic fallacy), which
would occur, for example, if women with breast cancer who lived in countries with a lot of fat consumption
actually had low fat diets, and vice versa. Because there is no way to study these individual exposures and
outcomes using group-level data, this design is better for generating than testing hypotheses.
Describe the time-series study design, including advantages and disadvantages
Analyzing an outcome as it changes over time. Data are collected continuously or periodically.
Because many health events are recorded by calendar time, it is possible to study trends in these health
events. Time series are used descriptively to generate hypotheses. These analyses can indicate whether an
outcome is increasing or decreasing.
For example, a Canadian study noted a decrease in hospital and Emergency Department (ED) visits for
diabetes complications over a 5-year period. They aren't testing a hypothesis; they're simply noting a trend.
Time-series data can also be examined for seasonal trends that might indicate a potential cause (e.g. due to
viral exposures, seasonal dietary constituents, or air pollution due to coal burning).
They can also be compared before and after a specic date when a health policy change was instituted, as
another piece of evidence that an exposure was a cause. For example, toxic shock syndrome cases before
and after Rely tampons were withdrawn, and Reye's syndrome before and after warning labels were put on
aspirin, both provide strong visual evidence that the outcome was reduced after exposure was reduced. Note
that time series can also lead to erroneous conclusions. All changes in trends coincide with some event, many
purely by chance.
[A: I've never heard of a time-series study. What they're talking about here is simply a descriptive study using
time series data, but you can also do an analytical study using time series data. To do that, you'd take two
time series, one for exposure and one for outcome, and look for correlations. If the exposure goes up, does
the outcome as well?]
Describe the cross-sectional study design, including advantages and
disadvantages
"A study that examines the relationship between diseases (or other health-related characteristics) and
other variables of interest as they exist in a defined population at one particular time. ... The temporal
sequence of cause and effect cannot necessarily be determined in a cross-sectional study" (4).
An obvious example of the limitations of a cross-sectional study would be if an association were found
between obesity and depression in a group of high school students, because it would be impossible to know
whether the obesity led to the depression, the depression to the obesity, or both were caused by a third factor.
Describe the natural experiment study design, including advantages and
disadvantages
Naturally occurring circumstances in which subsets of the population have different levels of exposure
to a supposed causal factor, in a situation resembling an actual experiment where human subjects would
be randomly allocated to groups (4).
While not a true randomized experiment, there are instances where an exposure occurs to members of a
population in an essentially random way. An area similar to the exposed area is then selected as the
unexposed group. If the case can be made that the exposure was essentially at random, then the study can
yield valuable information that would otherwise not be available. For example, the Nagasaki and Hiroshima
A-bomb explosions exposed residents of those two cities to ionizing radiation. It is possible to estimate the
radiation exposure of survivors based on where they were during the explosions. Then, cancer rates have
been compared between survivors of these cities and similar cities chosen from elsewhere in Japan. If the
case can be made that the exposed and unexposed cities had similar cancer rates before the bombs, the
opportunity exists to learn about the long-term effects of ionizing radiation across an exposure gradient not
usually seen. This is a very strong design if the circumstances allow it.
Describe the major sources of bias that may occur in etiologic studies in humans,
and apply this information to interpretation of observational studies
[A: I'm not really sure what this objective is trying to get at.]
Broadly speaking, biases are either selection bias, from the way you're recruiting your participants, or
measurement bias, from the way you're measuring the exposures or outcomes. Remember, it must
introduce a systematic error to be a bias.
An example of selection bias would be if you were to use a breast cancer screening booth to estimate
prevalence of breast cancer. People who are more worried about having breast cancer are more likely to self-
select for participation, so your estimates of prevalence will be high.
An example of measurement bias is the recall bias that I described in the case-control section, above.
Define causation
There is no single absolute rule that can be used to decide if an exposure or event E causes an outcome
O. The definition they ask us to think about is this:
E occurs. Later, O occurs. Had E been absent, O would not have occurred, all else being held constant.
Describe approaches to determining causation
Counterfactual
Counterfactual: "A measure of effect in which at least one of two circumstances in the definition of
variables must be contrary to fact" (4). [A: What a BS definition.]
This just means that you imagine a world where you smoke and you compare it to a world in which you don't
smoke; everything else is identical except for the smoking. Then you see what the differences are in
outcome. If you get lung cancer in the world in which you smoke, then the smoking caused the lung cancer
since it was the only difference between the two worlds. Obviously, it is impossible to actually do this
comparison, hence "counterfactual."
Causality in reality
Points that they want you to know:
You can't prove causation in an individual. For example, there's no way to prove that a specific person
got their lung cancer by smoking.
RCTs are the best tool we have for establishing causation.
You can't ethically randomize someone to something that you think will cause harm, so RCTs are
limited.
Observational studies are susceptible to bias.
Observational studies are often the first, and sometimes the only, evidence we have.
Observational studies often give the same results as RCTs.
In medicine, we must be practical. Even if we can't prove causation, we can still figure out what the
potential risks and benefits are, and act accordingly. Even if we don't have perfect evidence, it might
still be prudent to act.
Koch's postulates
Specific to infectious disease.
the suspected agent must be present in all cases
must not be found in other cases of disease
is capable of reproducing disease in experimental animals
must be recoverable from the experimental animals after the disease was reproduced
Bradford-Hill Criteria
These are guidelines, and are not always all necessary. Except for temporality.
1. Strength of association (relative risk, odds ratio): the larger the apparent effect, the more likely it is to
be causal [A: This one is really debatable. We're often looking for effects that are very small.]
2. Consistency: do we see the same results over and over again, in different studies?
3. Specificity: it only produces one specific effect of interest [A: Again, very debatable. Smoking does a
lot of bad stuff to you besides just lung cancer.]
4. Temporal relationship (temporality): the cause must come before its effect
5. Biological gradient (dose-response relationship): if you increase the exposure, does it increase risk of
the outcome?
6. Plausibility (biological plausibility): there's a plausible pathophysiological mechanism of effect
7. Coherence: the effect is compatible with existing theories and knowledge [A: This is similar to
biologic plausibility but broader.]
8. Experiment (reversibility): lowering the exposure lowers risk of the outcome
9. [A: Analogy: do similar exposures cause similar outcomes? This one isn't in the notes, but was one of the original
criteria.]
Describe measures of mortality [NiO]
death rate or mortality rate = number of deaths in a given period / total population at risk during that period
case-fatality rate = number of deaths from the disease / number of people with the disease
Note that the numerator gives the number of cases who die from the disease. That is, the deaths must be
attributable to the disease, and don't include those who were, say, run over by a bus.
proportionate mortality = deaths from a specific disease / total deaths
This is just telling you how much of the mortality is caused by a given disease. For example, of all the deaths in
Hotel Dieu in 2011, how many (proportionally) were caused by CHF?
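[A: With numbers I made up for a town of 100,000 over one year, the three measures look like this:]

```python
# Hypothetical numbers for one year in a population of 100,000:
population = 100_000
all_deaths = 900        # deaths from any cause
disease_cases = 2_000   # people who had the disease of interest
disease_deaths = 40     # deaths attributable to that disease

mortality_rate = all_deaths / population             # 0.009, i.e. 9 per 1,000 per year
case_fatality_rate = disease_deaths / disease_cases  # 0.02, i.e. 2% of cases die
proportionate_mortality = disease_deaths / all_deaths  # about 4.4% of all deaths
```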
Fundamentals of Biostatistics I
(30 Jan 2012)
Define probability and odds
Probability is the proportion or percentage of successes out of the total number of trials. Odds are the
ratio of successes to failures.
Take a coin flip. You have a probability of getting heads of 1/2 (50% or 0.5), which corresponds to an odds of
1/1 (or just 1). If you have a six-sided die, your probability of rolling a one is 1/6 while your odds of rolling a one
are 1/5.
More mathematically, if you have N trials, with S successes and F failures, then:
N = S + F
probability of success = P(S) = S / N = S / (S + F)
odds of success = Odds(S) = S / F
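[A: These two formulas convert back and forth. A tiny Python sketch of the conversions, using the coin, die, and casino examples:]

```python
# Converting between probability and odds.
def odds_from_prob(p):
    return p / (1 - p)

def prob_from_odds(o):
    return o / (1 + o)

coin = odds_from_prob(0.5)   # coin flip: probability 1/2 -> odds of 1
die = odds_from_prob(1 / 6)  # die roll: probability 1/6 -> odds of 1/5
casino = prob_from_odds(2)   # 2-to-1 odds -> probability 2/3
```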
[A: Why are they asking you to define probability and odds so long after you were expected to know risk ratios
and odds ratios? I don't know.]
Classify different sampling approaches
Simple random sample
  Description: Every unit has the same, non-zero probability of being selected.
  Advantages: It's the ideal sampling technique and will not add any bias to the study.
  Disadvantages: It requires you to have a list of all units in the population, which isn't always feasible.

Systematic sample with random start
  Description: For some number N, take every Nth unit on the list, starting at a random unit between the first and Nth.
  Advantages: Easier than a SRS in certain situations.
  Disadvantages: If there's any clumping, your sample might not be representative.

Cluster sampling
  Description: You sample groups (like neighbourhoods), where you pick a random set of neighbourhoods and talk to everyone in each sampled neighbourhood.
  Advantages: Efficient for sampling larger populations, especially when you have to actually go out and collect the data (e.g. walk from block to block).
  Disadvantages: Since it's not a perfectly random sample of people, you'll need to include more people to get the same statistical power as a simple random sample. The number by which the sample size must be multiplied is known as the design effect.

Multistage sampling
  Description: You sample in multiple levels or stages. For example, a random sample of cities, then within each city, a random sample of neighbourhoods, then within each neighbourhood a random sample of houses, and finally within each house a random sample of the occupants.
  Advantages: Best for sampling really large populations.
  Disadvantages: Same as for cluster sampling. Multistage sampling also makes running and analyzing a study more complex. Analysis must use sampling weights.

Stratified sampling
  Description: Samples within defined strata of the population. Can oversample from certain strata.
  Advantages: Oversampling ensures that enough people are sampled from minorities of interest.
  Disadvantages: Requires you to be able to stratify the population. When oversampled, analysis requires weighting.

Convenience sampling
  Description: Take whomever you can.
  Advantages: Quick and easy.
  Disadvantages: Since it's a non-random or non-probability sample, you really can't generalize to the broader population.

Quota sampling
  Description: Like stratified sampling where the strata are sampled by convenience rather than randomly.
  Advantages: Quick and easy.
  Disadvantages: Since it's a non-random or non-probability sample, you really can't generalize to the broader population.
Simple random sampling is the ideal sampling method, and most statistical analyses assume that your
samples were SRS. If you don't use SRS, you often have to adjust your analysis using sampling weights.
The non-probability sampling methods are easily biased and often not generalizable.
Simple random sample (SRS)
This is your classic probability sampling technique. First, you enumerate all possible units, that is, make
a list of all the things in the population you're sampling from. This list is your sampling frame. Then you
take a completely random sample of units from that list. Every unit has an equal and non-zero
probability of being selected for the sample.
For example, drawing 10 students' names out of a hat would give a simple random sample of students. In a
class size of 38, everyone has the same 10/38 = 26% chance of being selected. The list of students' names
that you've torn up and put in the hat is the sampling frame.
SRS is the ideal that all other sampling techniques are trying to approximate. All of our statistics assume
that our samples are generated in this way, which is why we need things like the design effect and
weighting for cluster or multistage samples.
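[A: In Python, random.sample does exactly this, hat and all. A sketch with a hypothetical class list:]

```python
import random

# Simple random sample: draw 10 students from a class of 38.
sampling_frame = [f"student_{i}" for i in range(1, 39)]  # the full list of names
sample = random.sample(sampling_frame, 10)  # each student has a 10/38 chance

# The sample contains exactly 10 distinct students, all drawn from the frame.
```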
Systematic sample with random start
As above, you start with your sampling frame. Then you pick a number, which I'll call N. Start sampling,
on the list, at a random position between the first and Nth item. Then take every Nth unit after that until
you get to the end of the list.
E.g. Take a list of every student in the class (your sampling frame). Take every sixth person, starting at random
somewhere between the first and sixth person on the list.
This technique can be useful for things like sampling houses from a block. Instead of creating a list of all the
houses and consulting a random number table, you simply roll a six-sided die, start at that house, and then
take every sixth house after it.
If the sampling frame has any clustering or clumping of data, then your sample may not be
representative of the population.
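[A: The same die-rolling procedure in code, for a hypothetical frame of 38 units:]

```python
import random

# Systematic sample with random start: every 6th unit on the list.
frame = list(range(38))      # e.g. 38 houses on a block, numbered 0-37
N = 6
start = random.randrange(N)  # random position among the first N units
sample = frame[start::N]     # that unit, then every Nth one after it
# e.g. start == 2 gives [2, 8, 14, 20, 26, 32]
```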
Cluster sampling
Instead of randomly sampling units, you randomly sample groups of units. For example, instead of
sampling people, you randomly select neighbourhoods and include everyone in each of the sampled
neighbourhoods. You might be sampling 20 neighbourhoods of 500 people each, but that's not the same
thing as sampling 10,000 people at random. In order to get the same statistical power, you need to
increase the sample size by some multiple, called the design effect. In a sample with a design effect of
two, you would need to include twice as many people as if you had sampled them perfectly randomly.
For example, we're interested in studying med students in Windsor. We pick two years at random and
interview everyone in those two years.
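The design-effect adjustment itself is simple arithmetic: multiply the sample size an SRS would need by the design effect. A sketch (the numbers are hypothetical):

```python
def required_cluster_n(n_srs: int, design_effect: float) -> int:
    """Sample size needed under cluster sampling to match the
    statistical power of a simple random sample of size n_srs."""
    return int(n_srs * design_effect)

# With a design effect of 2, an SRS-equivalent of 500 needs 1,000 people
print(required_cluster_n(500, 2))
```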
Multistage sampling
This is like cluster sampling, except instead of including every unit in the sampled groups, we take a
sample of them. This has the same design effect as cluster sampling, but analysis is much more
complicated and requires sampling weights. Distrust any multistage-sampled study that does not report
using sampling weights.
You can have several layers or stages in multistage sampling.
For example, we're interested in studying med students in Windsor. We pick two years at random and
interview 30% of students in each of the two years.
Or, a more complex example: say you want to sample all Canadians. You select 50 cities at random, then
within each city you select (at random) three census tracts, then four blocks within each census tract, and
twelve houses within each block. By weighting your results appropriately, you can generalize from this sample
to the entire Canadian population.
[A: I think it's important to point out that, for cluster and multistage sampling, you aren't picking your
groups randomly, but rather are picking them with probabilities proportional to size. In the Windsor
med student example above, the probability of picking the fourth-year class would be lower than the
probability of picking the first-year class, because the first years have more people. If you don't do that,
then your weighting won't work out and you can't generalize.]
Prepared by Aidan Findlater
24
Stratified sampling
Stratify the population by some characteristic (age, sex, or whatever) and sample randomly within each
stratum. This allows you to oversample from specific strata, which will need to be accounted for by
weighting the results in the analysis.
For example, if your study population only has a few young people but you want to be able to make inferences
about them as a subgroup, you might stratify on age and sample 50% of the young stratum and 20% of the
older stratum.
You can't just calculate a mean now, since your sample has proportionally more young people than the
population you're sampling from and any means that you calculate would be heavily skewed. To adjust for
this, you use weighting.
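Here is a toy sketch of that weighting (all numbers are made up): each respondent gets a weight of 1 divided by their stratum's sampling fraction, and the weighted mean undoes the oversampling.

```python
# Hypothetical stratified sample: young people oversampled at 50%,
# older people sampled at 20%
young = {"values": [20, 22, 21], "sampling_fraction": 0.50}
old   = {"values": [55, 60],     "sampling_fraction": 0.20}

weighted_sum = 0.0
weight_total = 0.0
for stratum in (young, old):
    w = 1 / stratum["sampling_fraction"]  # weight = inverse of the sampling fraction
    for x in stratum["values"]:
        weighted_sum += w * x
        weight_total += w

naive_mean = (sum(young["values"]) + sum(old["values"])) / 5
weighted_mean = weighted_sum / weight_total

print(naive_mean)     # 35.6, skewed toward the oversampled young stratum
print(weighted_mean)  # 43.8125, weights restore the population balance
```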
Convenience sampling
You take a sample of whomever you can get. It's a non-probability sampling technique, so it's really
hard to generalize to a larger population.
For example, walking around asking random people on the street to fill out a questionnaire, or going to a
fertility clinic and recruiting the patients.
Convenience samples often suffer from volunteer bias, since people who volunteer for a study are likely
to be systematically different from the broader population.
Quota sampling
"A method by which the proportions in the sample in various subgroups (according to criteria such as
age, sex, and social status of the individuals to be selected) are chosen to agree with the corresponding
proportions in the population. The resulting sample may not be representative of characteristics that
have not been taken into account" (4).
This is just doing a convenience sample where you continue sampling until you have enough people in all
subgroups or strata you're interested in. It's like a cross between a convenience sample and a stratified
sample.
Differentiate between levels and types of measurement
Types of variables: continuous, discrete, or categorical
Continuous variables are attributes or characteristics that theoretically have infinitely fine gradations.
For example, weight. You aren't restricted to having weights in whole units (1 kg, 2 kg, etc.); you can have
fractions (1.3 kg, 0.45928 kg). Therefore it's continuous.
Discrete variables only exist in distinct units and are expressed in integers or counts.
For example, 1 child, 2 children. You can't have 1.3403 children. Heart rate, in beats per minute, could be
considered discrete, since you can't have half-beats. [A: Although that's technically true, most people think of
heart rate as continuous.]
Epidemiology I Course Notes
25
Categorical variables have natural categories.
For example, colours.
Dichotomous variables are categorical variables with two categories.
For example, alive or dead, male or female, stroke or no stroke.
Continuous variables are often turned into ordinal, categorical, or dichotomous variables in order to
make them useful for regressions and other analyses that assume normal distributions. For example, if
age is not normally distributed in your sample, you might dichotomize into old and young based on a
certain cutoff, or into age categories like <20 years, 21-40 years, and >40 years.
Levels of variables: interval, ratio, ordinal, nominal
[A: I don't really know what "levels" means. Normally I'd just call all of these variables.]
The interval and ratio levels assume that each interval or unit on the scale is the same as every other
interval (that is, equal intervals). The ratio level further assumes that a zero on the scale represents an
absence of the phenomenon being measured. All continuous variables are measured at either the interval
or ratio level.
For example, the Celsius scale is an interval scale. The difference between 20°C and 21°C is the same as the
difference between 30°C and 31°C, but 0°C is an arbitrary designation. Weight is a ratio measure, because the
difference between 20 g and 21 g is the same as the difference between 30 g and 31 g, and 0 g is an absence of
weight. Because it's a ratio scale, you can also say that 4 g is twice as heavy as 2 g (hence "ratio" measure).
Compare that to an interval measure like Celsius, where 4°C isn't twice as hot as 2°C.
The ordinal level represents variables as labels that have an order and can be ranked. Likert scales are
a common example.
For example, rating your interest in this class, where 1 is "hate it", 2 is "dislike it", 3 is "don't care either way", 4
is "like it", and 5 is "love it". The distance between hating and disliking isn't necessarily the same as the
distance between disliking and not caring, but there is definitely an order to them. A health-related example is
stage or grade of cancer.
[A: Ordinal, as in "order". Compare this to nominal, as in "names".]
The nominal level represents variables as labels that don't have an order. Categorical variables are
measured at the nominal level.
For example, colours. There's no inherent order to red and blue. A health-related example is type of cancer.
[A: Nominal, as in "names". Compare this to ordinal, as in "order".]
Describe a normal distribution and compare to a skewed distribution
A normal distribution is the classic bell-shaped distribution. It is defined by two parameters, the mean
and the standard deviation. A skewed distribution is asymmetric: if it's right-skewed, it has a long right
tail; if it's left-skewed, it has a long left tail.
[A: Understanding the following notes makes some stuff easier to understand but is not necessary for the
course.]
There are two types of statistics: parametric and non-parametric. Parametric means that you are assuming
that the data follow a parametric distribution, that is, any distribution that can be defined by one or more
parameters. Non-parametric means that you aren't assuming the data follow any distribution at all.
The most common parametric distribution seen in the literature is the normal distribution, whose parameters
are the mean and the standard deviation. Another common example is the chi-squared distribution, whose
single parameter is the number of degrees of freedom.
Non-parametric statistics are less common. They assume nothing about the data, which makes them more
robust but also less likely to reach statistical significance.
Understand the mean, median, mode, and variance, standard deviation, and
range, and be able to calculate the mean, median, mode, and range
You have a sample of data drawn from a population. You want to make inferences about the population.
One thing you might be interested in is measures of central tendency, which are used to talk about the
"typical" result. Another thing you might be interested in is the dispersion, or spread, of the data, which
describes how far away the numbers are from the measure of central tendency you've used.
Assuming that the sample is a simple random sample from the population, the following measures are
unbiased approximations of the population values.
The mean is your traditional average, where you add all the numbers in your sample and divide by the
sample size. The mean is easily skewed by outliers (data points that are really far away from the other
data points and don't look like they fit in). With a dichotomous variable coded as 0 or 1, the mean
corresponds to the proportion of 1s in the sample.
A percentile is the number below which some percentage of the sample lies. For example, the 95th
percentile is the number in the sample that is higher than 95% of the data.
The median is what you get when you line up all the numbers in your sample and take the middle one.
Another name for the median is the 50th percentile. The median is not skewed by outliers. [A: The median
is usually better than the mean for working with more skewed or non-parametric data. It's basically the
non-parametric equivalent of a mean.]
If you have an odd sample size, your median is just the middle number; if you have an even sample size, then
your median is the average of the two most middle numbers.
The mode is the most common number in your sample, which corresponds to the highest peak on a
histogram. The mode is not skewed by outliers.
A bimodal distribution is one in which there are two modes, or two large peaks. It usually indicates that your
sample has two underlying populations that are mashed together, like if your sample has both men and
women but they respond very differently to treatment.
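The central-tendency measures above can all be computed with Python's standard `statistics` module (the data are made up; note how the single outlier drags the mean but not the median or mode):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 100]  # hypothetical sample with one outlier (100)

print(statistics.mean(data))    # about 18.3, pulled way up by the outlier
print(statistics.median(data))  # 5, the middle of the sorted values; robust to the outlier
print(statistics.mode(data))    # 3, the most common value
print(max(data) - min(data))    # 98, the range
```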
The variance is the mean squared distance from the mean of the data. The standard deviation (SD) is the
square root of the variance. Both measure how spread out the data is; the smaller the variance (and hence
standard deviation), the more closely your data are grouped around the mean.

Variance = s² = [sum from i=1 to n of (xᵢ − x̄)²] / (n − 1)

Standard Deviation = s = √Variance

Although it's technically wrong, you can think of the SD as sort of like the average distance from the mean.
[A: You divide the sum of squares by n − 1 instead of by n because you're working with a sample. There's a very nice
mathematical reason for it, but all you need to know is that it makes sure that it's an unbiased estimate of the
population variance.]
The interquartile range (IQR) is the difference between the third and the first quartile; that is, between
the 75th and 25th percentiles. You find the smallest data point that is higher than 75% of the data and
subtract the smallest data point that is higher than 25% of the data. [A: The IQR is basically the non-
parametric equivalent of the standard deviation.]
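A sketch of the dispersion measures with the `statistics` module (the data are made up; `statistics.variance` uses the n − 1 denominator from the formula above, and `statistics.quantiles` with n=4 returns the three quartile cut points):

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 2]   # mean is 5.5

var = statistics.variance(data)    # sample variance (divides by n - 1)
sd = statistics.stdev(data)        # square root of the variance

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                      # 75th percentile minus 25th percentile

print(var)  # 6.0
print(sd)   # about 2.449
print(iqr)
```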
Identify the most appropriate measurement for central tendency and dispersion for different levels of
measurement including interquartile range and standard deviation

Situation                  Measures of central tendency   Measures of dispersion
Nominal/categorical data   mode
Ordinal data               median, mode                   IQR
Interval or ratio data     mean, median, mode             SD (or variance), IQR

[A: The above table has the measures ordered by (my) preference.]
Fundamentals of Biostatistics II
(6 Feb 2012)
Distinguish between estimation and hypothesis testing
Hypothesis testing tests hypotheses. Your H0, or null hypothesis, is a falsifiable statement that is
evaluated using p-values. Null hypotheses are usually statements about "no effect" or "no difference",
which are set up in contrast to an alternate hypothesis that assumes some effect or difference.
Estimation gives estimates. Instead of asking a yes-or-no question, you're asking for a number that is
often accompanied by a range (the confidence interval). [A: Hard stuff, right guys?]
It's the difference between asking "is there a difference in mean blood pressure between treated and control
groups?" (hypothesis testing) and "how big is the difference in average blood pressure between treated and
control groups?" (estimation). Generally, epidemiologists prefer estimates, for reasons that you will see below.
Interpret p-value and confidence interval
The p-value is used in hypothesis testing. You set your null and alternate hypotheses, and you gather
your data. You can then calculate a p-value, which answers the following question: assuming that the
null hypothesis is true, what is the probability of seeing data at least as unlikely as ours? That first part
is bolded because it's important. You are not calculating the probability that your alternate hypothesis is
true; only Bayesian statistics can do that. You are assuming that there is no effect or difference (the null
hypothesis), and asking what percentage of a theoretically infinite number of trials would have found
results at least as far out as yours.
Think about that for a minute: this is why our significance threshold (alpha) of p < 0.05 corresponds to a 5%
false-positive (Type I) rate. The p-value is the proportion of trials, under a true null, that would produce results
at least as extreme as ours; all of those results would be false positives if we called them significant.
The confidence interval (CI), in contrast, gives an idea of how precise an estimate is. The theoretical
situation is this: if we could repeat the exact same experiment an infinite number of times, 95% of the
calculated 95% confidence intervals would include the true population value. That is, the true population
value that we're estimating will be contained within 95% of those 95% CIs.
Think about that for a minute: what will affect the confidence interval? If we sample a larger proportion of the
population, we'll be able to say that we're closer to the true population value, so larger sample sizes make
tighter confidence intervals. But if the data points are all really different, then we'll be less confident saying that
we're close to the true population value, so a larger spread (variance or standard deviation) makes wider
confidence intervals.
The null value is the value that corresponds to no effect or difference. For RRs, ORs, and IRRs, the null
is 1. If a 95% CI contains the null value, it is not statistically significant at a 5% false-positive rate (that is,
we know that a corresponding p-value would be greater than 0.05 if we calculated it). If a 95% CI does
not contain the null value, then it is statistically significant (a corresponding p-value would be less than
0.05).
And think for another minute: what happens if we're estimating a difference or effect where the true population
value is the null value? Then 95% of the 95% CIs we calculate will contain the null value and 5% will not: a
5% false-positive rate!
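You can see this coverage property in a quick simulation (all numbers here are made up for illustration): draw many samples from a known population and count how often the 95% CI for the mean contains the true mean.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)
TRUE_MEAN, SD, N, TRIALS = 100.0, 15.0, 50, 2000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5  # standard error of the mean
    if m - z * se <= TRUE_MEAN <= m + z * se:
        covered += 1

print(covered / TRIALS)  # close to 0.95
```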
Confidence intervals are better than p-values. They can test all the same hypotheses as p-values (as
we've just seen), but they also give us an idea of how precise the estimates are and how strongly we
should interpret the statistical significance or non-significance.
Tight confidence intervals suggest that the sample size was adequate; a negative result with a tight
confidence interval probably really is a true negative. Wide confidence intervals suggest that the sample
size was inadequate; a negative result could be a true negative but could also be due to the sample
size being too small to detect the difference.
The following table gives an example of how to interpret confidence intervals of a relative risk:

Estimate  95% CI        Interpretation
2.1       0.8 to 3.1    Not statistically significant (includes the null value of 1), and a tight confidence
                        interval. We should feel safe saying that there's no effect in this study
                        (assuming no bias).
2.1       2.0 to 3.6    Statistically significant and a tight confidence interval. We should feel safe
                        saying that there is a true effect in this study (assuming no bias).
2.1       1.4 to 37     Statistically significant and a wide confidence interval. We shouldn't feel too
                        safe saying that there is a true effect in this study (assuming no bias).
4.2       0.95 to 10.6  Not statistically significant (includes the null value of 1) with a wide confidence
                        interval. Given that the confidence interval is wide and mostly on one side,
                        we're more cautious; it may still represent a clinically significant effect
                        (assuming no bias).

[A: I added the bias thing in because I think that's important to think about. P-values and CIs only tell us about
the statistical probabilities given a theoretically perfect situation where we have perfect simple random
sampling and no bias.]
We can also do hypothesis testing for statistically significant differences between two groups by
comparing their confidence intervals. If you are comparing two estimates (like blood pressure between
two groups), then the difference is statistically significant at an alpha of 0.05 (5% false-positive rate) if the
two 95% confidence intervals do not overlap.
For example, if smokers have a mean blood pressure of 180 mmHg (95% CI 150-190) and non-smokers have
a mean blood pressure of 120 mmHg (95% CI 100-130), the confidence intervals do not overlap, so a
statistically significant difference exists at the alpha < 0.05 level.
Because CIs give you more information than p-values, you should always report confidence intervals
instead of p-values. Remember to also consider whether the results are clinically significant, not just
whether they are statistically significant.
Define and interpret Type I error (alpha)
Type I error is when you reject a null hypothesis (that is, conclude that there is a difference or effect)
when there really isn't a true difference or effect. The alpha (α), or false-positive rate, is the proportion of
false positives that you're willing to accept (assuming that your results are only affected by random
chance).
Type I = false positive
Define and interpret Type II error (beta)
Type II error is when you accept a null hypothesis (that is, conclude that there is not a difference or
effect) when there really is a true difference or effect. The beta (β), or false-negative rate, is the
proportion of false negatives that you're willing to accept (assuming that your results are only affected by
random chance).
Type II = false negative
Define and interpret power
Power = 1 − β. The most common power you'll see is 80%, or β = 0.2. People usually talk about power
in the context of sample size calculations. For some reason, people talk about α and power instead of α
and β.
Identify factors required for sample size and power calculations
The four components are: Type I error, Type II error, effect size, and sample size. Power corresponds
to 1 − β. Power decreases with a more stringent α (lower Type I error), increases with a larger effect size,
and increases with a larger sample size.
You want to detect a difference between two groups, such as control and intervention. The sample size that
you need for an experiment is determined by your desired alpha and power and the minimum effect size you
want to detect. For our purposes, sample size calculations refer to the size in each group.
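These four pieces fit together in the standard normal-approximation formula for comparing two means; here's a sketch (the effect size and SD are made-up numbers):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a difference in means
    of `delta` between two groups with common standard deviation `sd`
    (normal-approximation formula for a two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group(delta=5, sd=10))             # 63 per group
print(n_per_group(delta=5, sd=10, power=0.9))  # more power, bigger sample: 85
print(n_per_group(delta=10, sd=10))            # bigger effect, smaller sample: 16
```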
Distinguish between a negative and an underpowered trial
Corresponding to the last objective, there are a number of explanations for a negative (statistically non-
significant) result:
α too low;
sample size too small;
effect size too small;
or there really is no effect!
Define and interpret statistical interaction [NiO]
Interaction is when the effect of an intervention or exposure is modified by a third variable. Interaction
should always be considered when interpreting studies.
An example is asbestos exposure and lung cancer. Asbestos causes lung cancer; smoking causes lung
cancer. The risk of lung cancer among smoking asbestos miners is expected to be high, and you can
calculate what you would expect that risk to be. But people who are exposed to both have an even
higher risk of lung cancer than what you calculate it should be. The effect of asbestos on lung
cancer is increased in the presence of smoking. In this case, they're working synergistically to increase
the risk of lung cancer above and beyond what one would naturally expect.
Interpreting multiple/multivariate regression [NiO]
Let's say that you're reading about a multivariate logistic regression that predicts CHF based on the
presence of ischemic heart disease and several other variables. Logistic regressions give you odds
ratios. If the result of this logistic regression was that heart disease had an OR of 1.98, then you would
say, "The odds of CHF are 1.98 times higher among patients with ischemic heart disease than those
without, assuming that the other variables are held constant (i.e. adjusting for the other variables)."
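The arithmetic behind that statement: logistic regression coefficients live on the log-odds scale, and exponentiating a coefficient gives the (adjusted) odds ratio. The coefficient value below is made up to match the OR in the example.

```python
from math import exp, log

coef_ihd = 0.683        # hypothetical log-odds coefficient for ischemic heart disease
odds_ratio = exp(coef_ihd)
print(round(odds_ratio, 2))  # 1.98

# Going the other way: an OR of 1.98 corresponds to a coefficient of log(1.98)
print(round(log(1.98), 3))  # 0.683
```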
Are the Results Valid? I
(13 Feb 2012)
Compare and contrast observational (cohort, case-control and case series) and
experimental studies
We've already seen some of this. The main difference between observational and experimental studies is
that, in an experimental study, we assign the exposure status to the participants.
The major problem with causal inference in cohort studies is confounding: the possibility that an
association between our exposure and outcome exists only because of some third variable (the
confounder) that's affecting both exposure and outcome. With observational studies, we can adjust for
confounders that we know of and can measure. With RCTs, however, the randomization adjusts for all
confounders, known and unknown, measurable and unmeasurable. Randomization is like magic that
removes confounding.
According to the prof, only RCTs can establish causality. [A: I personally think our prof emphasizes this
point way too much, since it's definitely possible to do causal inference without an RCT.]
Define study population and inclusion criteria
A trial requires inclusion and exclusion criteria for who you will include in the study. More restrictive
criteria increase power but limit generalizability (also called external validity).
A homogeneous population, where everyone is very similar, will give you more statistical power but will
limit the study's external validity (your ability to generalize beyond the study sample). A heterogeneous
population will improve external validity but will add a lot of noise to the data and limit your ability to
detect an effect. You need to find a balance between homogeneity and heterogeneity of your sample.
Define randomization and allocation concealment and differentiate between the
two
Randomization is where the participants are randomly assigned to either the intervention/exposure
group or the control group.
Allocation concealment means that you cannot predict who has been randomized to what group. It's to
prevent the researchers from picking the "right" envelope for their patient. "Bobby, I really think you
should wait until after this next person goes..." This concept is related to blinding, except allocation
concealment is about the randomization process (making sure that the randomization is actually random)
and blinding is about making sure that no one finds out after randomization.
Allocation concealment makes sure that randomization is actually random. It has to do with the people who
are in charge of allocating people to one group or another. All studies need allocation concealment, but not all
need blinding. Take a surgical intervention being compared to medical treatment. You can't blind anyone to
the intervention in that case, since it's pretty obvious who's getting surgery. But you still need allocation
concealment to prevent the person who's doing the randomization from knowing who they're sending where.
If the person who's randomizing can check the next envelope to determine what group the next person will be
randomized to, they may start sending people they think need surgery to the surgery group and people they
think need medical care to the medical care group. The worst example of this that I remember hearing about
was that the residents didn't want to do the experimental surgery without the attending surgeon, so when he
wasn't there (usually at night), they randomized everyone to the control group.
You have to be extra careful with block randomization, since the allocator may be counting and will be able to
predict the next "random" assignment.
Since this is a bit of a confusing concept, here are some more definitions of allocation concealment:
"[Allocation concealment is] a method of generating a sequence that ensures random allocation between two
or more arms of a study without revealing this to study subjects or researchers. The quality of allocation
concealment is enhanced by computer-based random allocation and other procedures to make the process
impervious to allocation bias" (6).
"Allocation concealment helps to minimize selection bias by shielding the randomization code before and until
the treatments are administered to subjects or patients, whereas blinding helps avoid observer bias by
protecting the randomization code after the treatments have been administered" (7).
Randomization and allocation concealment are important because they ensure that, in aggregate, the
two groups are more or less the same at the start of the trial. Any confounders are now distributed at
random, so they can't bias our conclusions about the effect of our intervention or exposure. This accounts
for all known and unknown confounders.
Define block and stratified randomization
Let's say we're assigning people to intervention or control using a random number table. If our
allocation is completely random, we may randomize almost everyone to the intervention just by random chance.
Block randomization is where you force equal numbers to be allocated to the intervention and control
groups. The way we do this is by breaking our random number table into blocks; within each
block we require there to be equal numbers of intervention and control assignments, but in a
random order within the block.
For example, check out the following simple random allocation table. Oops, we've randomized almost
everyone to the intervention!
Simple Randomization
Intervention
Control
Intervention
Intervention
Intervention
Intervention
Intervention
Control
Intervention
Intervention

Compare this to block randomization, below, where we're forcing a 50-50 split. In the table below, our
block size is 4, so that every four participants include two interventions and two controls.

Block Randomization    Block
Intervention           1
Intervention           1
Control                1
Control                1
Control                2
Intervention           2
Intervention           2
Control                2
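A sketch of how a block-randomized allocation list can be generated (the block size and labels are my own choices, not from the course):

```python
import random

def block_randomize(n_participants, block_size=4):
    """Generate an allocation sequence in which every block of `block_size`
    contains equal numbers of Intervention and Control (assumes block_size
    is even and divides n_participants)."""
    allocations = []
    for _ in range(n_participants // block_size):
        block = ["Intervention", "Control"] * (block_size // 2)
        random.shuffle(block)  # random order *within* the block
        allocations.extend(block)
    return allocations

random.seed(7)
alloc = block_randomize(8, block_size=4)
print(alloc)
print(alloc.count("Intervention"), alloc.count("Control"))  # 4 4: a forced 50-50 split
```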
Stratified randomization makes sure that certain subgroups of interest are randomized equally between
the intervention and control groups. We stratify based on our variable (e.g. gender), and then do block
randomization within each stratum. We do this if we don't trust randomization to ensure equal
representation of a very important variable between the two groups.
We should only stratify on factors that have a known and important effect on the outcome.
Define blinding, and recognize studies where blinding may not be possible
Blinding refers to making sure that the treatment allocation is not known. Single-blinded usually means that the
participants don't know if they're in the treatment or the control arm. Double-blinded usually means
that neither the participants nor the data collectors know that. If possible, the data analysts should also
be blinded. Ideally, no one in the world knows who is in what arm until after the data have been analyzed,
except for the members of the data and safety monitoring board (who will be discussed later, I hope).
Blinding is important. People who know that they're in the control arm are often more likely to drop out, and
people who know that they're in the treatment arm are more likely to experience a placebo effect. People who
see no effect, or who only experience bad side effects, are more likely to drop out. If people are dropping out
differently because of their treatment/control allocation, that's a bias. If people are reporting results differently
because of their treatment/control allocation, that's a bias. Bias sucks, so we need to keep everyone as
blinded as possible.
People can unblind themselves in any number of ways. If you're comparing vitamin C to placebo, you have to
somehow mask the taste of the pill, because people know what vitamin C tastes like. If you're comparing a drug to
placebo where the drug causes side effects like a headache, people will know what arm they're in unless you
use an "active placebo" that also causes those same side effects. It's tricky to blind people to treatment, and it
takes a great deal of care to do properly.
But at the end of the study, how do we know if our blinding worked? The easiest, simplest (and only?) way to
measure the effectiveness of blinding is by asking participants what arm they think they were allocated to. You
then compare their responses to their actual allocation to see if there is statistical evidence that blinding was
not effective. Of course, researchers and companies hate doing this. It's a lot easier to publish if you haven't
disproved your own results. Ignorance is bliss. (Don't be like them; be good and try to measure blinding
effectiveness.)
It's pretty intuitive when you can't blind studies. If you're randomizing to surgery or medical treatment,
it's pretty hard to blind people.
Define intention-to-treat analysis, and describe advantages
People drop out of studies, they die, and they don't adhere to their treatment. There are two ways of
dealing with drop-outs, deaths, and non-compliance: per-protocol analysis and intention-to-treat
analysis.
Per-protocol analysis only looks at people who fully completed the treatment or control that they were
allocated to. Intention-to-treat analysis includes all people who were randomized, regardless of how
compliant or alive they were.
Intention-to-treat is generally regarded as less biased, since it's better at dealing with loss to follow-up
(death or dropouts). It's also a better measure of the real-life effectiveness of a treatment, since we can
only write a prescription or referral (that is, allocate them to treatment) but we can't make sure that they
actually comply.
If the treatment is so horrible that most patients stop taking it, intention-to-treat analysis will be much less
biased than per-protocol. Consider that some people will get better and some will get worse, regardless of
treatment. Since people who see a completely random improvement are more likely to continue taking their
treatment in the face of horrible side effects, the only people left in the treatment arm (with side effects)
are those who had random improvements. People in the control arm (with less awful side effects, since it
would be unethical to give truly awful side effects) mostly adhere, including those who randomly improved and
those who did not. A per-protocol analysis will then show that the treatment has an effect, even though it
didn't do anything except cause the non-improvers to drop out! Awful. Always prefer intention-to-treat
analysis, and be suspicious of studies that do not.
And if there's a huge difference between intention-to-treat and per-protocol analyses, then consider whether there's a
problem with the treatment, the implementation, the randomization, the allocation concealment, or the
blinding. There's something fishy going on there. Everyone who is randomized should be included in the
analysis.
Intention-to-treat is also more reflective of the real-world effectiveness of the treatment. We can only allocate
people to treatment or no treatment. You can only write a prescription or a referral; you can't actually force
your patient to follow through. If there's a miracle drug that's so god-awful to take that no patient actually
complies, then what's the point of writing a prescription for it? ITT analysis accounts for this, while per-protocol
would show the amazing results for the 1 out of 100 people who actually followed through on treatment. That's
almost useless to me as a physician.
ITT analysis sometimes underestimates the effect of a treatment. That is, it's a conservative estimate of
the treatment's effect.
Understand what the CONSORT RCT reporting guidelines are
[A: I've added this in because I think that these reporting guidelines are hugely important. CONSORT is
for RCTs, but similar guidelines exist for observational studies and meta-analyses, as well.]
CONSORT is a well-thought-out set of guidelines for the proper reporting of randomized clinical trials. It very
clearly explains everything that you need to report (and therefore, that you need to have thought about). If you
ever want to do a thorough analysis of any RCT, pull out the CONSORT checklist and see which items they
aren't reporting.
One of the most basic things that CONSORT says is that Table 1 should be baseline demographics and Figure
1 should be the flowchart below. If you ever read an RCT that does not have those two things, throw it away;
it's useless.
The lecturer consistently uses CONSORT to refer specifically to the CONSORT-recommended flowchart. I
don't know why.
Epidemiology I Course Notes
37
CONSORT 2010 Flow Diagram

Enrollment:
Assessed for eligibility (n= )
Excluded (n= ): not meeting inclusion criteria (n= ), declined to participate (n= ), other reasons (n= )
Randomized (n= )

Allocation (for each arm):
Allocated to intervention (n= )
Received allocated intervention (n= )
Did not receive allocated intervention (give reasons) (n= )

Follow-up (for each arm):
Lost to follow-up (give reasons) (n= )
Discontinued intervention (give reasons) (n= )

Analysis (for each arm):
Analyzed (n= )
Excluded from analysis (give reasons) (n= )

From: Schulz KF, Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.
For more information, visit www.consort-statement.org.
Are the Results Valid? II
(27 Feb 2012)
Identify and interpret baseline data in a clinical trial
Due to randomization, the two groups should be no different in baseline characteristics, at least not
statistically significantly different.
[A: This is one of the few times that I'll say it's okay to report p-values instead of confidence intervals.]
Identify attrition, and discuss possible effects on results of clinical trial
Attrition refers to the loss of participants during a trial. If people are leaving the trial for reasons related
to their allocation to treatment or no treatment, then it introduces a bias. If the attrition is the same in
both groups (that is, unrelated to allocation), then it doesn't bias the results but does reduce your
sample size and therefore power.
Compare and contrast efficacy and effectiveness, and internal and external
validity
Efficacy is the effect of the treatment under ideal conditions, with the patients chosen very carefully and
lots of attention paid to them by their health-care providers. Effectiveness is the effect of the treatment
under real-world conditions, where the criteria for treatment are a lot looser and health-care providers
are busier.
Internal validity refers to the validity of the results of our study. Low bias means high internal validity.
Good study design means high internal validity.
If we're studying med students at UWO, we take a sample of med students from the university and want to
generalize to all med students in the university. If we have good internal validity, we can do that.
External validity refers to the broader generalizability of our results to larger populations or different
situations.
If we're studying med students at UWO, we take a sample of med students from the university for our study.
Since we probably shouldn't generalize our results to med students at McGill, our study has low external
validity.
You can see that efficacy studies are designed for high internal validity but low external validity, since we
want a really good answer within our population but don't care about generalizing from the efficacy study to
other populations or situations. Effectiveness trials, in contrast, are designed for high external validity,
since we want the results to be more generalizable.
Define bias, and recognize different sources of bias in studies, including
publication bias
For a review of bias more generally, see Fundamentals of Epidemiology I (16 Jan 2012) and
Fundamentals of Epidemiology II (23 Jan 2012).
RCTs are less susceptible to bias than observational studies. Selection bias and confounding are
minimized by randomization and allocation concealment. Measurement bias is minimized by proper
blinding. However, RCTs are not immune to bias, and pharmaceutical companies love to calculate
systematically inflated (or biased) estimates of the effect of their drugs. [A: Thank goodness for kind-
hearted, honest epidemiologists. Right, guys?]
One really interesting bias is called publication bias, which refers to the fact that positive results are
more likely than negative results to be published, especially in high-impact journals. If your study shows
a huge effect of the intervention, you're gonna get published. If your study shows no effect, you'll have
to fight to get published anywhere and are guaranteed not to get into a high-impact journal.
Imagine you're a pharmaceutical company. The only results that you want doctors to see are the ones that
show a benefit, and hopefully a large benefit. You have an incentive to bury any negative trials and only publish
the positive ones.
[A: PLoS ONE is a great journal trying to singlehandedly fight publication bias. It publishes all research
submitted to it that is well-done and well-reported, regardless of the results.]
A way to reduce publication bias is to use a clinical trials registry where all clinical trials must submit
their protocol before they start and their results upon completion. That way you can find the results of all
RCTs, even if they never got published.
When you're doing a systematic review of clinical trials, always search the clinical trials registries!
You can assess publication bias using a funnel plot. If you want to know more about those, google it, I'm tired.
RCTs: What are the Results? I
(5 Mar 2012)
Describe the problem of multiplicity in analysis, and apply this information with
respect to interpretation of subgroup analysis, multiple and secondary outcomes
If you do enough statistical tests, you will eventually find something that is statistically significant just by
random chance. The probability of getting a false positive result increases with the number of
subgroups, outcomes, and time points being compared. This is referred to as multiplicity, multiple-
hypothesis testing (if you're using p-values), or multiple comparisons.
If you do 20 tests at a 0.05 level of significance, you are saying that you expect one of those to be statistically
significant by chance alone. Every subgroup you test, every outcome you test, increases your chance of
getting such a false positive. If you test five subgroups for four different outcomes each, you expect at least
one false positive. If you read a study that reports lots of p-values or confidence intervals, start to worry.
There are a couple of ways to statistically adjust for this issue, all of which really amount to making the alpha
smaller (i.e. stricter). If you need to do this, ask a statistician.
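The arithmetic behind this warning is simple enough to sketch in a few lines. A minimal example (plain Python; the function names are my own, not from the course):

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests
    independent tests, each done at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """The simplest correction: shrink alpha so the familywise
    error rate stays near the nominal level."""
    return alpha / n_tests

# 20 tests at the 0.05 level: roughly a 64% chance of at least
# one false positive somewhere
print(round(familywise_error_rate(0.05, 20), 2))  # 0.64

# Bonferroni-corrected threshold for those same 20 tests
print(round(bonferroni_alpha(0.05, 20), 6))  # 0.0025
```

The first number is why a study reporting twenty p-values deserves suspicion; the second is the kind of stricter alpha a statistician might suggest.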
There was an HIV vaccine trial once where the vaccine didn't work, but the researchers kept doing subgroup
analyses until they found a subgroup that showed a statistically significant result. They hadn't defined the
subgroups beforehand, and only studied them after the fact when they found them to be statistically
significant. We call this fishing or data dredging, and the conclusions should only be considered to be
hypothesis-generating. It sucks to get a negative result, and it can be tempting to keep looking until you get
a positive one, but be careful.
Differentiate between primary and secondary outcomes, and apply this
information to clinical trials
This is pretty intuitive. Remember our PICO format? The O is for primary outcome, the main thing we're
interested in. If you could only measure a single outcome, the one you would choose is the primary
outcome.
You can have any number of secondary outcomes, which are any other events or outcomes that
you might be interested in. Due to the problem of multiple comparisons (as in the above objective), these
secondary outcomes are usually used more for generating new hypotheses than for making strong
conclusions.
In an RCT of some drug on heart failure, you probably want death to be the primary outcome. You may also
wish to collect information on morbidity, though, like days spent in hospital or risk of hospitalization. Think of
digoxin. It doesn't extend life, but thankfully they looked at secondary outcomes and found that it potentially
improved quality of life.
Define composite outcome
Composite outcomes are what you call it when you combine several outcomes into one variable.
They include people who have had any of the combined outcomes.
For example, a composite outcome for cardiovascular events might include people who have had either
myocardial infarction or stroke.
Composite outcomes make it harder to interpret the results and are sometimes combined when they
shouldn't be.
For example, if a drug leads to a decrease in the composite outcome of death or chest pain, it could mean
that there were fewer deaths and less chest pain, but it is also possible that the composite was driven by
decreased chest pain and no change or even an increase in death.
Describe the valid use of composite outcomes
Composite outcomes should be pre-defined (i.e. defined a priori), clinically meaningful, important to
patients, and biologically plausible. You should be careful that one of the components isn't skewing
the results of your composite outcome.
The components of the composite outcome should also be individually defined as secondary outcomes.
Describe rationale for using composite outcomes
Composite outcomes increase statistical power. That is, you don't need as large a sample size to get
statistical significance. They also allow you to combine several things that you expect to change, so that
you can analyze all of them at once.
Describe potential problems with subgroup analyses
See multiple hypothesis testing, above. Subgroups have fewer participants, so they also have sample
size (i.e. power) issues.
Describe criteria for valid subgroup analyses
1. Subgroups should be defined a priori (i.e. before the study starts).
2. All subgroup analyses that are done should be reported.
3. The subgroup should be biologically different.
4. A difference in effect within the subgroup should be biologically plausible.
5. There should be statistical evidence of this difference in effect.
[A: I would say that these are in order of importance. 1 & 2 are necessary, the rest are nice to have.]
Interpret subgroup analyses in a clinical trial
They should be considered to be hypothesis-generating and should not change your clinical practice.
The professor believes that it's better to present your subgroup analyses in the guise of interaction analysis. I
agree.
Define interim analysis
An interim analysis is when you analyze the data before the study has finished. You still do the final
analysis after completing the study. Interim analyses are usually done by an independent data monitoring
committee in order to determine if the study should be stopped early (see next objective).
Describe reasons for early termination of clinical trials
The criteria for early termination must be set before the trial starts. The main reason to end a study early
is harm, if you discover that the intervention is actually hurting people. Alternately, you may stop
because of benefit, if the treatment is helping so much that you can't ethically withhold it from the
control group. The final reason is to stop because of futility, if you've gathered enough data to say
conclusively that the intervention is not working.
Define surrogate outcome, and recognize use of surrogate outcomes in a clinical
trial as well as potential drawbacks of the use of surrogates
It can sometimes be hard to measure what we really care about but easy to measure something that we
think is a pathophysiological intermediate on the way to the outcome we care about. A surrogate
outcome is what we measure when we can't measure the outcome we really care about.
Take hypertension treatments: it can be difficult to get funding to follow people until they die (what we care
about), but it's easy and quick to measure blood pressure (our surrogate outcome).
Extrapolating from surrogate outcomes to the primary outcome that we're really interested in can be
misleading.
Describe study phases in clinical trials
There are four phases:
I. Safety: Screening for safety
II. Efficacy: Establishing the testing protocol
III. Effectiveness: Final testing in a large-scale RCT
IV. Post-approval: Monitoring the drug as it's used in the population at large
RCTs are almost always phase III studies. You can identify harmful side effects in all phases. Phases I-III
are required for approval by Health Canada or the US FDA. Phase IV studies can lead to the drug
being pulled from the market, like with Vioxx.
Describe problems of adverse event recognition including the use of the rule of
three
Straight from Wikipedia: If no major adverse events occurred in a group of n people, there can be 95%
confidence that the chance of major adverse events is less than one in n / 3. This means that the upper
bound on the 95% confidence interval on the adverse event rate is approximately 3 / n.
Further from Wikipedia: For example, in a trial of a drug for pain relief in 1500 people, none have a major
adverse event. The rule of three says we should have 95% confidence that the rate of adverse events is no
more frequent than 1 in 500.
For example, if 14 people were treated and none of them develop an adverse event, we can be 95% confident
that the true rate of adverse events is 3/14 = 0.214 = 21% or less.
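The rule of three is trivial to compute. A quick sketch (plain Python; the function name is my own):

```python
def rule_of_three(n):
    """Approximate upper bound of the 95% confidence interval for
    the adverse event rate when 0 events were seen in n people."""
    return 3 / n

# 0 events in 1500 people: the rate is likely no more than 1 in 500
print(rule_of_three(1500))  # 0.002

# 0 events in 14 people: we can only say the rate is ~21% or less
print(round(rule_of_three(14), 3))  # 0.214
```

Notice how little reassurance a small trial gives: with 14 event-free patients, a 1-in-5 adverse event rate is still entirely plausible.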
RCTs: What are the Results? II
(19 Mar 2012)
Differentiate between dichotomous and continuous outcomes
[A: Just check Fundamentals of Biostatistics I (30 Jan 2012).]
Dichotomous outcomes are generally easier to analyze and are the basis of the two-by-two tables. For
our examples, we'll only be considering dichotomous outcomes.
Examples of dichotomous outcomes are death versus no death, stroke versus no stroke, and acute
myocardial infarction versus no acute MI. Examples of continuous outcomes are change in blood pressure,
change in BMI, and change in CD4+ count.
When provided with information from a clinical trial, develop a 2x2 table
[A: Y'all love the two-by-two tables, right?] You should be able to create a two-by-two table from the
paper's flow diagram and tables. Remember that the two-by-two table will have different numbers if
you're doing per-protocol analysis instead of intention-to-treat.
As a refresher, here's the standard RCT two-by-two table:

                Disease +   Disease -
Treatment           a           b
No treatment        c           d

I'll now work through a quick example, to make the distinction between intention-to-treat and per-protocol
analysis clear. Feel free to skip it, I won't be hurt.
Below are Figure 1 and Table 3 from the PROactive study, which looked at the effect of pioglitazone on stroke
(and several other cardiovascular outcomes).
As you can see from the flow diagram, they did an intention-to-treat (ITT) analysis. Good! ITT is calculated
based on the participants' allocation, and does not consider whether or not they completed the protocol or
Published in: Lancet (2005), vol. 366, iss. 9493, pp. 1279-1289. Status: Postprint (Author's version)

Figure 1: Trial profile

Table 3: Numbers of first events contributing to the primary composite and main secondary endpoints

                                     Primary composite endpoint     Main secondary endpoint
                                     Pioglitazone    Placebo        Pioglitazone    Placebo
                                     (n=2605)        (n=2633)       (n=2605)        (n=2633)
Any endpoint                         514             572            301             358
Death                                110             122            129             142
Non-fatal MI (excluding silent MI)   85              95             90              116
Silent MI                            20              23             NA              NA
Stroke                               76              96             82              100
Major leg amputation                 9               15             NA              NA
Acute coronary syndrome              42              63             NA              NA
Coronary revascularisation           101             101            NA              NA
Leg revascularisation                71              57             NA              NA

MI=myocardial infarction. NA=not applicable. This table describes the events that make up the primary
composite endpoint, so if death is not the first event, it does not appear.
were lost to follow-up. ITT is what you should do unless you have a good reason to do per-protocol, like your
supervisor telling you to do per-protocol. The two-by-two table for an ITT analysis of stroke would look like
this:
                Disease +   Disease -
Treatment          76       2605 - 76 = 2529
No treatment       96       2633 - 96 = 2537
A per-protocol analysis, on the other hand, is limited to those who are not lost to follow-up. The two-by-two
table for a per-protocol analysis of stroke would look like this:

                Disease +   Disease -
Treatment          76       2427 - 76 = 2351
No treatment       96       2447 - 96 = 2351
[A: By coincidence, the no-disease numbers are the same in both groups here. Don't get distracted by that,
it's random chance.]
As practice, try calculating the RRs or ORs for both, and seeing how they differ.
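If you want to check your answers, here's a sketch of the RR calculation for both analyses (plain Python; the counts are the stroke numbers from the tables above, treated as simple risks and ignoring the time-to-event analysis the trial actually used):

```python
def relative_risk(events_tx, n_tx, events_ctrl, n_ctrl):
    """Risk ratio from simple counts: risk in the treatment group
    divided by risk in the control group."""
    return (events_tx / n_tx) / (events_ctrl / n_ctrl)

# Intention-to-treat: everyone randomized stays in the denominator
rr_itt = relative_risk(76, 2605, 96, 2633)

# Per-protocol: only those not lost to follow-up
rr_pp = relative_risk(76, 2427, 96, 2447)

print(round(rr_itt, 3), round(rr_pp, 3))  # 0.8 0.798
```

Here the two barely differ, because the losses to follow-up were similar in the two arms; the interesting (and suspicious) cases are the trials where they don't.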
When provided with information from a clinical trial, calculate and interpret the
control event rate (CER)
The CER is the event rate in the control (no treatment) group.

CER = c / (c + d)

This is obviously the same as the incidence of the outcome in the no treatment group. Depending on the
study design, they may measure it as an incidence rate instead of an incidence, in which case you would have
to do an incidence rate calculation instead. For a review of incidence and incidence rate, see Fundamentals of
Epidemiology I (16 Jan 2012).
We use the CER to estimate the number (or rather, the incidence or incidence rate) of bad events that we
expect to happen in our treatment group if they hadn't got the treatment.
When provided with information from a clinical trial, calculate and interpret the
experimental event rate (EER)
The EER is the event rate in the intervention (treatment) group.

EER = a / (a + b)

This is obviously the same as the incidence of the outcome in the treatment group. Depending on the study
design, they may measure it as an incidence rate instead of an incidence, in which case you would have to do
an incidence rate calculation instead. For a review of incidence and incidence rate, see Fundamentals of
Epidemiology I (16 Jan 2012).
When provided with information from a clinical trial, calculate and interpret the
relative risk (RR)
The RR for a trial is the same as a normal RR. Since the CER and EER are usually just the incidences of
events, we get:

RR = EER / CER
When provided with information from a clinical trial, calculate and interpret the
absolute risk reduction (ARR)
The CER is the rate of events without intervention and the EER is the event rate with the intervention.
The absolute risk reduction tells us how many events the intervention is preventing, by taking the
difference of the CER and EER.

ARR = CER - EER

If our events are good events instead of bad, you may want to calculate the ARR as EER - CER. In the notes,
they specify the ARR more generally, as the absolute value of the difference of CER and EER. [A: I don't agree
with this, since you should be allowed to have a negative ARR if the intervention turns out to be harmful, as
happens from time to time. A positive ARR reflects a reduction in risk, and risk is usually bad.]
When provided with information from a clinical trial, calculate and interpret the
relative risk reduction (RRR)
The relative risk reduction tells us the proportion or fraction of the events that we would have
expected (given by the CER) that have been prevented with the treatment (the ARR).

RRR = ARR / CER = (CER - EER) / CER

RRR is always higher than ARR, so many pharmaceutical companies choose to report the RRR instead of the
ARR. Both RRR and ARR are important, but don't be fooled by a high RRR: always check the ARR, too.
When provided with information from a clinical trial, calculate and interpret the
number needed to treat (NNT)
The number needed to treat tells you how many people you need to treat in order for the treatment to
do something good for one of those people.
If the outcome is death or disease, it tells you how many people you need to treat in order to prevent a single
harmful outcome. Alternately, if the outcome that you're measuring is some sort of improvement, the NNT tells
you how many people you need to treat in order to see a single patient improve.
Remember, the NNT is talking about the effect of a specific drug or intervention; some people will get better or
worse regardless of treatment.
[A: The NNT is probably the most important number to consider for clinical decision-making because it
tells us how useful a treatment is in terms that we can intuitively understand.]
The higher the NNT, the less useful the drug or intervention is because it means that we need to treat more
people in order to see any benefit. For example, take a trial for the effect of pioglitazone on preventing stroke.
If the NNT is 143, that means you have to treat 143 patients with pioglitazone in order to avoid a single stroke.
The ideal NNT is 1, where each patient that we give the drug to is expected to improve because of it. That's
rare to see.
The NNT is calculated unintuitively (to me) as:

NNT = ⌈1 / ARR⌉ = ⌈1 / (CER - EER)⌉

The fancy bars on either side mean to round up (the ceiling). [A: I'm teaching you math notation, too. That's a two-fer.]
The NNT can also be calculated from the odds ratio using a much more complicated formula that you can
look up if you ever need it.
There is also a number needed to harm (NNH), which is the same as NNT except it's for the adverse effects
of the treatment.
Using the pioglitazone example above, if it has an NNH for heart failure of 31 (not great), then 1 of every 31
patients you give the drug to is expected to get heart failure that they otherwise would not have gotten.
Considering that the NNT is 143 for stroke, that's a lot of people getting heart failure for a very small number
who are avoiding stroke.
Like most of the statistics we use, the NNT (and NNH) corresponds to a specific period of time (for
example, death within one year of allocation).
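All of these measures fall out of the 2x2 counts in a few lines. A sketch (plain Python; the function name is my own, and it uses the ITT stroke table from the PROactive example above — note this crude count-based calculation ignores the time-to-event methods the trial actually used, so it won't exactly match the lecture's NNT of 143):

```python
import math

def trial_measures(a, b, c, d):
    """a, b: treatment events / non-events; c, d: control events / non-events."""
    cer = c / (c + d)          # control event rate
    eer = a / (a + b)          # experimental event rate
    arr = cer - eer            # absolute risk reduction
    rrr = arr / cer            # relative risk reduction
    nnt = math.ceil(1 / arr)   # number needed to treat, rounded up
    return cer, eer, arr, rrr, nnt

# ITT stroke table: 76/2529 on pioglitazone, 96/2537 on placebo
cer, eer, arr, rrr, nnt = trial_measures(76, 2529, 96, 2537)
print(round(arr, 4), round(rrr, 2), nnt)  # 0.0073 0.2 138
```

So the treatment prevents about 7 strokes per 1000 patients (ARR), which is a 20% relative reduction (RRR), and you would need to treat 138 people to prevent one stroke. Notice how much more modest the ARR looks than the RRR.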
When provided with information from a clinical trial, calculate and interpret odds
ratio (OR)
The odds ratio is, as always, a ratio of odds.

OR = (a × d) / (b × c)

Most trials are for rare outcomes, so the OR is a good approximation of the RR. [A: Most people treat the
OR as though it was identical to the RR, but you know to be a little more cautious.]
If you need a review of odds ratios, see Fundamentals of Epidemiology I (16 Jan 2012).
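A quick sketch of that rare-outcome approximation, using the same ITT stroke table (plain Python; the function name is mine):

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio from a standard 2x2 table."""
    return (a * d) / (b * c)

# ITT stroke table: a=76, b=2529 (treatment); c=96, d=2537 (control)
or_stroke = odds_ratio(76, 2529, 96, 2537)
rr_stroke = (76 / 2605) / (96 / 2633)

# With an outcome this rare (~3%), the OR closely tracks the RR
print(round(or_stroke, 3), round(rr_stroke, 3))  # 0.794 0.8
```

The gap between OR and RR grows as the outcome gets more common, which is exactly when treating them as interchangeable gets you into trouble.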
Provide information regarding strengths and weaknesses of NNT
The strengths of NNT are that it is intuitive, simple to interpret, easy to calculate, and takes into
account people's baseline risk. The weakness is that you can't use it alone to make decisions, since you
always need to consider things like a patient's age, adherence to therapy, costs, etc. [A: That's really a
weakness of all statistics that we use, though.]
Critically appraise an article on therapy
Case-Control and Cohort Studies
(26 Mar 2012)
Describe the purpose and structure of case-control and cohort study design
[A: This should be mostly review at this point.]
Case-control studies
In a case-control study, you're finding people who have a disease (cases) and comparing them to
people who don't (controls). You're grouping by outcome, and you're comparing the groups for
differences in exposure rates.
Controls. Selection of controls is a huge source of bias in case-control studies since it is very easy to
introduce confounders.
Ideally, you can identify a hypothetical cohort from which the cases are drawn (such as all people within
the hospital's catchment area), and then controls are sampled at random from this hypothetical cohort. It's
much harder to do in real life.
If cases are people who show up to my hospital with mesothelioma, then the hypothetical cohort is people
who would show up to my hospital if they had mesothelioma. You can see how it might be hard to randomly
sample from that cohort. One common way to do it is to use random-digit dialling in the hospital's catchment
area for the disease of interest (which, for a tertiary centre and a rare disease, could be a huge area). Even
then, people who have land lines and are willing to act as controls in a research study are probably different
sorts of people from the cases.
Matching. To reduce confounding, you can try to make each control match one of the cases for
important variables (often age and sex). You can do 1:1 matching (one control for each case) or you can
choose to match more than one control to each case in order to increase statistical power. In the end, if
there's an old woman as a case, there would be one or more old women as controls who are matched to
that case. If you match, you need to use more involved statistical analyses and you should not include
the matched variable in any regression that you do.
There's not much added power after a ratio of 4 controls to each case, which is why you don't often see more
than that.
Frequency matching. Instead of matching each control to a specific case, you can also frequency
match, so that the overall demographics of the controls are close to the overall demographics of the
cases. Cases will have just as many old people and as many women as the controls, even if they don't
specifically have the same number of old women.
Instead of trying to match controls to cases, you can use multivariate regression to adjust for the potential
confounders after the fact. Matching increases statistical power but can be difficult and can sometimes bias
the results.
Probably the best way to do case-control studies is called a nested case-control study. Imagine you're
following a cohort of people along. Each time someone gets the outcome of interest, you take a random
selection of four people in the cohort who don't have the outcome at that point in time (these are your
controls). Later on, it's possible that one of those controls will eventually become a case (and will have four
other controls selected for it who are outcome-free at that point in time). This sort of time-dependent sampling
(known as incidence density sampling) means that the odds ratio you get is an approximation of the incidence
rate ratio instead of the relative risk.
One of the reasons that the nested case-control design results in high-quality case-control studies is because
it's really only possible to do it if you have a well-defined cohort from which you can get both your cases and
controls. Having a well-defined cohort will make any case-control study better, regardless of the method used
to sample controls.
Cohort studies
In a cohort study, you're finding people who have an exposure and comparing them to people who
don't. You're grouping based on exposure, and comparing the groups for differences in outcome rates.
Cohort studies can be prospective, where you're picking a group of people and following them up every
month or year, or they can be retrospective, where you construct a group of people that you follow up
through their past records. [A: Know these definitions for the quizzes, but don't obsess about them in real
life.]
You could also take a retrospective cohort and contact all of them to continue following them over time. This is
referred to as an ambidirectional cohort. Fun, right?
Describe the strengths and weaknesses of cohort and case-control studies
[A: There's a good summary table at the end of the Champion notes for Case-Control and Cohort
Studies.]
Case-control studies
The good:
• Fast, easy, and cheap
• You don't need as large a sample (i.e. efficient)
• Good for rare diseases, since a cohort may only pick up a few new cases each year
• Good for diseases with long latency periods, for the same reason
• You can study the effects of a number of exposures
• There is no loss to follow-up
Prepared by Aidan Findlater
The bad:
• Recall bias is a type of measurement bias that's really only present in case-control studies
• Recruitment of controls often results in selection bias
• Cannot infer causation
• Bad for rare exposures
• Can only study one outcome per study
• Limited to odds ratios
• Confounding. See the description below under Cohort studies.
Recall bias is a problem when cases are more likely than controls to report having had an exposure,
either because they're thinking harder and remembering better, or because they've created an
association in their head that they're trying to validate, or because the researcher is grilling them harder
on the exposure. Recall bias is not the same as poor recall, where people just can't remember their past
exposure. Recall bias is differential recall or reporting between cases and controls.
Imagine you're studying the effect of aspartame on birth defects. Your cases are mothers who gave birth to
children with congenital malformations and your controls are women at the same hospital who gave birth to
normal children. You interview the new mothers to assess their exposure to diet sodas. The mothers of
children with the malformations have heard bad things about aspartame and are probably going to report diet
soda consumption even if they only drank a single can fifteen years ago. The mothers of normal infants will be
less likely to report such an exposure.
One study that's a decent learning example is "Recall bias in the assessment of exposure to mobile phones"
by Vrijheid et al. (8). They checked reported cell phone usage against actual cell phone records and found
both poor recall and recall bias.
You can reduce recall bias by making sure that the exposure assessment is identical between cases and
controls, including making sure that interviews are standardized and interviewers are blinded to case/
control status. You can also hide the exposure question within the questionnaire (hiding the aspartame
question between questions about smoking status and what colour car they drive, for example).
[A: I'm not actually suggesting you ask participants about the colour of the car they drive. That was a joke.]
Cohort studies
The good:
• The next best thing to an RCT when it's impossible or unethical to randomize participants
• The timing of exposure and outcome is better established
• You can study as many outcomes and exposures as you want, which is why they're such a rich source of data
• You can calculate odds ratios, relative risks, and incidence rate ratios
• You can use them to study rare exposures by selecting an appropriate cohort
• There's less recall and selection bias than case-control studies
Epidemiology I Course Notes
The bad:
• Large and expensive
• It can take a long time to get enough data
• There's a risk of loss to follow-up as people disappear or stop responding
• Retrospective cohorts are limited to the data that you can find in the records
• Confounding! Confounding, confounding, confounding. For example, a given vitamin might only
protect against heart disease because health nuts tend to take vitamins, not because vitamins do
anything. Coffee appears to cause heart disease because smokers are also more likely to be coffee-
drinkers. And so on. For ever and ever. All observational studies have confounding, though careful
study design and analysis can minimize it.
Recognize and describe types of bias that may occur
Bias is a systematic error and reduces a studys internal validity. Broadly speaking, bias can be
divided into selection bias and measurement bias (also called information bias).
Selection bias
Selection bias occurs when the way you choose your groups (cases and controls in a case-control study
or exposure groups in a cohort study) introduces a systematic error.
Incidence-prevalence bias. Remember how diseases that kill quickly will have very low prevalence
even if their incidence is high? In a study, this means that a study of prevalent cases will miss all those
incident cases where the person died before the study was done. For example, if you recruit people who
are hospitalized for acute MI, you'll miss everyone who died before they even got to the hospital.
Detection bias. The exposure of interest makes it more likely that disease will be detected, even though
it may not affect the actual risk of disease. For example, if HRT causes endometrial bleeding and such
bleeding is an indication for testing for endometrial cancer, then women on HRT are more likely to be
tested for cancer, and a spurious relationship between HRT and cancer will be found.
Non-respondent bias. People who respond to surveys are different from those who don't. For example,
smokers are less likely to return questionnaires that include questions on smoking, so your sample will
be biased towards non-smokers.
[A: Membership bias sounds way too much like confounding to me. I've never heard of it and it isn't in
(6), so I'm going to ignore it.]
Measurement (information) bias
Measurement bias occurs when there's a systematic difference between the groups in the way that
outcomes, exposures, or confounders are measured. Outcomes, exposures, and confounders should be
collected in the exact same way for both cases and controls (in case-control studies) and for both
exposed and non-exposed groups (in cohort studies).
Subject bias. The study subjects (or participants) in one group may be more likely to report symptoms
or falsely report compliance than the other group. [A: This is a pretty general term that seems to include
most of the other biases.]
Recall bias. This was discussed in the case-control section, above.
Hawthorne effect. People who know that they're being studied often report more positive results for no
apparent reason. It's kind of like a placebo effect. [A: Check Wikipedia for this one, it's cool.]
Detection bias. Data collectors may look more carefully for an outcome or exposure in one of the
groups than the other. This is protected against by strict training, adherence to interview or data
collection protocols, and, ideally, blinding of the data collectors.
Recognize and describe confounding
A confounder is a variable that differs between the comparison groups and is associated with the
outcome. My favourite example is, as always, owning a lighter causing lung cancer; the apparent
correlation is confounded by smoking status. Confounding arises because a person's exposure status is
associated with a whole bunch of things. Coffee drinking is associated with a bunch of things, including a
person's smoking status, their ethnicity, their age, their occupation, and many other things I can't
possibly think of. This is why I say that observational studies always have confounding (although it's not
always a problem).
Randomizing breaks the connection between the exposure and the confounders. If you randomize
people to owning a lighter, suddenly it's no longer associated with their smoking status. Smokers are
randomly and (hopefully) equally distributed between the two exposure groups, and the apparent
relationship between lighter ownership and lung cancer disappears (since it was really the smoking
causing the lung cancer). This is why RCTs are so great; they remove the effect of confounders, even
those we can't measure.
You can try to remove confounding from observational studies by matching, stratification, or
multivariate regression. When we use stratification or multivariate regression to remove confounders,
we call our result an adjusted odds ratio or relative risk. However, these techniques aren't perfect. You
have to start worrying about measurement bias of the confounders, not just the measurement of the
exposure and outcome. More importantly, you can only adjust for (or match on, or stratify by) things that
you can and do measure. Confounding that we don't measure is called residual confounding.
Confounding can be a problem in RCTs, since randomization is random. Just by chance, more smokers may
be assigned to lighter ownership, creating a spurious relationship. Larger sample sizes make this less likely.
Stratified randomization can also ensure that both groups have the same major baseline characteristics.
Matching. Matching is usually only done for major confounders like age and sex. In case-control studies,
controls are selected that match cases for suspected confounders. In cohort studies, non-exposed are
matched to exposed participants for suspected confounders. [A: I don't think I've ever seen a matched
cohort study.]
Stratification. The data is divided into strata based on a confounder, and the measure of effect (usually
OR or RR) is calculated within each stratum. The stratum-specific results are then merged, if possible, to
give an adjusted odds ratio or risk ratio that is free of any confounding by the variable that was stratified
on.
For example, the lighters and lung cancer data shows an unadjusted odds ratio of 9. If we look at just the
smokers and calculate an odds ratio, though, the OR is 1. We don't see any effect of lighters on lung cancer
within the stratum, since the lighter owners and non-owners no longer differ by smoking status. If we look at
just the non-smokers, the OR is also 1, for the same reason. Merging them back together gives an adjusted
odds ratio of 1.
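The same stratified adjustment can be done numerically. The counts below are invented for illustration (they don't reproduce the lecture's crude OR of 9), but they show the same pattern: a confounded crude OR that collapses to 1 once smoking is stratified out. The pooled estimate here is the Mantel-Haenszel odds ratio:

```python
# Each stratum is (a, b, c, d) = (exposed cases, exposed controls,
# unexposed cases, unexposed controls). Counts are made up.
strata = {
    "smokers":     (90, 45, 10, 5),
    "non-smokers": (10, 20, 90, 180),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Crude (unadjusted) OR: collapse across the strata first.
totals = [sum(s[i] for s in strata.values()) for i in range(4)]
crude_or = odds_ratio(*totals)

# Mantel-Haenszel adjusted OR: pool the stratum-specific information.
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
adjusted_or = mh_num / mh_den

print(f"crude OR: {crude_or:.2f}")        # lighters look harmful overall
print(f"adjusted OR: {adjusted_or:.2f}")  # 1.00: the effect was all smoking
```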
Multivariate regression. This is by far the most common thing you'll see. It uses fancy statistics, which
you'll see again in The Interpretation of Statistical Results (23 Apr 2012). When people say that they
"adjusted for confounders," this is usually what they mean.
Define and calculate relative risk and odds ratio
See Fundamentals of Epidemiology I (16 Jan 2012).
Critically appraise a case-control study
Consider sampling, measurement, and confounding. [A: Always mention recall bias if they assessed
exposure with a questionnaire.]
Critically appraise a cohort study
Consider the comparison being made, whether the comparison makes sense, whether there's any
selection bias, and whether there's any confounding.
Prognosis
(2 Apr 2012)
Differentiate between risk and prognostic factors
Risk factors predict who gets the disease. Prognostic factors predict what happens to them after
they get it. There's usually a lot of overlap between the two, with things like age and sex being both
major risk factors and major prognostic factors for lots of diseases.
           Risk factor        Prognostic factor
Patients   Start healthy      Start with disease
Outcomes   Onset of disease   Death, disability, etc.
Rates      Rare outcomes      Common outcomes
Factors    May overlap        May overlap
Describe the elements of prognostic studies
Prognostic studies use a prospective cohort where the cohort is defined by the presence of disease. It
uses a (hopefully random) sample of people with the disease who join the cohort at a defined inception
time and are then followed up over time for the outcome(s) of interest. The zero time defines when they
join the cohort, such as time at diagnosis, when symptoms first appear, or when treatment is started.
Diseases have a natural history: the disease begins as a subclinical biologic disease, it becomes detectable
though it's still subclinical, symptoms start to appear, the patient sees a physician, the disease is diagnosed,
and the disease is treated. When studying prognosis, we have to decide when a case actually starts. Studies
can define their inception cohort as starting at any stage along the disease's history, like "a patient joins the
cohort when they first feel symptoms" or "a patient joins the cohort when they're first diagnosed with the
disease." Changing the zero time changes the prognosis, even for the same course of disease.
The cohort should constitute an unbiased sample of all people at the given stage of disease and the study
should collect data on baseline characteristics. The cohort must be followed up for long enough for clinically
important outcomes to occur.
The results of prognostic studies can be reported as 5- or 10-year survival, case-fatality rate, response
to treatment, remission, or disease-specific mortality.
5-year survival rate: the percentage of cases that survive for at least 5 years after a diagnosis or a
treatment.
You can also do 10- and 20-year survival rates. For example, if the 10-year survival of a ductal carcinoma in
situ is 98%, then 98% of people who are diagnosed with it will still be alive after 10 years. Remember that
lead-time bias means the 5-year survival of cases detected by screening usually looks better than that of
cases detected otherwise.
Case-fatality rate: the percentage of people with a disease who die from that disease within a given
period of time.
This is usually used more for acute diseases and outbreak investigations. If 100 people are diagnosed with
lung cancer and 15 people die from it within 10 years, then the case-fatality would be 15%.
Disease-specific mortality rate: the proportion of people in the population dying from the disease,
often given in deaths per 10,000 people.
This mortality rate is different from case-fatality rate, which considers only the people who already have the
disease. For example, start with a population of 100,000 people, 200 of whom get the flu and 100 of those
with the flu die from it. The case-fatality would be 100/200 = 50%, whereas the disease-specific mortality
would be 100/100,000 = 10 flu deaths per 10,000 people.
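The flu arithmetic above, as a quick sanity check in Python (numbers from the text):

```python
# Flu example: 100,000 people, 200 cases, 100 deaths from the disease.
population = 100_000
cases = 200
deaths = 100

case_fatality = deaths / cases                      # deaths among cases only
mortality_per_10k = deaths * 10_000 / population    # deaths in the whole population

print(f"case-fatality: {case_fatality:.0%}")                      # 50%
print(f"disease-specific mortality: {mortality_per_10k:g} per 10,000")  # 10 per 10,000
```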
Response rate: the percentage of cases showing improvement following an intervention.
If 100 diabetics are given insulin therapy and 90 improve, then the response rate is 90%. It's possible that 89
of them would have improved anyway, so this doesn't account for that.
Remission rate: the percentage of cases whose disease becomes undetectable.
This is similar to the response rate, but the outcome is remission instead. Bear in mind that a person can go
into remission but later relapse.
Interpret a survival curve
If you're dealing with time-to-event data then you want to do survival analysis.
Time-to-event data in prognostic studies comes from the inception cohort. People enter the cohort at the time
that they meet the zero-time criterion, then they stay in the cohort until the event (like death) is reached. If
you're studying death after lung cancer diagnosis, then a person will enter the cohort when they're
diagnosed, live a few years or decades, and die. Survival analysis deals with this sort of data very well.
Note that you can do your usual cohort analysis with relative risks but it's limited to a single point in time, like
the RR for death at five years since diagnosis. Survival analysis, on the other hand, takes time into account,
which is one reason that people prefer it.
Survival is usually displayed in a survival curve, which plots survival against time. The median survival
is the time on the x axis at which the curve crosses 50% survival on the y axis (see the image below).
Censoring (which will be discussed later in this lecture) is displayed using a tick mark on the curve to
indicate the point at which a person left the study.
The technical definition of survival is a little bit tricky. Imagine that the study follows people up for several
months after they enter the cohort. Break the time up into months. Survival in any given month is the
percentage of people who started the month alive and who end the month alive. Instead of months, break it
into weeks, or days, or hours, or minutes. The survival function is what you get when the chunks of time
become infinitely small, and that's what our survival curves are trying to approximate. One of the advantages
of this odd definition is that it deals with censoring quite nicely (see below for a description of censoring). If
someone leaves the study after 6 months, you can still include them in the survival analysis for those months,
then remove them from the denominator afterward.
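That interval-by-interval idea is the Kaplan-Meier (product-limit) estimator. Here's a minimal sketch on invented data, where censored people simply leave the risk set without counting as deaths:

```python
def kaplan_meier(records):
    """records: list of (time, event) with event=True for death, False for
    censoring. Returns [(time, survival)] at each time a death occurs."""
    records = sorted(records)
    n_at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Handle everyone who dies or is censored at this time point.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored people just leave the denominator
    return curve

# Invented follow-up data: deaths at t=2, 4, 5, 7; censoring at t=3 and 6.
data = [(2, True), (3, False), (4, True), (5, True), (6, False), (7, True)]
print(kaplan_meier(data))
```

With this data the curve first drops below 50% at t=5, so 5 would be the median survival.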
Because the sample size decreases over time, as people die or leave the study for other reasons (that is,
they're censored), the precision of the estimate decreases over time. The estimates of survival at 1 year
will have a tighter confidence interval than the estimates at 5 years, just because you're dealing with
smaller numbers.
Censoring is a fancy way of saying that someone stopped being followed up before they got the
outcome, either because they dropped out or because the study ended. You know they survived for at
least as long as they were in the study, but you don't know what happened to them after they left it.
For example, consider a study of death after lung cancer diagnosis. Participants enter the study when they get
diagnosed with lung cancer, and you follow them until they die. If someone is diagnosed, is followed for six
months, then moves to another country, then you know that they survived for at least the first six months but
you don't know what happened to them afterward. Similarly, someone who enters the study two months
before the study ends will only contribute two months before they're censored (assuming that they don't die
during those two months). Those are both examples of censoring.
Survival analysis reports its results in hazard ratios (HRs). To quote a brilliant young epidemiologist who
isn't me, "For all practical purposes, hazards can be thought of as incidence rates and thus the HR can
be roughly interpreted as the incidence rate ratio" (9).
The hazard function gives the probability of dying at a given point in time, assuming that you had survived until
that point in time.
One way of interpreting the hazard ratio is as the odds that an individual with the higher hazard reaches
the endpoint first. For example, a HR of 2 means that there's a 67% (2/3) chance of the treated patient
dying first.
Cox proportional hazards model is the most common method for estimating the hazard ratio. It's a
form of regression analysis, so you can use it to adjust for confounders.
Recognize potential sources of bias in cohort studies of prognosis
The only bias that was discussed specifically for prognostic studies comes from false cohorts, which are
what you get when you don't use an inception cohort. If you just go out and gather a cohort of people with
the disease, you can only include people who are still alive and have the disease. Your cohort won't
include anyone who died before you started the study, so your results will be biased toward longer-living
patients.
Diagnosis
(9 Apr 2012)
Discuss the use of diagnostic tests clinically
• To diagnose a symptomatic patient
• To screen for disease in an asymptomatic patient
• To provide prognosis in a diagnosed patient
• To monitor therapy
Diagnostic tests should be:
• Reliable and precise
• Feasible and acceptable
• High intra- and inter-rater reliability
A good diagnostic test is reliable, which means that it gives the same answer for the same patient regardless
of the evaluator (that is, it's reproducible). Measures of inter-rater reliability include Cohen's kappa, which is
discussed later in this lecture.
Feasibility and acceptability include the patient's perspective and cost.
For a test to benefit patients:
• The test must change the diagnosis
• There must be an effective treatment for the potential diagnosis
• The result of the test and therapy must improve patient outcomes
Describe the characteristics and definitions of normal and abnormal test results
[This doesn't seem to get much discussion in the notes or the slides.]
Normal test results are negative results (no disease, since that's the normal result). Abnormal test results
mean that the test is positive.
Develop a 2x2 diagnostic test result table when provided with data from a study
of a diagnostic test
To evaluate a test, you compare it to a gold standard test which is assumed to be perfectly sensitive
and specific. Each person in the diagnostic study should be evaluated by the gold standard and the new
test (ideally with blinding to the results of the gold standard).
In most people, the results of the new test will be the same as those of the gold standard. In some people,
though, the test will say they have disease when the gold standard says they don't (false positives). Other
times, the test will say they don't have disease when the gold standard says they do (false negatives). These
pairs of results are used to fill a 2x2 table from which you can calculate all of the statistics that we're interested
in.
                    Gold Standard Positive   Gold Standard Negative
New test Positive   a (true positives)       b (false positives)
New test Negative   c (false negatives)      d (true negatives)
Define and calculate sensitivity and specificity
Sensitivity gives the probability of testing positive if you really have the disease. It does not tell you the
probability that you have the disease if you test positive; that's the positive predictive value, below.
sensitivity = a / (a + c) = TP / (TP + FN) = detected true cases / all true cases
Specificity gives the probability of testing negative if you really don't have the disease. It does not tell
you the probability that you're disease-free if you test negative; that's the negative predictive value,
below.
specificity = d / (b + d) = TN / (FP + TN) = detected true non-cases / all true non-cases
A great mnemonic is SpPIn and SnNOut ("spin" and "snout"): a highly specific test, when positive, rules
disease in; a highly sensitive test, when negative, rules disease out.
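Both formulas in code, using the cell labels from the 2x2 table above (the counts themselves are invented):

```python
def sens_spec(a, b, c, d):
    """a=TP, b=FP, c=FN, d=TN, matching the 2x2 table's cell labels."""
    sensitivity = a / (a + c)   # detected true cases / all true cases
    specificity = d / (b + d)   # detected true non-cases / all true non-cases
    return sensitivity, specificity

# Invented counts: 100 diseased and 900 healthy by the gold standard.
sens, spec = sens_spec(a=90, b=30, c=10, d=870)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")  # 0.90, 0.97
```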
Define and calculate positive and negative predictive value
Positive predictive value (PPV) gives the probability that you really have the disease if you test positive.
PPV = a / (a + b) = TP / (TP + FP)
Negative predictive value (NPV) gives the probability that you really don't have the disease if you test
negative.
NPV = d / (c + d) = TN / (TN + FN)
PPV and NPV are great. They're intuitive and understandable, and are exactly what we want when we think
about the results of a diagnostic test. Unfortunately, they depend on the prevalence of disease in the
population being tested, as we'll discuss next.
Define and calculate prevalence
Prevalence is the proportion of people with disease in the population being studied, based on the results
of the gold standard test. Pretty straightforward.
prevalence = (a + c) / (a + b + c + d)
Sensitivity and specificity don't change with the prevalence. If you add more people with disease to your 2x2
table, you're basically adding more people to cells a and c. These extra people will go into the a and c cells
based on how good the test is, which ends up leaving the sensitivity unchanged. But you can see that, if you're
adding people into cells a and c, then your PPV will go up (toward 1) and your NPV will go down (toward 0). The
opposite happens if you add more people who don't have disease; in that case, you're adding to cells b and d.
Set up your own 2x2 table and play around with it to convince yourself.
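Here's that experiment in code: hold sensitivity and specificity fixed (0.90 each, an arbitrary choice) and rebuild the 2x2 table at different prevalences to watch PPV and NPV move:

```python
def ppv_npv(sens, spec, prevalence, n=100_000):
    """Build the 2x2 cells from fixed test characteristics and a chosen
    prevalence, then return (PPV, NPV)."""
    diseased = prevalence * n
    healthy = n - diseased
    tp = sens * diseased        # cell a
    fn = diseased - tp          # cell c
    tn = spec * healthy         # cell d
    fp = healthy - tn           # cell b
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.01, 0.10, 0.50):
    ppv, npv = ppv_npv(sens=0.90, spec=0.90, prevalence=prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

As the prevalence rises, PPV climbs and NPV falls, exactly as the text describes.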
Apply the role of pretest probability or prevalence in interpretation of diagnostic
test results
The sensitivity and specificity are inherent properties of the diagnostic test. The PPV and NPV, however,
depend on the prevalence of disease in the population being studied. If you increase the prevalence, the
PPV goes up and the NPV goes down. If you decrease the prevalence, the PPV goes down and the NPV
goes up.
The pre-test probability is the probability that a person has the disease before you take into account
the results of a test. It's how likely you think it is that they have the disease, based on things like signs,
symptoms, and clinical judgment. The pre-test probability can also be thought of as the prevalence of
the disease in patients like him or her.
Saying, "A patient with this symptom has a 20% chance of having the disease," is basically the same thing as
saying, "Of all patients with this symptom, 20% have the disease."
Just like the PPV and NPV depend on the prevalence of disease, they can be thought of as depending on the
pre-test probability. Having a high pre-test probability is like doing the test in a high-prevalence population. If
you think someone's really likely to have the disease, then a positive test is very convincing (higher PPV) while
a negative test is more likely to be a false negative (lower NPV). The inverse is true if you think someone is not
likely to have the disease.
If the test was evaluated in a real-world setting, with patients similar to those that you would expect to send
for testing, then the PPV and NPV from the evaluation study may be useful to you when making decisions. If
the population in which it was evaluated has a much higher prevalence (a high-risk population), then the PPV
will be overestimated compared to your population and NPV will be underestimated.
There's a very nice example in the Champion notes, under "The problem of prevalence."
Interpret likelihood ratios
Likelihood ratios (LRs) are ways of updating our pre-test probability based on the results of the test.
They're calculated from the sensitivity and specificity, so they don't depend on the prevalence.
PPV and NPV are intuitive. They tell us how likely someone is to have disease based on the outcome of the
test, but they change based on the prevalence. LRs provide a way of quickly determining the PPV and NPV
(or post-test probability) for any prevalence (or pre-test probability).
The LR can be thought of as the odds of disease given a positive (LR+) or
negative (LR−) test result divided by the odds of disease in the population
(or pre-test odds). For example, with a test that has an LR+ of 20, a
positive test result means that your odds of having the disease are 20
times higher than when you were untested.
An LR of 1 is useless. An LR of 10 means that the test is good at ruling in
disease, and an LR of 0.1 means that it's good at ruling out disease.
You can very easily use LRs by taking advantage of a nomogram, shown
on the right. Draw a straight line from your pre-test probability on the left
through the LR in the middle (LR+ if the test was positive or LR− if
negative) and keep going until you hit the post-test probability on the
right.
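The nomogram is just doing odds arithmetic, which you can sketch directly (the probabilities and LRs here are arbitrary examples):

```python
def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# 20% pre-test probability, then a positive result on a test with LR+ = 20:
print(post_test_probability(0.20, 20))   # roughly 0.83
# Same patient, but a negative result on a test with LR- = 0.1:
print(post_test_probability(0.20, 0.1))  # roughly 0.02
```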
Interpret kappa
Cohen's kappa is a measure of agreement between independent raters/observers/testers. It measures
the agreement that cannot be explained by random chance. It varies from 0 (no agreement except by
chance) to 1 (perfect agreement).
I like Champion's example of chance agreement between two radiologists reading films. One rates all the films
and finds 20% are abnormal. The other falls asleep on the "normal" button and rates all films as normal.
Technically, the agreement is 80%.
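Putting numbers on the sleepy-radiologist story (100 films, with counts invented to match it) shows why kappa is the right yardstick: 80% raw agreement, but a kappa of 0:

```python
def cohens_kappa(table):
    """table[i][j]: count of films rated category i by rater 1, j by rater 2."""
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of each rater's marginal rates, per category.
    p_expected = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(len(table))
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Rows: rater 1 (normal, abnormal); columns: rater 2 (normal, abnormal).
# Rater 2 slept on the "normal" button, so the second column is all zeros.
films = [
    [80, 0],
    [20, 0],
]
print(cohens_kappa(films))  # 0.0 -- all of the 80% agreement is chance
```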
Interpret a receiver operating characteristic (ROC) curve
Many tests are not dichotomous but give a range of values. In
order to calculate sensitivity and specificity, though, we need a
dichotomous result: positive or negative. To make it, we just take
our range of values and say that everyone above a certain threshold
value is positive and everyone under it is negative. Different
thresholds will have different specificities and sensitivities.
For example, let's consider using serum creatinine as a test for
renal failure. If the threshold is 0, our test will say that everyone
has renal failure; it'll catch all the actual cases of renal failure
(sensitivity=1) but doesn't rule out renal failure when it isn't there
(specificity=0). On the other hand, if the threshold is infinity, our
test will say that no one has renal failure; it'll correctly rule out
renal failure when it isn't present (specificity=1) but will fail to
catch any actual cases (sensitivity=0). All other thresholds will give values somewhere between those two
extremes.
If we calculate a sensitivity and specificity for each possible threshold, we can plot them. The resulting
plot is called an ROC curve, shown on the right. You can see that VA (whatever that is) is a generally better
test than NE.
If we take an ROC curve and calculate the area under the curve (AUC), it can tell us how good the test
is in general. The straight black line in the ROC above shows a useless test. It tells us nothing, and has
an AUC of 0.5. The ideal test, with sensitivity and specicity of 100% across all threshold values, would
have an AUC of 1. Normal tests, like those shown, fall somewhere between 0.5 and 1.
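A sketch of both ideas on invented creatinine values: sweep the threshold to trace out sensitivity/specificity pairs, and compute the AUC via its rank interpretation (the probability that a random diseased value exceeds a random healthy one):

```python
def sens_spec_at(threshold, diseased, healthy):
    """Call the test positive when the value is at or above the threshold."""
    sens = sum(x >= threshold for x in diseased) / len(diseased)
    spec = sum(x < threshold for x in healthy) / len(healthy)
    return sens, spec

def auc(diseased, healthy):
    """Area under the ROC curve, computed as P(diseased value > healthy
    value), counting ties as half -- the Mann-Whitney interpretation."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

creatinine_diseased = [3.1, 4.0, 5.2, 6.3]  # invented values
creatinine_healthy = [1.2, 2.0, 2.8, 4.1]

# Threshold 0 gives (1, 0), a huge threshold gives (0, 1), as in the text.
for t in (0.0, 3.0, 100.0):
    print(t, sens_spec_at(t, creatinine_diseased, creatinine_healthy))
print("AUC:", auc(creatinine_diseased, creatinine_healthy))  # 0.875
```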
Critically appraise a study on a diagnostic test
Was the gold standard appropriate?
Does the test include the gold standard as a part of it? This is bad.
Was there an appropriate spectrum of patients that are similar to those you would want to test in your
clinic or hospital?
Was there verification bias? Is every test result compared to the gold standard, or only certain results?
Is there good intra- and inter-rater reliability?
What are the sensitivity, specificity, PPV, and NPV?
Are confidence intervals provided? Are the estimates precise?
How does the test compare to others?
Is it available, affordable, and accessible?
Will the results of the test change your management?
Screening
(16 Apr 2012)
Define and differentiate between the three levels of prevention (primary,
secondary, and tertiary)
Primary prevention prevents disease from ever occurring. Examples include health promotion, exercise,
smoking cessation programs, and immunization.
Secondary prevention tries to catch disease while it's still latent or subclinical and treat it before the
disease becomes an illness. Examples include screening.
Tertiary prevention tries to reduce the impact of symptomatic disease. Examples include rehab
programs.
Differentiate between screening and case-finding
Screening is testing large numbers of asymptomatic people for disease. Case-finding is testing a
small number of people (or even one) where there's a high suspicion of disease, such as presence of
symptoms or recent contact with an infected person.
Differentiate between diagnostic and screening tests
Diagnostic testing is done in patients with suspected disease and positive results tell you that the
disease is probably present. Screening is done in patients without suspected disease and positive
results tell you that the disease may be present. Generally speaking, screening tests err on the side of
being sensitive rather than specific, so that they pick up a lot of true cases even if they also pick up a lot
of false positives.
Describe criteria for a screening program
Screening programs should only be implemented if people benefit from early detection of the disease.
The following should be true:
• The disease is serious with a clear natural history, has a known prevalence, and has an effective
therapy if the disease is found early.
• The screening test should be safe, should be cost-effective, and should have a known sensitivity,
specificity and ROC.
• The healthcare system should clearly define the screened population. The identified cases should be
followed up and offered an available and accessible treatment that is acceptable to the individual.
When provided with information about a screening test calculate sensitivity,
specificity, positive predictive value (PPV), negative predictive value (NPV) and
prevalence
[A: This is identical to a diagnostic test. Literally.]
Describe the impact of prevalence of disease on the results of diagnostic or
screening tests
[A: Same as diagnostic tests.]
Remember, screening is done in asymptomatic people. This means that there's a low prevalence, and that decreases the test's PPV. If the PPV is really low, then very few of the positives will be true positives and you'll be wasting everyone's time and money. That's why people only screen in higher-prevalence populations, like women over 55 for breast cancer. Screening a well-defined subpopulation can make a screening test more useful.
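You can see this effect by holding the test's properties fixed and varying only the prevalence. A small sketch (the 90% sensitivity and specificity are made-up numbers):

```python
# PPV as a function of prevalence, with sensitivity and specificity held fixed.
def ppv(prevalence, sens=0.90, spec=0.90):
    tp = sens * prevalence              # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp)

for p in (0.001, 0.01, 0.1, 0.3):
    print(f"prevalence {p:>5}: PPV = {ppv(p):.3f}")
```

At 0.1% prevalence, fewer than 1 in 100 positives is a true positive; screen a subpopulation where prevalence is 10% and half the positives are real.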
Apply the impact of prevalence of disease to clinical situations
[A: Same as diagnostic tests.]
Again, screening is done in asymptomatic people and has a lower PPV than if that same test is used
diagnostically.
Define and recognize lead-time, length-time and compliance bias
All three biases make tests seem beneficial even if they don't make a difference.
Lead-time bias. A screening test may add years lived with (diagnosed) disease without adding actual years of life, just because you've picked it up early. Screening will therefore appear to lengthen the lives of people with disease, but really it's just that you're picking it up earlier.
Length-time bias. People with slowly progressing disease will spend more time in the pre-clinical disease stage, and are therefore more likely to be picked up by a screening test. Put another way, people whose disease progresses quickly and kills them are less likely to be detected by screening tests than the slow-progressing cases. Your screened population will have more of these slow-progressing cases, so the years lived with (diagnosed) disease will, again, be greater; screening will appear to be beneficial.
Compliance bias. People who participate in screening programs make better patients and do better on
treatment. This is a form of selection bias.
Discuss possible adverse effects of screening programs
- Labelling or stigma
- Undue stress and anxiety from false positive results
- False sense of safety from false negative results
- Complications from the test, including discomfort, radiation exposure, chemical exposure
- Complications from the follow-up tests, which may be more invasive
- Overdiagnosis of disease that would never have become clinically significant
There are things you die of, and things you die with.
The Interpretation of Statistical Results
(23 Apr 2012)
[A: This is probably one of the more useful topics in the course, since almost every interesting study uses
some form of regression.]
Describe the difference between unadjusted and adjusted results
Unadjusted results are the simple, crude estimates for OR, RR, and IRR that we learned to calculate earlier in the course. They don't take into account any extra information besides the outcome and the exposure of interest.
That's usually fine for RCTs, but when we start getting data that has confounding, like data from observational studies, then our unadjusted estimates are misleading because they don't account for the confounders. Adjusted results are those that adjust for measured confounders using statistical techniques like multivariate regression, which we'll discuss below.
Let's bring this back to my favourite confounding example: the hypothetical analysis of the effect of lighters on risk of lung cancer. Our unit of analysis is an individual; our exposure of interest is whether they own a lighter; our outcome is whether they get lung cancer; our main confounder that we measure is whether they smoke. An unadjusted result would suggest that lighters cause cancer, since we get a strong relationship and a high odds ratio. Oops! Those results are horribly confounded. So let's rerun the analysis, using a multivariate logistic regression (explained below), adding in smoking status to the model. Suddenly the estimated odds ratio for the effect of lighters on lung cancer plummets to 1, no effect. That's our odds ratio that has been adjusted for smoking status.
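A toy version of that example, with invented counts chosen so that smoking fully explains the association (a multivariate regression would give the same answer as this simple stratified analysis):

```python
# Hypothetical counts for the lighter/lung-cancer example, stratified by
# smoking. Within each smoking stratum the lighter-cancer OR is exactly 1
# (no effect), but the crude OR over everyone is ~3: pure confounding.

def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed non-cases,
    # c = unexposed cases, d = unexposed non-cases
    return (a * d) / (b * c)

# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
smokers     = (80, 720, 20, 180)
non_smokers = (2, 198, 8, 792)

# Crude table: just add the two strata together cell by cell.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print("Crude OR:", round(odds_ratio(*crude), 2))      # confounded estimate
print("OR in smokers:", odds_ratio(*smokers))         # 1.0
print("OR in non-smokers:", odds_ratio(*non_smokers)) # 1.0
```

The crude OR of about 3.1 comes entirely from the fact that lighter owners are mostly smokers; within each stratum the OR is exactly 1.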
Interpret statistical findings and the level of measurement of the outcome variable in linear, logistic, and survival analyses
These are statistical models that estimate the effect of independent variables (or predictors) on a dependent variable (or outcome). After running the regression, you get a coefficient for each of the independent (or predictor) variables. These coefficients tell us the effect or contribution of each independent variable to the value of the dependent variable.
Your choice of model depends on the outcome variable. Linear regression is used for predicting
continuous outcome variables; logistic regression is used for predicting dichotomous or binary
outcome variables; and survival analysis is used for predicting time-to-event outcome variables.
The results of a simple linear regression tell you the direct effects of the independent variables on the outcome. If our outcome is blood pressure (in mmHg) and the coefficient of daily salt intake (in grams) is 0.5, that tells us that each gram of daily salt contributes 0.5 mmHg to the blood pressure. The model says that increasing your daily salt intake by 10 grams a day will raise your blood pressure by 5 mmHg.
It's a simple model, and assumes that this linear relationship exists everywhere. But really, will eating 20 kg of salt each day raise your blood pressure by 10,000 mmHg? The linearity assumption probably isn't strictly true, but in order for the results to be useful, the assumption just has to be more-or-less true over a normal range of values.
The results of a logistic regression tell you the odds ratios for the effects of the independent variables on the outcome. They're interpreted like normal odds ratios.
Technically, the coefficient tells you the effect of the independent variables on the log odds, as I describe below. But everyone reports the odds ratio because it's very simple to calculate from the coefficient.
The results of a survival analysis tell you the hazard ratios or incidence rate ratios for the effects of the independent variables on the outcome. They're interpreted like normal hazard ratios or incidence rate ratios. Survival analysis is done using Kaplan-Meier estimates for bivariate models and Cox proportional hazards models for multivariate models.
Generalized linear models (GLMs) are regression models that follow the form f(Y) = β0 + β1·X1 + β2·X2 + ..., where the type of regression you're doing is defined by the link function f and the assumed statistical distribution of Y. The Xs in the equation are the independent variables, and Y is the dependent variable (that depends on the values of the independent variables). The βs are the coefficients of the independent variables.
Simple linear regression uses the identity function (f(Y) = Y) and assumes that Y follows a normal distribution (which means Y is treated as a continuous variable), so it reduces to Y = β0 + β1·X1 + β2·X2 + ... This is the sort of regression that we're taught in high school and undergrad. When you use a single independent variable, the equation becomes Y = β0 + β1·X1, which is just Y = MX + B. Hopefully that's a familiar equation, since it's the general equation for a straight line. Linear regression is basically drawing a straight line in order to minimize the distance from the line to the points.
Logistic regression uses a logit link function (f(Y) = logit(Y) = ln(Y / (1−Y))) and assumes that Y follows a binomial distribution (which means Y is treated as a dichotomous variable), so it reduces to ln(Y / (1−Y)) = β0 + β1·X1 + β2·X2 + ... If the outcome is a risk (probability), then Y / (1−Y) is p / (1−p), which is the odds. This means that logistic regression predicts the log odds. Our regression coefficients will tell us the direct, linear contribution of the independent variables to the log odds. If we take the exponential function of the coefficient, we get the odds ratio. It's possible to calculate a relative risk from the coefficient, but it's a lot harder. So now you know why most reported research results give the odds ratio.
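That last step is a one-liner. The coefficient below is a made-up fitted value, not from any real model:

```python
import math

# A logistic-regression coefficient is the change in log odds per one-unit
# change in the predictor; exponentiating it gives the odds ratio.
beta = 0.69  # hypothetical fitted coefficient for a binary exposure
odds_ratio = math.exp(beta)
print(round(odds_ratio, 2))  # 1.99, i.e. roughly double the odds
```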
Survival analysis is more complicated and I don't want to talk about it.
Here's a summary table:

Model              Outcome              Results
Linear             Continuous           Direct interpretation
Logistic           Dichotomous/binary   Odds ratio
Survival analysis  Time-to-event        Hazard ratio

What I mean by "direct interpretation" in the above table is that the coefficient we get out is directly interpretable as the variable's contribution to the outcome variable. See the paragraph on linear regression, above.
Univariate or bivariate regression refers to using a single independent variable to predict the dependent variable, and therefore gives you unadjusted results. The naming is confusing because some people are counting only the independent variables while others are including the outcome variable in their count. It's unimportant; just know that when you read either "univariate" or "bivariate" in the literature, both refer to an unadjusted model.
Multivariate regression just means you're using more than one independent variable, so the effect measure (OR, IRR, etc.) for our exposure of interest is adjusted for the other independent variables in the model.
A regression is a way of predicting the value of one variable (the outcome) based on the value of one or more
other variables (the exposure and confounders). Let's say you're predicting the risk of MI based on
hypertension and family history of MI (using a logistic regression, since MI is a dichotomous outcome). The
result will give you an odds ratio for the effect of hypertension on risk of MI (adjusted for family history) and also
an odds ratio for the effect of family history on risk of MI (adjusted for hypertension).
It's pretty obvious at this point that the "exposure of interest" is just a matter of perspective. Any one of the independent variables could be considered as the exposure of interest, and the results for each one of them are adjusted for all the others.
Note that there is no one "true" model, even for a given data set. If you're looking at the effect of a new drug on all-cause mortality, you could do a logistic regression (where the outcome is dead or not dead) or you could do a survival analysis (where the outcome is time-to-death). Survival analysis is probably preferable in this case, since it takes into account the fact that some people will be in the trial for longer than others, but a logistic regression wouldn't exactly be wrong.
Describe the importance of describing sample characteristics in epidemiologic
research
It allows you to see if the study population is similar to the population that you are working with.
It allows you to identify possible confounders.
Describe various ways of selecting which variables should be included in a
multivariate analysis
[A: This isn't an objective, and I think it's a little beyond what you need to know for this course, but the lecturer spent a lot of time on it so I'll discuss it.]
Non-regression statistical tests
[A: These aren't in the objectives, but the professor went through them and I think they're actually important. Even if you don't focus on them for studying, here's a table that you can refer to later.]
Meta-Analysis
(30 Apr 2012)
Define and compare/contrast review, systematic review, and meta-analysis
Reviews are summaries of the current state of research on a given topic. They come in two basic types: traditional narrative reviews (which the notes refer to simply as review articles), where an expert in the field writes their take on things using whichever sources they like best, and systematic reviews, where a team systematically searches, reviews, and reports on the current state of the entire body of literature. [A: The course uses "review" to refer exclusively to narrative reviews; I won't.] Statistically pooling the results of a bunch of studies is called a meta-analysis, and should only ever be done as part of a systematic review.
Narrative reviews are prone to bias, since they depend entirely on the sources, opinions, and views of
the author. They can be as selective as they wish with their references, focussing on a small number of
studies and excluding relevant, valid research. It would be possible, for example, to write a review article
that cherry-picked papers to conclude that smoking cures lung cancer. In order to trust the results of the
review, you must trust the author of the review. Two people writing narrative reviews of the same topic
can arrive at very different conclusions.
Systematic reviews try to minimize the reviewer's ability to bias the conclusions of the review by systematically searching for and summarizing all relevant published research (and sometimes the unpublished research, too). To achieve this lofty goal, reviewers write out in excruciating detail how they intend to find and interpret the relevant literature. Just like real science, systematic reviews have a methods section that allows anyone to replicate their results. Two people writing systematic reviews of the same topic should arrive at similar conclusions, assuming that their methods are sound.
A systematic review of narrative review articles found strong bias in the narrative review articles when compared to the current state of the literature at the time that the article was written (10). Systematic reviews are the way forward. In Cochrane we trust.
If you're trying to publish original research, some journals now require you to do a systematic review of the topic to show that your research wasn't a waste of time and money.
A meta-analysis is a statistical pooling of the results of a systematic review. You take all the studies you
found in your systematic review, extract the numerical results as risk ratios, odds ratios, or some other
number, and then do a weighted average of them to give a single quantitative summary. The weights
are generally based on the sample size, so that larger studies are given more weight.
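A minimal fixed-effect pooling sketch. The (OR, standard error of log OR) pairs are invented; note the weights here are inverse-variance, which in practice tracks sample size (bigger study, smaller SE, bigger weight):

```python
import math

# Fixed-effect (inverse-variance) pooling of hypothetical odds ratios.
# Each study: (OR, standard error of its log OR).
studies = [(2.5, 0.40), (1.8, 0.30), (2.1, 0.10)]

# Weight each study by 1/SE^2, then average the log ORs.
weights = [1 / se**2 for _, se in studies]
pooled_log_or = sum(w * math.log(or_) for (or_, _), w in zip(studies, weights)) / sum(weights)
pooled_or = math.exp(pooled_log_or)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled OR = {pooled_or:.2f} "
      f"(95% CI {math.exp(pooled_log_or - 1.96 * pooled_se):.2f}"
      f" to {math.exp(pooled_log_or + 1.96 * pooled_se):.2f})")
```

The pooled estimate (OR of about 2.09) sits closest to the biggest (smallest-SE) study, and its confidence interval is narrower than any single study's.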
Summarize steps required for a systematic review, including framing a specic
question for review
A proper systematic review requires a well-defined research question in the standard PICO format (see Intro to EBM (9 Jan 2012)). The population, exposure, and outcome will determine the search terms that you will use.
Summarize steps required for a systematic review, including identifying relevant
literature
You must develop a well-defined search strategy, which includes the databases and sources that will be searched, the keywords that will be used, and the inclusion and exclusion criteria that you will use to determine which search results to include in your review.
The following are sources to consider:
- MEDLINE database (US-based) and EMBASE database (EU-based)
- clinical trials registers (including Cochrane's and the US government's)
- foreign language literature
- references in primary sources
- experts, who may have access to unpublished material
- raw data from trials (by personal communication)
In a cohort study, you have to define how and whom you will recruit into the study. It's exactly the same with a systematic review, except that your unit of analysis is now research studies instead of people.
Summarize steps required for a systematic review, including assessing the quality
of the literature
Reviewers will often apply standard tools that assess the methodological rigour of the articles that are being included in the review. These critical appraisal tools should be predefined.
Summarize steps required for a systematic review, including summarizing the
evidence
Use tables and graphs, including forest plots, to summarize the results. Provide a conclusion, if possible, that answers the research question.
Recognize the possible bias due to publication bias and describe approach to
identifying publication bias using a funnel plot
Publication bias is what happens when authors and journals prefer to publish positive (statistically significant) results that are interesting. The published research is therefore a biased sample of all the research that's been done.
When you do research, some of the positive (statistically significant) results will be false positives (you only see an effect because of random chance). If you're only publishing the positive results, you'll be publishing a lot of false positives without the true negatives that would normally balance them out. In the end, treatments and interventions that don't do anything end up looking effective just because the only studies published are those that were statistically significant from random chance alone.
Positive results are more likely to be:
- published
- published quickly
- in English
- in more than one journal (where one trial generates a bunch of papers; it's how careers are made!)
- cited by others
Publication bias is more likely to be a problem with smaller trials, since a large and expensive trial is likely to be published regardless of the outcome.
A few years back, there was a nice analysis of publication bias in trials of antidepressants (11), a hugely profitable market for pharmaceutical companies. They found that 31% of trials on the topic were never published, and that the published literature was wildly skewed in favour of antidepressants.
A funnel plot is a graphical way to assess publication bias. It plots study size, power, or standard error (the inverse of precision) against the studies' point estimates. Larger, more precise studies sit higher on the plot than smaller, less precise studies. We expect the results of small studies to vary, while the results of larger studies should converge toward the true estimate. We expect, therefore, that the dots on our plot will form a symmetrical triangle [A: Which, for some reason, is called a funnel. I guess it's like an upside-down funnel.]. If the triangle (or funnel) is truncated or asymmetrical, it's usually because the smaller negative trials were never published. That is, if the plot doesn't look like a triangle, then there's publication bias.
Standard error is basically the standard deviation of the estimate. It's proportional to the width of the confidence interval, so a wide confidence interval means a large standard error and a narrow confidence interval means a small standard error.
Compare the above funnel plot to one that shows publication bias, below:
Notice how it no longer has that symmetrical triangle shape, since only the more positive trials are
included. A meta-analysis of these trials will be biased.
Interpret a forest plot
A forest plot is a standard way of summarizing the numerical results (ORs, RRs, HRs, etc.) of a systematic review, instead of listing the results in a boring table. [A: I'm pretty sure its name comes from the expression, "can't see the forest for the trees."]
Each study in the review is shown on the plot as a line with a square in the middle. The line shows the 95% confidence interval for the study's result, the square shows the actual point estimate, and the size of the square represents the sample size of the study. If a meta-analysis was done, the result of the meta-analysis will be shown as a diamond at the bottom of the plot, with the width of the diamond showing its 95% confidence interval.
In the above example plot, five studies are included in the review, ordered by date of publication. The results are presented as odds ratios. The result of a meta-analysis summarizing the five studies is also shown as a diamond with a dashed vertical line at its point estimate. The Ng et al. study has the largest sample, has the smallest confidence interval, and has a point estimate (OR of 2.1) very close to the point estimate of the meta-analysis that pools all of the studies together (OR of 2.2). Note that all of the studies' confidence intervals include the pooled estimate (the dashed line).
Do you remember the proper definition of a confidence interval? If a study were repeated an infinite number of times and a confidence interval calculated each time, then 95% of the confidence intervals we calculate will include the true value that we're trying to estimate. Refresh your memory with Fundamentals of Biostatistics II (6 Feb 2012). That scenario is pretty close to what we're seeing on this forest plot, isn't it? We've repeated a study a bunch of times and now we're comparing them. Assuming that the studies in our systematic review are identical repetitions, differing only in their sample of people, then we expect that 95% of the confidence intervals will contain the true value, which we're assuming is our pooled estimate. That means that, if the confidence intervals are all over the place and a bunch of them do not include the pooled estimate, then we know something's wrong. Just by looking at a graphical summary of study results, we can tell whether the studies were done in similar ways!
It's also important to remember that a meta-analysis is a statistical pooling of the results of a systematic review, so a forest plot can be used in a systematic review even if they don't meta-analyze the results.
Describe benefits and limitations of a meta-analysis
Benefits:
- Allows quantitative integration of multiple studies, including smaller studies that may have been inconclusive on their own
- Less bias than an unsystematic review
- More precise estimates, because a larger sample size means more statistical power [A: MOAR POWER!]
- A well-written meta-analysis will be transparent so that you can easily assess its risk of bias
- Faster and cheaper than running a large RCT
Limitations:
- Its quality is limited by the quality of the individual studies
- Publication bias can bias the result, usually in favour of the intervention being studied
- Criteria for included studies are critical to understanding the meta-analysis [A: Not really sure what this means.]
- It's often not appropriate to combine studies, such as when there's a lot of heterogeneity (see below)
- You must still consider both clinical and statistical significance
- Meta-analyses on the same topic can come to differing conclusions, mostly because of differences in search strategies and inclusion/exclusion criteria
- The results of subsequent large RCTs may differ from the results of a meta-analysis, often due to publication bias
Meta-analysis provides a quantitative summary of the current state of our knowledge. If the state of our knowledge sucks, or we're getting a biased sample of the state of our knowledge, then our meta-analysis will suck too. Those are problems with systematic reviews, too. The important thing to remember is that it's not always advisable to statistically pool the results (that is, do a meta-analysis). This issue will be discussed in the next few objectives.
Define heterogeneity
Heterogeneity means the studies differ from each other. When reading a meta-analysis, we must think
about clinical, methodological, and statistical heterogeneity.
Heterogeneity may be due to clinical differences in the population, intervention, or outcome.
For example: study location, age and sex of patients, type or dose of medication, and definition of outcome.
It may also be due to methodological differences in the study design, quality, duration, and analysis.
For example, cohort studies compared to RCTs, 3-year studies compared to 15-year studies, and intention-
to-treat compared to per-protocol.
Finally, consider the idea of statistical heterogeneity, which is just a fancy statistical way of saying that the numbers don't add up. That is to say, the numerical results of the studies vary from each other more than you would expect from chance alone. You can usually see statistical heterogeneity just by looking at the forest plot. Formal tests also exist for it.
Formal tests include the simple Chi-squared and the fancier I-squared. If statistical heterogeneity exists, it is
inappropriate to statistically pool (i.e. meta-analyze) the study results.
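A sketch of those formal tests, Cochran's Q (the chi-squared test) and I-squared, using invented (OR, SE of log OR) study results:

```python
import math

# Cochran's Q: weighted squared deviations of each study's log OR from the
# fixed-effect pooled log OR. Under homogeneity, Q ~ chi-squared on k-1 df.
# I^2 = (Q - df) / Q is the share of variation beyond what chance explains.
studies = [(2.5, 0.40), (1.8, 0.30), (2.1, 0.10)]  # hypothetical (OR, SE)

weights = [1 / se**2 for _, se in studies]
log_ors = [math.log(or_) for or_, _ in studies]
pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

Q = sum(w * (y - pooled)**2 for w, y in zip(weights, log_ors))
df = len(studies) - 1
I2 = max(0.0, (Q - df) / Q) * 100  # clipped at 0 when Q < df

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%")
```

Here the three results are mutually consistent (Q is below its degrees of freedom), so I-squared is 0% and pooling looks reasonable; a large I-squared would argue against doing a meta-analysis.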
Clinical and methodological heterogeneity mean that the studies differ from each other in why, how, when, and where they were done (without considering the numerical results). Statistical heterogeneity means that the numerical results differ from each other (without considering how the studies themselves were done).
[A: Personally, I think that statistical heterogeneity is the most important concept here. If there's statistical heterogeneity, then it can usually be explained by the presence of clinical or methodological heterogeneity. If there's no statistical heterogeneity, then the clinical or methodological heterogeneity probably doesn't make a difference.]
Recognize that heterogeneity may mean a meta-analysis is not feasible/valid
If there's heterogeneity, then your studies are not measuring the same thing as each other. If there's heterogeneity (clinical, methodological, or statistical), then you do not meta-analyze. Assess differences in the studies' PICOs (population, intervention, control/comparison, outcome), assess differences in the studies' methodologies, and consider the quantitative variation.
If you read a systematic review where they meta-analyzed a bunch of studies without considering heterogeneity, they did bad. Question their results.
Interpret data from a cumulative meta-analysis
A cumulative meta-analysis is just a different way of making a forest plot, not a different type of meta-analysis. A forest plot usually orders the studies by date of publication, where the line and square beside each study name represent the result of that study. In a cumulative meta-analysis, the line and square instead represent a meta-analysis of all studies published up to that point in time. It allows you to see how the state of our knowledge has changed over time.
Here's a side-by-side comparison of a normal meta-analysis and a cumulative meta-analysis looking at the effect of streptokinase after acute MI (12):
By 1988, 33 studies had been done, most of which were not statistically significant. But if you pool the results with a meta-analysis, there was statistically significant evidence that streptokinase was beneficial after the eighth trial, done in 1973. That means that as many as 24 RCTs were done to answer a research question that had already been answered! Because no one bothered to meta-analyze the extant statistically non-significant trials until 1992, researchers wasted time and money and patients were not given an effective treatment.
Describe the role of a sensitivity analysis
A sensitivity analysis tells you how robust the results of a meta-analysis are to changes in the decisions
and assumptions that were made. Basically, you repeat the meta-analysis a bunch of different ways and
see if the results change.
For example: including or excluding studies with poor methods, including or excluding outliers, and pooling the studies using different methods (fixed- or random-effects).
Communicating Risk
(7 May 2012)
Describe effective risk communication as the basis for informed consent
Patients can only consent to what they actually understand, so effective risk communication is the basis for informed consent.
Define health literacy
Health literacy is "the degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions," according to the IOM (13).
Or, the definition that the WHO uses: "the cognitive and social skills which determine the motivation and ability of individuals to gain access to, understand and use information in ways which promote and maintain good health" (14).
Health literacy is worse among the elderly, minorities, and people with low SES. Health literacy can be measured using a number of tools, including the Test of Functional Health Literacy in Adults (TOFHLA).
Define health numeracy
The broadest definition of numeracy is "the ability to comprehend, use, and attach meaning to numbers" (15). Health numeracy is numeracy applied to health.
I think of health numeracy as the ability to understand and interpret medical numbers (like statistics) and to use
them to make decisions.
An example is understanding what a "10% chance" of something means and how many people out of 100 that would correspond to, or knowing whether 2.9 per 1000 is a higher or lower risk than 8.2 per 1000. These seemingly simple statistics can really confuse people, even doctors.
Describe patient perception of risk and the impact of health literacy and
numeracy on patient risk perception and understanding
Low health literacy is associated with (16):
- more hospitalizations
- more use of emergency care
- reduced screening and vaccination
- decreased adherence to medical treatment
- reduced ability to understand labels and health messages
- in the elderly, worse health and higher mortality
People often confuse relative and absolute risks, and don't know when or how to apply them. People tend to overestimate benefits when presented with relative risk reductions.
The way that a problem is framed can impact how a patient, or doctor, understands the information. Framing the same information in different ways can emphasize the benefits (gain) or the costs (loss) of a given treatment.
The lecture slides have a great example, using breast cancer. Read and try to figure out the following two scenarios:
(1) The probability that a woman has breast cancer is 0.8%. If she has breast cancer, the probability that a
mammogram will show a positive result is 90%. If a woman does not have breast cancer the probability of a
positive result is 7%. Take, for example, a woman who has a positive result. What is the probability that she
actually has breast cancer?
(2) Eight out of every 1000 women have breast cancer. Of these eight women with breast cancer seven will
have a positive result on mammography. Of the 992 women who do not have breast cancer some 70 will still
have a positive mammogram. Take, for example, a sample of women who have positive mammograms. How
many of these women actually have breast cancer?
Just reframing the question using natural frequencies instead of probabilities makes it simple to figure out the right answer.
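Working both framings through (same numbers as the two scenarios above) shows they give essentially the same answer:

```python
# Scenario (1): Bayes' theorem with probabilities.
p_cancer = 0.008       # prevalence
sens = 0.90            # P(positive | cancer)
p_pos_given_no = 0.07  # P(positive | no cancer)

ppv = (sens * p_cancer) / (sens * p_cancer + p_pos_given_no * (1 - p_cancer))

# Scenario (2): natural frequencies, out of 1000 women.
true_pos = 7    # ~0.90 * 8 women with cancer
false_pos = 70  # ~0.07 * 992 women without cancer
ppv_freq = true_pos / (true_pos + false_pos)

print(f"PPV (probabilities): {ppv:.1%}")      # ~9.4%
print(f"PPV (frequencies):   {ppv_freq:.1%}") # ~9.1%
```

Only about 1 in 11 women with a positive mammogram actually has breast cancer, which surprises most people, including doctors.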
Describe cognitive biases that affect risk assessment and decision-making [NiO]
- Affect: if it feels good, it can't be bad
- Anchoring: perceived risk is based on the risk of some other event that's familiar to the patient
- Availability bias: if we can think of an example of something, we perceive it as a greater risk
- Compression: we overestimate small risks and underestimate large ones
- Confirmation bias: we hear things that confirm our suspicions and ignore those that don't fit with them
- Dread factor: we dread cancer and therefore see risks of cancer as greater
- Habituation: everyday or usual activities seem less risky (we don't think about risks of crossing the street)
- Miscalibration: we're overly confident about the extent and accuracy of our knowledge
- Optimism bias: we're optimistic about our own outcomes (risk denial)
- Probability blindness: anecdotes are more compelling than numbers
Outline the basic dimensions of risk
- What is the risk?
- Is the risk temporary or permanent?
- What is the timing? When is it likely to occur?
- What is the probability of the risk?
- What value does the patient give to the risk? Does the patient perceive it as important?
Identify techniques that have been shown to improve patient understanding of risk, such as verification techniques and the roles of qualitative and quantitative and graphic presentations of risk, and decision aids
The basics of risk presentation (mostly from (17)) are:
- Provide information in both gain and loss framing
- Quantitative information is better than descriptive terms
- Natural frequency may be better than percentage (but depends on the patient)
- Use a consistent denominator (if the benefit is per-1000, then the risks should be, too)
- NNT seems to be most difficult conceptually
- Absolute risk is better than relative risk
- Less information is better
- Acknowledge uncertainty
- Use a "teach back" technique
Decision aids are pamphlets, videos, or websites that help patients and doctors to understand the risks,
harms, and benets of their healthcare options. The benets of using decision aids (mostly from (18)) are:
Improved knowledge of the options

Better understanding of benets and harms


Decisions are more consistent with patient values
Patients participate more in decision-making

Help patients to consider equally effective non-surgical options


Check out http://www.thennt.com/ for some straightforward risk communication tools. There are a lot of
resources out there; you just have to look. The Cochrane Review articles on risk communication (17) and
decision aids (18) are good places to start.
Prepared by Aidan Findlater
References
1. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ [Internet]. 1996 Jan. 13;312(7023):71-2. Available from: http://www.bmj.com/content/312/7023/71.full
2. Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine:
How to practice and teach EBM. 2nd ed. New York: Churchill Livingstone; 2000.
3. Committee on Quality of Health Care in America. Crossing the Quality Chasm [Internet].
Washington, D.C.: The National Academies Press; 2001. Available from: http://www.nap.edu/
openbook.php?record_id=10027
4. Last JM, editor. A Dictionary of Epidemiology. 4th ed. New York: Oxford University Press; 2000.
5. Gordis L. Epidemiology. Saunders; 2008.
6. Porta M, editor. A Dictionary of Epidemiology. 5th ed. New York: Oxford University Press; 2008.
7. Wang D, Bakhai A. Clinical trials: a practical guide to design, analysis, and reporting. London:
Remedica; 2006.
8. Vrijheid M, Armstrong BK, Bédard D, Brown J, Deltour I, Iavarone I, et al. Recall bias in the assessment of exposure to mobile phones. J Expo Sci Environ Epidemiol. 2009 May 1;19(4):369-81.
9. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010 Jan. 1;21(1):13-5.
10. Schmidt LM, Gøtzsche PC. Of mites and men: reference bias in narrative review articles: a systematic review. J Fam Pract. 2005 Apr. 1;54(4):334-8.
11. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N. Engl. J. Med. 2008 Jan. 17;358(3):252-60.
12. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N. Engl. J. Med. 1992 Jul. 23;327(4):248-54.
13. Institute of Medicine. Health Literacy: A Prescription to End Confusion. Washington, D.C.: National Academies Press; 2004.
14. Nutbeam D. The evolving concept of health literacy. Soc Sci Med. 2008 Dec.;67(12):2072-8.
15. Fischhoff B, Brewer NT, Downs JS, editors. Communicating Risks and Benefits: An Evidence-Based User's Guide [Internet]. Silver Spring: US Department of Health and Human Services, Food and Drug Administration; 2011. p. 240. Available from: http://www.fda.gov/ScienceResearch/SpecialTopics/RiskCommunication/default.htm
16. Berkman ND, Sheridan SL, Donahue KE, Halpern DJ, Crotty K. Low health literacy and health outcomes: an updated systematic review. Ann. Intern. Med. 2011 Jul. 19;155(2):97-107.
17. Akl EA, Oxman AD, Herrin J, Vist GE, Terrenato I, Sperati F, et al. Using alternative statistical formats for presenting risks and risk reductions. Cochrane Database Syst Rev. 2011;(3):CD006776.
18. Stacey D, Bennett CL, Barry MJ, Col NF, Eden KB, Holmes-Rovner M, et al. Decision aids for people facing health treatment or screening decisions. Cochrane Database Syst Rev. 2011;(10):CD001431.
Appendix I
The Student Guide to Research
So you scored yourself a sweet research gig? Congratulations! Research can be interesting and even
fun, but it can also be daunting and at times overwhelming. Here are a few pointers and resources
(besides myself) that may help.
Starting your project
Your research supervisor has thrown a few medical-sounding words together and told you to research
it. What do you do next? This section will help you to nail down the specifics that you'll need in order to
actually do the research, and should apply to any type of epidemiological research you'll be doing.
First, nail down the specific research question you'll be answering. PICO is your best friend. Break your
research question down into population, intervention or exposure, control or comparison, and outcome.
Define each one as specifically as possible, using definitions that are consistent with the existing
literature. If you can't tell me specifically what you mean by "second-hand smoke" or "adolescents",
then you shouldn't be doing the research. Now that you understand exactly what you're researching,
summarize your PICO into a one-sentence question that you can use when telling others about your
research.
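As a worked (and entirely invented) example of that last step, here's a hypothetical PICO broken into its four parts and collapsed into a one-sentence question. The population, intervention, comparison, and outcome below are my own illustrations, not from the course:

```python
# A hypothetical PICO breakdown (invented example, not from the course):
pico = {
    "population":   "current smokers aged 40-65 with no prior heart disease",
    "intervention": "varenicline plus counselling",
    "comparison":   "counselling alone",
    "outcome":      "self-reported smoking abstinence at 12 months",
}

# Collapse the four parts into the one-sentence research question:
question = (
    f"In {pico['population']}, does {pico['intervention']}, "
    f"compared with {pico['comparison']}, improve {pico['outcome']}?"
)
print(question)
```

If you can fill in all four slots that cleanly, you're ready to start; if any slot is fuzzy, go back to the literature for a definition.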
Once you have your PICO figured out, think about confounders. What sorts of things might connect
your exposure and outcome, even if there's no causal relationship between the two? Come up with a list
of confounders that you think will significantly affect your research question, then try to figure out how
you will measure each one. You'll need strict definitions for each of your confounders, just like you do for
your exposure and outcome. Try to make sure that all your variables are defined so that your research
can be compared to the existing research.
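A quick way to build intuition for why this matters is to work through confounding with numbers. The sketch below uses invented counts in which age is the confounder: within each age stratum the exposure does nothing, but pooling across strata manufactures an apparent effect:

```python
# Invented counts in which age confounds an exposure-outcome link.
# Within each age stratum the exposure has no effect, but older people
# are both more likely to be exposed and at higher baseline risk.

# (cases, total) for each group
young = {"exposed": (5, 100),   "unexposed": (45, 900)}   # risk 5% in both
old   = {"exposed": (180, 900), "unexposed": (20, 100)}   # risk 20% in both

def risk(cases, total):
    return cases / total

def stratum_risk_ratio(stratum):
    return risk(*stratum["exposed"]) / risk(*stratum["unexposed"])

print(stratum_risk_ratio(young))  # 1.0 -- no effect among the young
print(stratum_risk_ratio(old))    # 1.0 -- no effect among the old

# The crude analysis pools the strata, ignoring age:
crude_exposed   = risk(5 + 180, 100 + 900)  # 185/1000
crude_unexposed = risk(45 + 20, 900 + 100)  # 65/1000
print(crude_exposed / crude_unexposed)      # ~2.85: a spurious "effect"
```

This is why you measure your confounders: stratifying (or adjusting) on age recovers the true null effect that the crude comparison hides.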
With your well-defined variables in mind, think about how you might analyze the data you'll get. I don't
expect you to have a specific statistical model set in stone, but you need to have an idea of what kinds
of numbers you'll be working with and how they might be made to answer your research question. Is the
outcome dichotomous? How about the exposure and confounders? Are you going to get a risk ratio or
odds ratio out of it, or are you comparing a continuous measure like blood pressure change? If you
aren't sure, ask an epidemiologist, or consult with your supervisor (who will probably mumble something
about p-values).
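For instance, if both exposure and outcome are dichotomous, your data reduce to a 2x2 table, and the risk ratio and odds ratio fall straight out of it. A minimal Python sketch with hypothetical counts:

```python
# Hypothetical 2x2 table (invented counts):
#                outcome   no outcome
# exposed            30           70
# unexposed          10           90
a, b, c, d = 30, 70, 10, 90

risk_exposed   = a / (a + b)  # 30%
risk_unexposed = c / (c + d)  # 10%
risk_ratio     = risk_exposed / risk_unexposed
odds_ratio     = (a * d) / (b * c)

print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")  # RR = 3.00, OR = 3.86
# Note the OR overstates the RR here because the outcome is common.
```

Sketching this out before you collect anything tells you immediately what your analysis will look like and what effect measure you can honestly report.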
Observational studies
Most of the research projects you'll be doing are some sort of observational research, where bias and
confounding run rampant. Go back and review the sources of bias in case-control and cohort studies. After
that refresher, think about the selection and measurement bias that your study will have, and how you
can minimize it.
Consider how you are choosing who or what you will be including in your study. Is your sampling
method going to give you a good sample of the population you're trying to study, or will it be a biased
sample that's full of rich/poor/healthy/sick/worried/helpful people? Try to figure out the ways that the
study population will fail to reflect the population you're actually interested in.
Consider how you are measuring your variables (exposure, outcome, and confounders). Does the status
of any variable affect the way that any other variable is measured? For example, if they have the
outcome, do you search for or measure the exposure differently? And so on, for all your variables.
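As a concrete illustration of differential measurement, here's a Python sketch of recall bias with invented numbers: cases report a past exposure more completely than controls, so the observed odds ratio rises above 1 even though the true association is null. The recall probabilities are my own assumptions:

```python
# Invented numbers: 40% of both cases and controls were truly exposed,
# so the true odds ratio is exactly 1. But cases recall (and so report)
# the exposure more completely than controls do.
n_cases = n_controls = 1000
true_exposed_fraction = 0.40
recall_cases, recall_controls = 0.95, 0.70  # assumed recall probabilities

def reported_counts(n, recall):
    """Expected (exposed, unexposed) counts as reported, given recall."""
    reported_exposed = n * true_exposed_fraction * recall
    return reported_exposed, n - reported_exposed

case_exp, case_unexp = reported_counts(n_cases, recall_cases)
ctrl_exp, ctrl_unexp = reported_counts(n_controls, recall_controls)

observed_or = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
print(f"Observed OR: {observed_or:.2f}")  # well above 1, despite a true OR of 1
```

The mobile-phone recall bias study cited earlier (8) is a real-world example of exactly this problem.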
Collecting the data
Ask your supervisor.
Analyzing the data
Ask your supervisor, or just ask me: aidan@aidanfindlater.com.
For simple stuff like means and standard deviations, Excel is fine. For more complicated stuff, you'll
need to use statistical software. R is free, student pricing exists for Stata and SPSS, and SPSS and SAS
can be accessed in your web browser through UWO's MyVLab (http://myvlab.uwo.ca/).
Software notes and resources:

SPSS: Point-and-click interface. Popular with social scientists. Available at UWO through MyVLab.
    http://ssnds.uwo.ca/helpnotes/spss.asp

Stata: Point-and-click interface. Popular with economists and epidemiologists. Aidan's second-favourite!
    http://ssnds.uwo.ca/helpnotes/stata.asp

R: It's all typing, no point-and-click. Steeper learning curve, but extremely flexible if you know basic
programming. Free and open source. Aidan's favourite!
    http://www.r-project.org/
    http://www.statmethods.net/

SAS: Available at UWO. PROC WILL_DRIVE_YOU_INSANE. Costs more than hiring a biostatistician.
Aidan hates it!
    http://www.uwo.ca/its/sitelicense/sas/index.html
    http://ssnds.uwo.ca/helpnotes/sas.asp
Writing it up
There are standards for reporting for every type of research. Each comes with a checklist to make sure
you're including everything that you should be, and many include templates for recommended flow
charts and tables. Even if your research was awful, there's no reason that your research reporting can't
be top-notch! Bear in mind that you might not have room in your article to fit in everything on the
checklist. Use them as a guideline rather than an absolute standard.
Go download the checklist that applies to your project:

Study Design                            Guideline   URL
Case-control                            STROBE      http://www.strobe-statement.org/
Cohort                                  STROBE      http://www.strobe-statement.org/
Cross-sectional                         STROBE      http://www.strobe-statement.org/
Randomized trials                       CONSORT     http://www.consort-statement.org/
Non-randomized trials                   TREND       http://www.cdc.gov/trendstatement/
Diagnostics                             STARD       http://www.stard-statement.org/
Systematic reviews and meta-analyses    PRISMA      http://www.prisma-statement.org/index.htm
A much more thorough list of reporting standards is maintained by the EQUATOR Network at
http://www.equator-network.org/. If you can't find an appropriate guideline on there, then it doesn't exist
anywhere.