
203.343 Advanced Genetics and Genomics
Lecture 8
August 6th 2015
Olin Silander

GWAS: methods and statistics


Tag SNPs

Select the minimum number of SNPs that retain the full genetic
variation of the SNP data set

Nat Rev Genet 7, 781-791 (2006)
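
One way to picture tag SNP selection is a greedy search: keep picking the SNP that is in strong LD with the most SNPs not yet covered. The sketch below is only an illustration, not the method used in the cited review. It assumes genotypes coded as 0/1/2 allele counts, uses the squared genotype correlation as a stand-in for LD r^2, and uses a cutoff of 0.8; the function names and the toy genotypes are hypothetical. A greedy search is not guaranteed to find the true minimum set.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy ** 2 / (sxx * syy)


def greedy_tag_snps(snps, threshold=0.8):
    """Repeatedly pick the SNP that covers the most still-untagged SNPs."""
    n = len(snps)
    # tagged_by[i] = SNPs in strong LD with SNP i (always includes i itself)
    tagged_by = [
        {j for j in range(n) if i == j or r_squared(snps[i], snps[j]) >= threshold}
        for i in range(n)
    ]
    untagged = set(range(n))
    tags = []
    while untagged:
        best = max(range(n), key=lambda i: len(tagged_by[i] & untagged))
        tags.append(best)
        untagged -= tagged_by[best]
    return tags


# Hypothetical toy data: one list per SNP, one entry per individual.
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1],  # SNP 0
    [0, 1, 2, 0, 1, 2, 0, 1],  # SNP 1: identical to SNP 0
    [2, 1, 0, 2, 1, 0, 2, 1],  # SNP 2: mirror image of SNP 0
    [0, 2, 1, 1, 0, 2, 2, 0],  # SNP 3: nearly uncorrelated with the others
]
tags = greedy_tag_snps(snps)
print("tag SNPs:", tags)
```

In this toy set SNPs 0, 1 and 2 carry the same information, so one of them plus SNP 3 is enough to represent all four.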

GWAS: methods and statistics

First: test for Hardy-Weinberg equilibrium (HWE)

A positive result (a significant deviation from HWE) can signify:
(1) Inbreeding
(2) Selection
(3) Poor data quality
(4) Population stratification
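
A common way to run the HWE check is a chi-square goodness-of-fit test with 1 degree of freedom, comparing observed genotype counts with the counts expected from the allele frequencies. The sketch below assumes a biallelic SNP with both alleles present; the function name and the genotype counts are hypothetical and not taken from the lecture.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele a
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first set sits at HWE, the second has too many heterozygotes.
chi2_ok, p_ok = hwe_chi_square(490, 420, 90)
chi2_bad, p_bad = hwe_chi_square(450, 500, 50)
print(f"at HWE:       chi2 = {chi2_ok:.2f}, p = {p_ok:.3g}")
print(f"excess hets:  chi2 = {chi2_bad:.2f}, p = {p_bad:.3g}")
```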

GWAS: methods and statistics


Case control association:
Calculate the fraction of cases among the three genotype classes

Observed counts:

Genotype    AA     Aa     aa    Total
control    180    200     20      400
case       250    270     80      600
Total      430    470    100

Expected counts (row total x column total / grand total):

Genotype    AA     Aa     aa    Total
control    172    188     40      400
case       258    282     60      600
Total      430    470    100

(observed - expected)^2 / expected:

Genotype    AA     Aa     aa
control    0.37   0.76   10
case       0.25   0.51   6.67

p-value = 1e-4
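
The arithmetic behind this slide can be checked with a few lines of Python. The sketch below takes the observed control and case counts from the slide, derives the expected counts from the row and column totals, and sums the (observed - expected)^2 / expected terms. The function name is an assumption, and the closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math

# Observed genotype counts (AA, Aa, aa) from the slide
observed = {"control": [180, 200, 20], "case": [250, 270, 80]}


def genotype_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = genotype_chi_square(observed)
# For df = 2 the chi-square survival function is exactly exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.1e}")
```

The statistic comes out at about 18.6 on 2 degrees of freedom, which is in line with the p-value of roughly 1e-4 on the slide.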

GWAS: methods and statistics


Binary (discrete) associations
Proportion trend test (Armitage test):
Calculate the slope of genotype vs
case number as proportion of total

If the slope is not different from 0, there is no association

[Figure: case / (case + control), on an axis from 0.0 to 1.0, plotted against genotype (AA, Aa, aa)]

p-value = 5e-3
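
The trend test can be sketched with the standard Cochran-Armitage formula. The code below assumes additive genotype scores (AA = 0, Aa = 1, aa = 2), which the slide does not state, and reuses the case and control counts from the case control association slide; the function name is hypothetical.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi-square with 1 df, p-value)."""
    col_totals = [a + b for a, b in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    # Trend statistic: score-weighted imbalance between cases and controls
    t = sum(s * (a * n_controls - b * n_cases)
            for s, a, b in zip(scores, cases, controls))
    k = len(col_totals)
    cross = sum(scores[i] * scores[j] * col_totals[i] * col_totals[j]
                for i in range(k) for j in range(i + 1, k))
    variance = (n_cases * n_controls / n) * (
        sum(s * s * c * (n - c) for s, c in zip(scores, col_totals)) - 2 * cross
    )
    chi2 = t * t / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


# Counts in the order AA, Aa, aa
chi2, p_value = armitage_trend_test(cases=[250, 270, 80], controls=[180, 200, 20])
print(f"trend chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

With these counts the p-value is close to the 5e-3 quoted on the slide, noticeably larger than the genotype test on the same data because most of the signal sits in the aa class rather than in a steady trend.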

GWAS: methods and statistics


Binary (discrete) associations

Proportion trend test (Armitage test):
Calculate the slope of genotype vs
case number as proportion of total

Case control association:
Calculate the fraction of cases
among the three genotype classes

One is not better than the other

GWAS: methods and statistics


Continuous associations
ANOVA or linear regression
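
For a continuous phenotype, the regression version can be sketched as an ordinary least-squares fit of phenotype on the number of copies of one allele. The genotype coding (0, 1, 2), the function name and the six data points below are hypothetical and only illustrate the calculation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: allele count per individual and a continuous trait
genotype = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope, intercept = fit_line(genotype, phenotype)
print(f"effect per allele copy = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope is the estimated change in phenotype per extra allele copy; testing whether it differs from 0 is the continuous analogue of the trend test.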

GWAS: problems with the statistics

The genome is large

Multiple testing
Type 1 error:
rejecting the null hypothesis
when the null is true
Solution 1: Bonferroni correction
$\alpha_{\text{Bonferroni}} = \alpha / n$ (n = number of tests)
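
To get a feel for the scale of the problem, the sketch below computes the Bonferroni threshold and the chance of at least one false positive when no correction is applied. The figure of one million SNPs is a hypothetical round number, and the uncorrected calculation assumes the tests are independent, which SNPs in LD are not.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def fwer_uncorrected(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05
n_snps = 1_000_000  # hypothetical number of SNPs tested

threshold = bonferroni_threshold(alpha, n_snps)
print(f"per-SNP threshold: {threshold:.1e}")
print(f"chance of a false positive in 100 uncorrected tests: {fwer_uncorrected(alpha, 100):.3f}")
```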

GWAS: problems with the statistics


Problem: what is the
distribution of p-values when
the null hypothesis is true?
Test whether population
mean is different from 0
H0: $\mu = 0$
H1: $\mu \neq 0$
(1) Draw n random normal
variables with mean = 0
programming break!

Murdoch and Adcock (2008) J. Am. Stat.
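
The experiment on this slide can be tried in a few lines. The sketch below draws many samples from a normal distribution with mean 0, tests each one, and looks at how the p-values are spread. It is not the lecture's own exercise: to stay within the standard library it uses a z-test with known standard deviation 1 (an assumption; a t-test would be the usual choice), and the sample size, number of repeats and seed are arbitrary.

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_p_values(n_experiments=10_000, n=30, seed=1):
    """Repeat the experiment with the null hypothesis true every time."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(0, 1) for _ in range(n)])
        for _ in range(n_experiments)
    ]


p_values = simulate_null_p_values()

# Count how many p-values fall in each tenth of the interval [0, 1]
decile_counts = [0] * 10
for p in p_values:
    decile_counts[min(int(p * 10), 9)] += 1

fraction_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print("p-values per decile:", decile_counts)
print(f"fraction with p < 0.05: {fraction_significant:.3f}")
```

Each decile holds roughly the same number of p-values: under the null hypothesis p-values are uniformly distributed, so about 5% of tests fall below 0.05 by chance alone.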

GWAS: problems with the statistics


Multiple testing
Type 1 error:
rejecting the null hypothesis
when the null is true
Solution 1: Bonferroni correction (FWER, family-wise error rate)
$\alpha_{\text{Bonferroni}} = \alpha / n$ (n = number of tests)
Guards against having any false positives
Solution 2: Permutation test
Randomise cases and controls: the new data has the same
LD structure as the old, but there is no association
Repeat many times to find the false positive rate
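
A permutation test along these lines can be sketched as follows. Case and control labels are shuffled across individuals while each individual's genotypes stay together, so any LD between SNPs is preserved; the largest statistic in each shuffled data set builds the null distribution. The test statistic (difference in mean allele count), the simulated data, the function names and the number of permutations are all assumptions made for the illustration.

```python
import random


def mean_difference(genotypes, labels, snp):
    """Absolute difference in mean allele count between cases (1) and controls (0)."""
    case_values = [row[snp] for row, y in zip(genotypes, labels) if y == 1]
    control_values = [row[snp] for row, y in zip(genotypes, labels) if y == 0]
    return abs(sum(case_values) / len(case_values)
               - sum(control_values) / len(control_values))


def permutation_adjusted_p(genotypes, labels, n_perm=200, seed=0):
    """Adjusted p-value per SNP from the permutation distribution of the maximum statistic."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    observed = [mean_difference(genotypes, labels, s) for s in range(n_snps)]
    shuffled = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_null.append(max(mean_difference(genotypes, shuffled, s)
                            for s in range(n_snps)))
    return [(1 + sum(m >= obs for m in max_null)) / (n_perm + 1) for obs in observed]


# Hypothetical simulated data: 100 cases, 100 controls, 20 SNPs.
# Only SNP 0 differs in allele frequency between cases and controls.
rng = random.Random(42)
n_snps = 20
labels = [1] * 100 + [0] * 100
genotypes = []
for y in labels:
    row = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n_snps)]
    freq = 0.6 if y == 1 else 0.2
    row[0] = sum(rng.random() < freq for _ in range(2))
    genotypes.append(row)

adjusted = permutation_adjusted_p(genotypes, labels)
print(f"adjusted p for SNP 0: {adjusted[0]:.3f}")
print(f"smallest adjusted p among the other SNPs: {min(adjusted[1:]):.3f}")
```

Comparing each observed statistic with the maximum over all SNPs in the shuffled data is one way to control the chance of any false positive without assuming the tests are independent.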

GWAS: problems with the statistics


Multiple testing
Type 1 error:
rejecting the null hypothesis
when the null is true
Solution 1: Bonferroni correction (FWER)
Solution 2: Permutation test
Solution 3: False discovery rate (FDR)
(e.g. Benjamini-Hochberg)
Controls the expected proportion of false positives
among the results you call significant (e.g. 5%)
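
The Benjamini-Hochberg step-up procedure is short enough to write out. The sketch below sorts the p-values, finds the largest rank k whose p-value is at most (k / m) * q, and rejects everything up to that rank; a Bonferroni version is included for comparison. The ten p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            largest_k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:largest_k])


def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses rejected after Bonferroni correction."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical p-values from ten tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print("Benjamini-Hochberg rejects:", benjamini_hochberg(p_values))
print("Bonferroni rejects:        ", bonferroni(p_values))
```

On these numbers Benjamini-Hochberg rejects one more hypothesis than Bonferroni, which illustrates the trade: more discoveries in exchange for tolerating a small expected fraction of false ones.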

GWAS: methods and statistics

Study power

programming break!

GWAS: methods and statistics

Study power:
The ability to detect an effect, if that effect exists
Determined by:
(1) Sample size
(2) Size of the effect
(3) Statistical test (FDR, FWER)

FDR (Benjamini-Hochberg): more power
I want to be sure I find all true
effects, even if many I find are false

FWER (Bonferroni): less power
I want to be sure all the
effects I find are true effects
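
How the three factors interact can be seen with a simple analytic power calculation. The sketch below uses a two-sided z-test for a mean, with the effect size measured in standard deviations; this is a simplification chosen for illustration, not a GWAS-specific power formula, and the function name and the numbers are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def power_two_sided_z(effect, n, alpha):
    """Power of a two-sided z-test; effect is the true mean in standard-deviation units."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5
    # Probability that the test statistic lands beyond either critical value
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)


alpha = 0.05
n_tests = 1_000_000  # hypothetical number of SNPs for the Bonferroni threshold

for n in (100, 1000, 10000):
    for effect in (0.05, 0.2):
        plain = power_two_sided_z(effect, n, alpha)
        strict = power_two_sided_z(effect, n, alpha / n_tests)
        print(f"n = {n:5d}, effect = {effect:.2f}: "
              f"power = {plain:.3f} at alpha, {strict:.3f} after Bonferroni")
```

Power rises with sample size and effect size and falls when the significance threshold is made stricter, which is why a Bonferroni-corrected genome-wide study needs far more samples than a single test.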

GWAS: representing the data


why are there
groups of SNPs?

Nature 447, 661-678 (2007)

GWAS: representing the data


Q-Q plot

[Figure: Q-Q plots in four panels: coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis]

GWAS: representing the data

Is there a
problem here?

Science (2010)

GWAS: problems with the data


Problem: population stratification

Population 1 has a disproportionate number of case samples:
any SNPs that differentiate the populations
will be associated with the disease

Nat Rev Genet 7, 781-791 (2006)
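
A small numerical example shows how stratification creates an association out of nothing. The sketch below builds two hypothetical populations in which the allele has no effect on disease, but the populations differ in both allele frequency and the proportion of cases. All numbers and function names are invented for the illustration, and expected allele counts are used instead of random draws so the result is exact.

```python
def allelic_chi_square(case_A, case_a, control_A, control_a):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    n = case_A + case_a + control_A + control_a
    numerator = n * (case_A * control_a - case_a * control_A) ** 2
    denominator = ((case_A + case_a) * (control_A + control_a)
                   * (case_A + control_A) * (case_a + control_a))
    return numerator / denominator


def expected_allele_counts(n_people, case_fraction, freq_A):
    """Allele counts when genotype and disease status are independent within a population."""
    case_alleles = 2 * n_people * case_fraction
    control_alleles = 2 * n_people - case_alleles
    case_A = case_alleles * freq_A
    control_A = control_alleles * freq_A
    return case_A, case_alleles - case_A, control_A, control_alleles - control_A


# Hypothetical populations: population 1 has more cases AND a higher frequency of allele A
pop1 = expected_allele_counts(n_people=1000, case_fraction=0.8, freq_A=0.8)
pop2 = expected_allele_counts(n_people=1000, case_fraction=0.2, freq_A=0.2)
pooled = tuple(a + b for a, b in zip(pop1, pop2))

chi2_pop1 = allelic_chi_square(*pop1)
chi2_pop2 = allelic_chi_square(*pop2)
chi2_pooled = allelic_chi_square(*pooled)
print(f"within population 1: chi2 = {chi2_pop1:.2f}")
print(f"within population 2: chi2 = {chi2_pop2:.2f}")
print(f"pooled sample:       chi2 = {chi2_pooled:.1f}")
```

Within each population the statistic is zero, yet the pooled sample gives a very large chi-square: the SNP marks population membership, not disease risk.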

GWAS: results (relative risk)

RR is the probability of the disease (phenotype) if you have a certain
genotype relative to the probability if you don't have that genotype

Genotype    AA     Aa     aa
control    180    200     20
case       250    270     80

probability of disease if you have aa:
80 / (80 + 20) = 0.8

probability of disease if you have Aa or AA:
(250 + 270) / (250 + 270 + 180 + 200) = 0.58
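
The two probabilities on this slide combine into the relative risk by simple division. The sketch below uses the counts from the table; the function name and the data layout are assumptions made for the illustration.

```python
# Genotype counts from the slide
counts = {
    "AA": {"control": 180, "case": 250},
    "Aa": {"control": 200, "case": 270},
    "aa": {"control": 20, "case": 80},
}


def disease_probability(counts, genotypes):
    """Fraction of individuals with the given genotypes who are cases."""
    cases = sum(counts[g]["case"] for g in genotypes)
    total = sum(counts[g]["case"] + counts[g]["control"] for g in genotypes)
    return cases / total


p_aa = disease_probability(counts, ["aa"])
p_other = disease_probability(counts, ["AA", "Aa"])
relative_risk = p_aa / p_other

print(f"P(disease | aa)       = {p_aa:.2f}")
print(f"P(disease | AA or Aa) = {p_other:.2f}")
print(f"relative risk         = {relative_risk:.2f}")
```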

GWAS: results (explained variance)

Explained variance: how much variation in the phenotype
(e.g. disease) is explained by genotype (e.g. SNPs)

[Figure: explained variance. 54 genomic SNPs: 3.8%; Galton (19th century): 40%]
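
For a single SNP and a continuous phenotype, explained variance can be computed as the R^2 of a linear regression, which equals the squared correlation between genotype and phenotype. The sketch below uses simulated data; the allele frequency, effect size, noise level, sample size and function name are all hypothetical.

```python
import random


def explained_variance(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical simulation: one SNP with a modest additive effect plus random noise
rng = random.Random(7)
allele_freq, effect, noise_sd, n_people = 0.3, 0.5, 1.0, 5000

genotype = [sum(rng.random() < allele_freq for _ in range(2)) for _ in range(n_people)]
phenotype = [effect * g + rng.gauss(0, noise_sd) for g in genotype]

r2 = explained_variance(genotype, phenotype)
print(f"variance explained by this SNP: {100 * r2:.1f}%")
```

Even a SNP with a clearly detectable effect accounts for only a small share of the phenotypic variance, because most of the variation comes from everything else.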

GWAS: causal variants

Bush and Moore PLOS Comp Bio (2012)

GWAS: success or failure?


[Figure with regions labelled: linkage analysis, linkage analysis, GWAS, ??]

Bush and Moore PLOS Comp Bio (2012)

GWAS: success or failure?

GWAS only finds common variants

Bush and Moore PLOS Comp Bio (2012)
