
Alleles and Hardy-Weinberg Equilibrium

Yifei Huang

Department of Biology and Huck Institutes

Jan 15, 2020

Huang, Y.-F. (Penn State) BIOL428 Jan 15, 2020 1 / 24


Why oh why didn’t I take the blue pill

In population genetics there is no blue pill.

What is evolution?

Evolution: Descent with modification

Population genetics: Genetic changes in populations of organisms over time


The process of descent

(Modified from Baum and Smith, Tree-Thinking book)


Descent with modification
Generation 1: ATCCGGAAA
Mutation from G->A at position 6: ATCCGAAAA
This creates a G/A polymorphism in the population.
(Modified from Baum and Smith, Tree-Thinking book)


Descent with modification
Generation 1: ATCCGGAAA
Generation 2: ATCCGGAAA, ATCCGGAAA
Generation 10
Generation 15: ATCCGAAAA
(Modified from Baum and Smith, Tree-Thinking book)


Ancestor: ATCCGGAAA
G->A at position 6
ATCCGAAAA ATCCGAAAA
A: ATCCGAAAA
B: ATCCGAAAA
C: ATCCGAAAA
D: ATCCGAAAA
(Modified from Baum and Smith, Tree-Thinking book)
Ancestor: ATCCGGAAA
G->A at position 6
A->T at position 8
C->A at position 3
A->G at position 1
A: ATCCGAATA
B: ATACGAATA
C: ATCCGAAAA
D: GTCCGAAAA
(Modified from Baum and Smith, Tree-Thinking book)
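The tree figure itself did not survive, but the four tip sequences are consistent with the ancestor ATCCGGAAA plus the four listed substitutions. The minimal Python sketch below assumes branch assignments inferred from the tips (not read from the figure): G->A at position 6 is shared by all tips, A->T at position 8 by A and B, C->A at position 3 by B only, and A->G at position 1 by D only. Positions are 1-based as on the slide; `apply_mutations` and `LINEAGES` are hypothetical names.

```python
ANCESTOR = "ATCCGGAAA"

# Each mutation: (1-based position, ancestral base, derived base).
G6A = (6, "G", "A")
A8T = (8, "A", "T")
C3A = (3, "C", "A")
A1G = (1, "A", "G")

# Hypothetical lineages: mutations on the path from the root to each tip,
# inferred from the tip sequences rather than from the original figure.
LINEAGES = {
    "A": [G6A, A8T],
    "B": [G6A, A8T, C3A],
    "C": [G6A],
    "D": [G6A, A1G],
}


def apply_mutations(seq, mutations):
    """Apply point substitutions in order, checking the ancestral base first."""
    bases = list(seq)
    for pos, old, new in mutations:
        if bases[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {bases[pos - 1]}")
        bases[pos - 1] = new
    return "".join(bases)


if __name__ == "__main__":
    for tip, muts in LINEAGES.items():
        print(f"{tip}: {apply_mutations(ANCESTOR, muts)}")
```

Because every mutation is inherited by all descendants of the branch it occurs on, the shared G->A change explains why all four tips carry A at position 6.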
Variants in the Adh gene of D. melanogaster (Kreitman 1983)
• DNA sequencing
• Provides an unbiased description of genetic variation
• Coding, noncoding
• Indels


Basic currency of modern population genetics
Aligned orthologous sequence across individuals
Sequence data

LOCUS (plural loci)
An allele

ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CTT GTA TTT ACG GCT GAT
ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCC TAT


Each diploid individual’s genotype consists of 2 haplotypes (orthologous sequences).
At each locus/position, individuals are homozygous or heterozygous.
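The homozygous/heterozygous distinction can be made concrete with the Species 1 alignment. The slides do not say which two rows belong to the same individual, so this minimal Python sketch assumes, hypothetically, that consecutive rows (1 and 2, 3 and 4, and so on) are the two haplotypes of one diploid individual. It lists the sites where an individual is heterozygous and the fraction of all sites that this represents; the helper names are illustrative.

```python
# Species 1 haplotypes, copied from the alignment (codons separated by spaces).
ALIGNMENT = [
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CTT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCC TAT",
]
HAPLOTYPES = [row.replace(" ", "") for row in ALIGNMENT]

# Hypothetical pairing: consecutive rows form one diploid individual.
INDIVIDUALS = [(HAPLOTYPES[i], HAPLOTYPES[i + 1]) for i in range(0, len(HAPLOTYPES), 2)]


def heterozygous_sites(hap1, hap2):
    """1-based positions where the individual's two haplotypes differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


def heterozygosity(hap1, hap2):
    """Fraction of all sites at which the individual is heterozygous."""
    return len(heterozygous_sites(hap1, hap2)) / len(hap1)


if __name__ == "__main__":
    for k, (h1, h2) in enumerate(INDIVIDUALS, start=1):
        sites = heterozygous_sites(h1, h2)
        print(f"individual {k}: heterozygous at {sites}, H = {heterozygosity(h1, h2):.4f}")
```

Under this pairing the second individual is homozygous at every site, while the others are heterozygous at one to three of the 42 sites.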




A sample from a population

Nonsyn Syn Syn Syn Syn
Species 1:
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT
ATG CAG CGT ATT TCA CAT TTG GGA CTT GTA TTT ACG GCT GAT
ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT
ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCC TAT
Species 2:
ATG CGG CGT ATT TCG CAT TTA GGA CAT GTA TTC ACG GCT TAT
M Q/R R I S H L G H/L V F T A Y/D
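The amino acid line and the Syn/Nonsyn labels can be checked by translating each codon with the standard genetic code. In the minimal Python sketch below, `translate` and `classify_variable_codons` are illustrative names, not a library API. A variable codon is labeled synonymous when every observed codon encodes the same amino acid and nonsynonymous otherwise; this per-codon rule is a simplification, not a formal count of synonymous and nonsynonymous sites.

```python
# Standard genetic code: amino acids listed in TCAG order for codon positions 1, 2, 3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

SPECIES1 = [
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CTT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCC TAT",
]
SPECIES2 = "ATG CGG CGT ATT TCG CAT TTA GGA CAT GTA TTC ACG GCT TAT"


def translate(row):
    """Translate a space-separated codon string into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in row.split())


def classify_variable_codons(rows):
    """Label each variable codon position Syn (same amino acid) or Nonsyn."""
    results = []
    for index, column in enumerate(zip(*(row.split() for row in rows)), start=1):
        codons = sorted(set(column))
        if len(codons) > 1:
            amino_acids = sorted({CODON_TABLE[c] for c in codons})
            kind = "Syn" if len(amino_acids) == 1 else "Nonsyn"
            results.append((index, codons, amino_acids, kind))
    return results


if __name__ == "__main__":
    print(translate(SPECIES1[0]), translate(SPECIES2))
    for index, codons, amino_acids, kind in classify_variable_codons(SPECIES1 + [SPECIES2]):
        print(f"codon {index}: {'/'.join(codons)} -> {'/'.join(amino_acids)} ({kind})")
```

Applied to the eight Species 1 rows plus Species 2, it flags codons 2, 9 and 14 as nonsynonymous (Q/R, H/L, D/Y) and codons 3, 5, 7, 11 and 13 as synonymous.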




Four simple summaries of polymorphism:
• Allele frequency: the frequency of each allele at a site
• Number of segregating sites
• Nucleotide (pairwise) diversity (π): fraction of sites that differ between two sequences chosen at random
• Site-frequency spectrum
Heterozygosity: fraction of all sites where an individual is heterozygous
[Figure: site frequency spectrum of the sample; x-axis: minor allele frequency (1 to 4 copies); y-axis: number of sites]
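The four summaries can be computed directly from the Species 1 alignment. This is a minimal Python sketch, not a standard tool: the helper names are illustrative, π is the mean fraction of differing sites over all pairs of sequences, and the site frequency spectrum is folded (minor-allele counts), matching the figure's x-axis. Sites with more than two alleles are skipped in the SFS; this sample has none.

```python
from collections import Counter
from itertools import combinations

# Species 1 sample, copied from the alignment (codons separated by spaces).
ALIGNMENT = [
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT TAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CTT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGC ATT TCA CAT TTG GGA CAT GTA TTT ACG GCT GAT",
    "ATG CAG CGT ATT TCA CAT TTG GGA CAT GTA TTT ACG GCC TAT",
]
SEQS = [row.replace(" ", "") for row in ALIGNMENT]


def allele_counts(seqs):
    """One Counter of bases per site (list index 0 = site 1)."""
    return [Counter(column) for column in zip(*seqs)]


def segregating_sites(seqs):
    """1-based positions of sites carrying more than one allele."""
    return [i + 1 for i, counts in enumerate(allele_counts(seqs)) if len(counts) > 1]


def allele_frequencies(seqs):
    """Allele frequencies at each segregating site, keyed by 1-based position."""
    n = len(seqs)
    counts = allele_counts(seqs)
    return {pos: {b: c / n for b, c in counts[pos - 1].items()} for pos in segregating_sites(seqs)}


def nucleotide_diversity(seqs):
    """pi: mean fraction of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / len(pairs) / len(seqs[0])


def folded_sfs(seqs):
    """Number of biallelic sites with each minor-allele count (1 .. n // 2)."""
    n = len(seqs)
    sfs = {k: 0 for k in range(1, n // 2 + 1)}
    for counts in allele_counts(seqs):
        if len(counts) == 2:
            sfs[min(counts.values())] += 1
    return sfs


if __name__ == "__main__":
    print("allele frequencies:", allele_frequencies(SEQS))
    print("segregating sites:", segregating_sites(SEQS))
    print("pi:", nucleotide_diversity(SEQS))
    print("folded SFS:", folded_sfs(SEQS))
```

On this sample it finds 4 segregating sites (positions 9, 26, 39 and 40), an average of 1.5 differences per pair of sequences (π = 1.5/42 ≈ 0.036), and a folded SFS of {1: 2, 2: 1, 3: 0, 4: 1}, which matches the bar heights in the figure.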
How much DNA variation is there?
• Drosophila melanogaster: 1% diversity
• Humans: 0.1% diversity
• i.e., about 1 out of every 1000 bases differs between two randomly chosen human sequences
[Figure: average number of pairwise differences (%, log10 scale) in 181 species, by taxon: Apicomplexa (2), Ciliophora (12), Ascomycota (6), Arthropoda (64), Chordata (61), Echinodermata (3), Nematoda (3), Chlorophyta (2), Pinophyta (9), Porifera (1), Magnoliophyta (13), Mollusca (1), Heterokontophyta (3), Basidiomycota (1); also shown: Mammals (42), Drosophila (22)]
(Leffler et al., PLOS Biology)
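Since π is the fraction of sites that differ between two randomly chosen sequences, the expected number of differences in a stretch of L bases is simply π × L. A small Python sketch using the slide's approximate values; the 10,000 bp length is a hypothetical choice for illustration only.

```python
# Approximate nucleotide diversity values quoted on the slide.
PI = {"Drosophila melanogaster": 0.01, "Humans": 0.001}


def expected_differences(pi, length_bp):
    """Expected number of differing sites between two randomly chosen sequences."""
    return pi * length_bp


if __name__ == "__main__":
    length_bp = 10_000  # hypothetical sequence length, for illustration only
    for species, pi in PI.items():
        print(f"{species}: about {expected_differences(pi, length_bp):.0f} differences per {length_bp:,} bp")
```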


Hardy–Weinberg proportions: calculation

Assume a locus has two alleles, A and a, with frequencies p and q, respectively (p + q = 1).
Hardy–Weinberg equilibrium: the two alleles in an individual are inherited “independently” from each parent.
The frequencies of the three genotypes are:

Genotype   Frequency
AA         $p^2$
Aa         $2pq$
aa         $q^2$
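Because each of the two alleles is A with probability p and a with probability q, independently, the genotype frequencies are the terms of $(p + q)^2 = p^2 + 2pq + q^2$ and therefore sum to 1. A minimal Python sketch; the function name `hardy_weinberg` and the example value p = 0.7 are illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies under HWE when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p  # only two alleles, so p + q = 1
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


if __name__ == "__main__":
    freqs = hardy_weinberg(0.7)  # illustrative allele frequency
    for genotype, freq in freqs.items():
        print(f"{genotype}: {freq:.2f}")
    print("total:", round(sum(freqs.values()), 12))
```

The heterozygote frequency 2pq is largest, at 0.5, when p = q = 0.5.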


Hardy-Weinberg equilibrium: assumptions

Mating is random
No natural selection (all progeny are equally fit)
No mutation that could change an A to a or an a to A
It is a single population that is very large (no genetic drift, no migration)
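Under these assumptions, one round of random mating turns any starting genotype frequencies (taken to be the same in both sexes) into Hardy–Weinberg proportions, without changing the allele frequency. The Python sketch below checks this by brute force: it weights every pair of parental genotypes by the product of their frequencies and lets each parent transmit A according to Mendelian segregation. The starting frequencies are hypothetical and the helper names are illustrative.

```python
from itertools import product

# Probability that a parent of each genotype transmits allele A (Mendelian segregation).
TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def allele_frequency(genotypes):
    """Frequency p of allele A from genotype frequencies."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


def random_mating(parents):
    """Offspring genotype frequencies after one generation of random mating.

    Each ordered pair of parental genotypes is weighted by the product of their
    frequencies; each parent transmits A independently. No selection, mutation,
    drift or migration is modeled.
    """
    offspring = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for g1, g2 in product(parents, repeat=2):
        weight = parents[g1] * parents[g2]
        a1, a2 = TRANSMIT_A[g1], TRANSMIT_A[g2]
        offspring["AA"] += weight * a1 * a2
        offspring["Aa"] += weight * (a1 * (1 - a2) + (1 - a1) * a2)
        offspring["aa"] += weight * (1 - a1) * (1 - a2)
    return offspring


if __name__ == "__main__":
    start = {"AA": 0.6, "Aa": 0.2, "aa": 0.2}  # hypothetical heterozygote deficit
    child = random_mating(start)
    print("p before:", allele_frequency(start), "p after:", round(allele_frequency(child), 12))
    print({g: round(f, 4) for g, f in child.items()})
```

Starting from a heterozygote deficit with p = 0.7, the offspring reach $p^2$, $2pq$ and $q^2$ (0.49, 0.42, 0.09) after a single generation, and further rounds leave them unchanged.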


Hardy-Weinberg proportions in humans



Applications of Hardy-Weinberg: Forensic DNA profiling



Allele frequencies for commonly used microsatellites, from 196 US Caucasian samples

Genotype from a forensic case
Question: Assume that you have a suspect who is innocent. What is the probability that their genotype would match this profile by chance?

$P(\text{D3S1358}) = 2 \times 0.2118 \times 0.1626 = 0.0689$ (heterozygote, $2pq$)
$P(\text{D21S11}) = 2 \times 0.1811 \times 0.2321 = 0.0841$ (heterozygote, $2pq$)
$P(\text{D18S51}) = 0.0918^2 = 0.0084$ (homozygote, $p^2$)

$P(\text{match by chance}) = 0.0689 \times 0.0841 \times 0.0084 \approx 0.000049$
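The calculation applies Hardy–Weinberg proportions at each locus (2pq for a heterozygote, $p^2$ for a homozygote) and then multiplies across loci, which additionally assumes that the loci are independent of one another. A minimal Python sketch of that product rule; `genotype_frequency` and `random_match_probability` are illustrative names, and the allele frequencies are the ones on the slide.

```python
# Allele frequencies for the profile on the slide: two alleles for a heterozygous
# locus, one allele (second entry None) for a homozygous locus.
PROFILE = {
    "D3S1358": (0.2118, 0.1626),
    "D21S11": (0.1811, 0.2321),
    "D18S51": (0.0918, None),
}


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: 2pq if heterozygous, p^2 if homozygous."""
    return p * p if q is None else 2 * p * q


def random_match_probability(profile):
    """Multiply per-locus genotype frequencies (assumes independence across loci)."""
    probability = 1.0
    for p, q in profile.values():
        probability *= genotype_frequency(p, q)
    return probability


if __name__ == "__main__":
    for locus, (p, q) in PROFILE.items():
        print(f"{locus}: {genotype_frequency(p, q):.4f}")
    rmp = random_match_probability(PROFILE)
    print(f"P(match by chance) = {rmp:.6f}, about 1 in {1 / rmp:,.0f}")
```

Using the unrounded per-locus values gives about $4.9 \times 10^{-5}$, roughly 1 in 20,000, consistent with the slide's 0.000049.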


Hardy & Weinberg
