Molecular Photofitting: Predicting Ancestry and Phenotype Using DNA
Ebook, 1,339 pages, 15 hours

About this ebook

In the field of forensics, there is a critical need for genetic tests that can function in a predictive or inferential sense, before suspects have been identified, and/or for crimes for which DNA evidence exists but eye-witnesses do not. Molecular Photofitting fills this need by describing the process of generating a physical description of an individual from the analysis of his or her DNA. The molecular photofitting process has been used to assist with the identification of remains and to guide criminal investigations toward certain individuals within the sphere of prior suspects.

Molecular Photofitting provides an accessible roadmap for both the forensic scientist hoping to make use of the new tests becoming available, and for the human genetic researcher working to discover the panels of markers that comprise these tests. By implementing population structure as a practical forensics and clinical genomics tool, Molecular Photofitting serves to redefine the way science and history look at ancestry and genetics, and shows how these tools can be used to maximize the efficacy of our criminal justice system.

  • Explains how physical descriptions of individuals can be generated using only their DNA
  • Contains case studies that show how this new forensic technology is used in practical application
  • Includes over 100 diagrams, tables, and photos to illustrate and outline complex concepts

Language: English
Release date: Jul 19, 2010
ISBN: 9780080551371

    Molecular Photofitting - Tony Frudakis Ph.D.

    Molecular Photofitting

    Predicting Ancestry and Phenotype Using DNA

    First Edition

    Tony N. Frudakis

    With a Chapter 1 Introduction by Mark D. Shriver

    Amsterdam • Boston • Heidelberg • London

    New York • Oxford • Paris • San Diego

    San Francisco • Singapore • Sydney • Tokyo

    Academic Press is an imprint of Elsevier

    Table of Contents

    Cover image

    Title page

    Copyright page

    Foreword

    Preface

    Acknowledgments

    Chapter 1: Forensic DNA Analysis: From Modest Beginnings to Molecular Photofitting, Genics, Genetics, Genomics, and the Pertinent Population Genetics Principles

    PART I: INTRODUCTION: BRIEF HISTORY OF DNA IN FORENSIC SCIENCES

    THE STATISTICS OF FORENSIC DNA ANALYSES

    THE NATURE OF HUMAN GENETIC VARIATION

    POPULATION GENETICS AND POPULATION GENOMICS

    THE PROMISE OF MOLECULAR PHOTOFITTING AS A TOOL IN FORENSIC SCIENCE

    PART II: THE BASIC PRINCIPLES

    Chapter 2: Ancestry and Admixture

    WHAT ARE ANCESTRY AND ADMIXTURE?

    THE NEED FOR MOLECULAR TESTS FOR ANCESTRY

    ANCESTRY INFORMATIVE MARKERS

    BIOGEOGRAPHICAL ANCESTRY ADMIXTURE AS A TOOL FOR FORENSICS AND PHYSICAL PROFILING

    Chapter 3: Biogeographical Ancestry Admixture Estimation—Theoretical Considerations

    ESTIMATING BY ANTHROPOMETRIC TRAIT VALUE

    ADMIXTURE AND GENE FLOW ESTIMATED FROM SINGLE LOCI

    ADMIXTURE IN INDIVIDUAL SAMPLES

    USING THE HANIS METHOD ON POPULATION MODELS K > 2

    PARAMETER UNCERTAINTY

    BAYESIAN METHODS FOR ACCOMMODATING PARAMETER UNCERTAINTY

    SAMPLING ERROR

    ASSUMPTIONS ABOUT MARKER LINKAGE AND INTENSITY OF ADMIXTURE IN PARENTS

    PRITCHARD’S STRUCTURE PROGRAM

    IN DEFENSE OF A SIMPLE ADMIXTURE MODEL

    PRACTICAL CONSIDERATIONS FOR BUILDING AN ANCESTRY ADMIXTURE TEST

    SELECTING AIMS FROM THE GENOME—HOW MANY ARE NEEDED?

    COMPARING THE POWER OF SPECIFIC LOCI FOR SPECIFIC RESOLUTIONS

    GENOMIC COVERAGE OF AIMs

    MORE ELABORATE METHODS OF SELECTING MARKERS FOR INFORMATION CONTENT

    SHANNON INFORMATION

    FISCHERIAN INFORMATION CONTENT

    INFORMATIVENESS FOR ASSIGNMENT

    TYPE OF POLYMORPHISMS

    INTERPRETATION OF ANCESTRY ESTIMATES

    OBJECTIVE INTERPRETATION

    GENETIC MAPPING AND ADMIXTURE

    APPENDIX

    Chapter 4: Biogeographical Ancestry Admixture Estimation—Practicality and Application

    THE DISTRIBUTION OF HUMAN GENETIC VARIABILITY AND CHOICE OF POPULATION MODEL

    MARKER SELECTION

    SAMPLE COLLECTION

    PRESENTING INDIVIDUAL BIOGEOGRAPHICAL ANCESTRY ADMIXTURE (BGAA) RESULTS

    CONCEPTUAL ISSUES

    Chapter 5: Characterizing Admixture Panels

    PARENTAL SAMPLE PLOTS

    MODEL CHOICES AND DIMENSIONALITY

    SIZE OF CONFIDENCE CONTOURS

    REPEATABILITY

    SENSITIVITY

    ANALYSIS OF RESULTS FOR GENEALOGISTS

    ANALYSIS OF RESULTS FOR NONGENEALOGISTS

    BLIND CHALLENGE OF CONCORDANCE WITH SELF-ASSESSED RACE

    CONFIDENCE INTERVAL WARPING

    SAMPLED PEDIGREES

    SIMULATED PEDIGREES

    COMPARING DIFFERENT ALGORITHMS WITH THE SAME AIM PANEL

    ANALYSIS USING SUBSETS OF MARKERS

    RESOLVING SAMPLE MIXTURES

    SAMPLE QUANTITY

    NONHUMAN DNA

    PERFORMANCE WITH ALTERED PARENTAL ALLELE FREQUENCIES

    CORRELATION WITH ANTHROPOMETRIC PHENOTYPES

    SIMULATIONS

    CREATING SIMULATED SAMPLES

    SOURCE OF ERROR MEASURED WITH SIMULATIONS

    RELATIONSHIP BETWEEN ERROR IN POPULATIONS AND WITHIN INDIVIDUALS

    PRECISION OF THE 71 AIM PANEL FROM SIMULATIONS

    TRENDS IN BIAS FROM THE 71 AIM PANEL

    95% CONFIDENCE THRESHOLD FOR 71 AIM PANEL

    PRECISION OF THE 171 AIM PANEL FROM SIMULATIONS

    MLE THRESHOLDS FOR ASSUMPTION OF BONA FIDE AFFILIATION

    COMPARISON OF 71 AND 171 AIM PANELS

    OBSERVED AND EXPECTED BIAS

    WHAT DO THE SIMULATIONS TEACH US ABOUT INTERPRETING BGA ADMIXTURE RESULTS?

    BIAS SYMMETRY

    IMPACT OF MLE ALGORITHM DIMENSIONALITY

    SIMULATIONS OF ADMIXED INDIVIDUALS

    MLE PRECISION FROM THE TRIANGLE PLOTS

    CONFIDENCE OF NONZERO AFFILIATION

    STANDARD DEVIATION FROM CONFIDENCE INTERVALS

    TESTING THE RELATION BETWEEN CONFIDENCE MEASURES IN INDIVIDUALS AND POPULATIONS

    SPACE OUTSIDE THE TRIANGLE PLOT

    COMBINED SOURCES SUGGEST AN AVERAGE ERROR

    Chapter 6: Apportionment of Autosomal Diversity with Continental Markers

    THE NEED FOR POPULATION DATABASES—WORDS MEAN LESS THAN DATA

    TRENDS ON AN ETHNIC LEVEL: AUTOSOMAL VERSUS SEX CHROMOSOME PATTERN

    WHAT DO CONTINENTAL ANCESTRY AIMS SAY ABOUT ETHNICITY?

    THE SIGNIFICANCE OF FRACTIONAL AFFILIATION RESULTS ON A POPULATION LEVEL

    RECONSTRUCTING HUMAN HISTORIES FROM AUTOSOMAL ADMIXTURE RESULTS

    SHARED RECENT ANCESTRY VERSUS ADMIXTURE: WHAT DOES FRACTIONAL CONTINENTAL AFFILIATION FOR AN ETHNIC GROUP MEAN?

    RETURNING BRIEFLY TO THE NAMING PROBLEM — RELEVANCE FOR INTERPRETING THE APPORTIONMENT OF AUTOSOMAL DIVERSITY

    A SAMPLING OF ETHNICITIES USING THE 171 AIM PANEL

    INTERPRETATION OF ANCESTRY PROFILES FOR ETHNIC POPULATIONS

    EAST ASIAN ADMIXTURE IN THE MIDDLE EAST AND SOUTH ASIA

    RESOLUTION WITHIN CONTINENTS BASED ON THE FOUR-POPULATION MODEL

    INTERPRETATION OF CONTINENTAL BGA RESULTS IN LIGHT OF WHAT WE HAVE LEARNED FROM APPLICATION TO ETHNIC POPULATIONS

    APPROPRIATENESS OF A FOUR-POPULATION MODEL

    DO ALLELE FREQUENCY ESTIMATION ERRORS ACCOUNT FOR THE SECONDARY AFFILIATIONS IN ETHNIC SUBPOPULATIONS?

    INDICATIONS OF CRYPTIC POPULATION STRUCTURE

    Chapter 7: Apportionment of Autosomal Diversity with Subcontinental Markers

    SUBPOPULATION AIMS AND ETHNIC STRATIFICATION

    WITHIN THE EUROPEAN BGA GROUP — A BRIEF HISTORY OF EUROPEANS

    HOW DO WE SUBDIVIDE EUROPEANS FOR FORENSICS USE?

    DEVELOPMENT OF A WITHIN-EUROPEAN AIM PANEL

    THE EURO 1.0 AIM PANEL FOR A FOUR-POPULATION SUBCONTINENTAL MODEL

    ESTABLISHING THE OPTIMAL PARENTAL REPRESENTATIVES

    BLIND CHALLENGE WITH ETHNICALLY ADMIXED EUROPEAN-AMERICAN SAMPLES

    POPULATION ISOLATES AND TRANSPLANTS

    CORRELATIONS WITH ANTHROPOMETRIC TRAITS

    TEST ERROR

    HIERARCHICAL NATURE OF EURO 1.0—PRIOR INFORMATION REQUIRED

    EURO 1.0 PEDIGREES AS AN AID TO INTERPRETING RESULTS

    EURO 1.0 — INTERPRETATION OF VARIATION WITHIN GROUPS

    AN HISTORICAL PERSPECTIVE

    MORE DETAILED SUBPOPULATION STRATIFICATIONS — k = 7

    WHAT DO THE GROUPS NOR1, NOR2 … MEAN?

    EVALUATING THE RESULTS FROM THE k = 7 EUROPEAN MODEL

    COMPARISON WITH PREVIOUS STUDIES BASED ON GENE MARKERS

    COMPARISON WITH RESULTS FROM OTHER STUDIES

    BLIND CHALLENGE OF THE k = 7 MODEL RESULTS WITH ETHNIC SAMPLES

    CORRELATIONS WITH ANTHROPOMETRIC TRAITS

    PEDIGREES

    SUBSTANTIAL VARIATION IN ADMIXTURE WITHIN ETHNIC GROUPS

    ALTERNATIVE STYLES FOR ESTIMATING ETHNIC ADMIXTURE

    Chapter 8: Indirect methods for Phenotype Inference

    ESTIMATES OF GENOMIC ANCESTRY ALLOWS FOR INFERENCE OF CERTAIN PHENOTYPES

    PHENOTYPE VARIATION AS A FUNCTION OF HUMAN POPULATION HISTORY AND INDIVIDUAL ANCESTRY

    SOURCES OF PHENOTYPIC VARIATION

    EMPIRICAL OBSERVATION OF ADMIXTURE-BASED CORRELATION ENABLES GENERALIZATION

    EMPIRICISM AS A TOOL FOR THE INDIRECT METHOD OF MOLECULAR PHOTOFITTING

    REVERSE FACIAL RECOGNITION USING GENOMIC ANCESTRY ESTIMATES

    ESTIMATING PHENOTYPE FROM 2D DIGITAL PHOTOGRAPHS

    ESTIMATING PHENOTYPES FROM 3D DIGITAL PHOTOGRAPHS

    EXAMPLES OF DATABASE QUERIES — GLOBAL CHARACTERISTICS FROM DIGITAL PHOTOGRAPHS

    EXAMPLES OF DATABASE QUERIES—ETHNIC DESCRIPTORS AND GEOPOLITICAL AFFILIATIONS

    VARIATION AND PARAMETERIZATION OF DATABASE OBSERVATIONS

    CAN SOCIAL CONSTRUCT SUCH AS RACE BE INFERRED FROM DNA?

    INDIRECT APPROACH USING FINER POPULATION MODELS

    INDIRECT INFERENCE OF SKIN PIGMENTATION

    SOURCES OF THE ANCESTRY-SKIN PIGMENTATION CORRELATION

    CAN WE INFER M KNOWING GENOMIC ANCESTRY?

    INFERENCES ON COMPOSITE CHARACTERISTICS

    WHY NOT USE THE DIRECT METHOD INSTEAD?

    INDIRECT INFERENCE OF IRIS PIGMENTATION

    Chapter 9: Direct Method of Phenotype Inference

    PIGMENTATION

    HISTORY OF PIGMENTATION RESEARCH

    THE GENETICS OF HUMAN PIGMENTATION—A COMPLEX PUZZLE

    BIOCHEMICAL METHODS OF QUANTIFYING PIGMENT

    IRIS COLOR

    IRIS COLOR PHENOTYPING: THE NEED FOR A THOUGHTFUL APPROACH

    MAKING IRIS COLOR MEASUREMENTS

    POPULATION SURVEYS OF IRIS MELANIN INDEX (IMI) VALUES

    RELATION OF IMI TO SELF-DESCRIBED IRIS COLOR

    HISTORY OF GENETIC RESEARCH ON IRIS COLOR

    RECENT HISTORY OF ASSOCIATION MAPPING RESULTS

    OCA2—THE PRIMARY IRIS COLOR GENE

    AN EMPIRICAL OCA2-BASED CLASSIFIER FOR THE INFERENCE OF IRIS COLOR

    THE EMPIRICAL METHOD OF DIRECT PHENOTYPE INFERENCE

    CASE REPORTS

    HAIR COLOR

    SKIN PIGMENTATION

    FINAL CONSIDERATIONS FOR THE DIRECT INFERENCE OF SKIN PIGMENTATION

    Chapter 10: The first case studies of molecular photofitting

    CASE REPORTS

    LOUISIANA SERIAL KILLER MULTIAGENCY HOMICIDE TASK FORCE INVESTIGATION

    OPERATION MINSTEAD

    THE BOULDER, COLORADO CHASE CASE

    OTHER CASES

    Chapter 11: The Politics and Ethics of Genetic Ancestry Testing

    RESISTANCE

    ARTICLES — INSIGHT INTO PUBLIC REACTION

    MOLECULAR EYEWITNESS: DNA GETS A HUMAN FACE

    DNA TESTS OFFER CLUES TO SUSPECT’S RACE

    CONCERNS OF THE DEFENSE-MINDED

    CONCERNS OF THE PROSECUTION-MINDED

    RESISTANCE IN THE SCIENTIFIC COMMUNITY

    THE NONMEDICAL APPLICATION OF DATA DEVELOPED FOR MEDICAL REASONS IS UNETHICAL

    THE TESTS ATTEMPT TO PROVIDE A DEFINITIVE ANSWER FOR WHAT CANNOT BE DEFINITIVELY DETERMINED

    THE TESTS OVERPROMISE WHAT THEY CAN DO, DESTROYING PUBLIC TRUST OF GENETICS TESTS, POTENTIALLY AN OMINOUS DEVELOPMENT ON THE EVE OF PERSONALIZED MEDICINE THAT WILL USE SUCH TESTS

    BiDil

    DNA IS DIFFERENT

    RACISM AND GENETIC ANCESTRY TESTING

    RACISM AND THE COMMON RACIST MANTRA

    THE DATA DOES NOT AND PROBABLY CANNOT SUPPORT THE RACIST VIEWPOINT

    SUBJECTIVE NATURE OF THE WORD INTELLIGENCE

    ACCORDING TO NATURE, DIVERSITY IS A GOOD THING

    Bibliography

    Index

    Copyright

    Acquisitions Editor: Jennifer Soucy

    Assoc. Developmental Editor: Kelly Weaver

    Project Manager: Christie Jozwiak

    Publishing Services Manager: Sarah Hajduk

    Cover Designer: Alisa Andreola

    Composition: SPi

    Printer: China Translation & Printing Services, Ltd.

    Academic Press is an imprint of Elsevier

    30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

    525 B Street, Suite 1900, San Diego, California 92101-4495, USA

    84 Theobald’s Road, London WC1X 8RR, UK

    Copyright © 2008, Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

    Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+ 44) 1865 843830, fax: (+ 44) 1865 853333, E-mail: permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting Support & Contact then Copyright and Permission and then Obtaining Permissions.

    Library of Congress Cataloging-in-Publication Data

    Application submitted.

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library.

    ISBN: 978-0-12-088492-6

    For information on all Academic Press publications visit our Web site at www.books.elsevier.com

    Printed in China

    07 08 09 10  9 8 7 6 5 4 3 2 1

    Foreword

    Richard A. Sturm, PhD, Melanogenix Group, Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld, 4072, Australia

    There is a deep desire within us all to find out who we are as individuals by tracing our ancestors through history, asking where they may have come from and what they may have looked like. Only over a relatively short period of time—at best several generations—can most of us follow our personal genealogies, using family trees drawn from oral histories or public records, before we are quickly lost in the depths of inaccessible ancestors. Another approach, only available to us in recent years, is to peer into our genes to examine the DNA record encoded in the human genome. This can provide a wealth of information about our family ties, the level of relatedness within and between populations, and ultimately even the origin of the human lineage. The linking of our shared genetic ancestries with the geographical distribution of prototypical human populations is one of the keys to finding our own affiliations as well as the distribution of physical traits within present day admixed populations. The concept of Biogeographical Ancestry (BGA) is the term comprehensively defined by Tony Frudakis in this landmark reference work, with the motivation being to correlate ancestry and sequence differences within our DNA to an individual’s physical appearance. This whole process is referred to as molecular photofitting, with downstream applications for forensic identification purposes.

    The considerable effort expended in characterizing the frequency distribution of single nucleotide changes within human populations has rewarded the DNAPrint Genomics team with a unique set of Ancestry Informative Markers (AIMs). With these tools and a noticeably pragmatic approach, a detailed description is given of the theoretical basis for choosing a model with four main ancestral continental groups (West African, European, East Asian, and Indigenous American). These are the geographical extremes that can be used to plot the admixture of a present-day individual. They also allow indirect methods to predict physical traits such as the degree of pigmentation present within the hair, eye, and skin based on the primordial characteristics of these groups. While not definitive, this is a clear improvement on the current inaccurate means of inferring physical appearance from a DNA sample. Although the DNAWitness™ protocol is currently operational as a molecular photofitting test, the future is in directly correlating physical phenotypes with the genes that are part of the biochemical process determining or modifying these characteristics. The complement of human pigmentation genes is presently being characterized, and this book contains a good description of the polymorphisms that will direct the color traits of hair, eye, and skin. This is a major advancement of a fledgling field; the future will surely be based on an individual’s genotype at specific loci. A glimpse of this is seen with correlative genes such as MC1R with hair, OCA2 with eye, and MATP with skin color.

    The use of DNA fingerprinting systems such as CODIS is now well accepted in courts of law. This is not yet true for molecular photofitting techniques, and the contentious issue of predicting phenotypes based on ethnic group stereotypes has social connotations beyond forensic analysis. The discussion of the first case reports utilizing DNA for physical profiling shows the need for high levels of accountability with this breakthrough science, testing the limits of understanding of the lay public, police force, and judiciary alike. As a new form of evidence, the DNA-witness based on accumulating databases of AIM-profiled individuals must be compared with the accepted but not necessarily reliable eye-witness testimony of appearance. The final chapter on the politics and ethics of testing for genetic ancestry, as described in this textbook, is challenging and confrontational to our beliefs about what the idea of race and our own racism represent, and, as such, deserves to be read and considered by a wide audience. Molecular Photofitting is a very thoughtful and rigorous treatise on a socially contentious issue, but one that is very likely to help contribute to the policing of our communities.

    Preface

    This textbook is meant to serve less as an instructive tool for the classroom and more as a reference for the forensic, clinical, and academic scientist. It is my hope that scientists seeking to develop or use methods for the inference of phenotype from DNA will find some of the ideas presented here useful.

    Most of the book focuses on data, results, and observations derived from a small number of Ancestry Informative Marker (AIM) panels, which cynics might point out are the same panels my company (DNAPrint Genomics, Inc., Sarasota, FL) sells to the forensics community. This is no accident, and to a point the cynics are right. I founded DNAPrint and invested a substantial portion of its future into developing the data, databases, and tools described herein. I did this because it had not been done before, and I believed there existed a corresponding niche that needed to be filled. The fact is that before this text, to my knowledge, there have been no other descriptions of bona fide AIM panels that describe their expected accuracy and bias, results across populations, comparison of the results with self-held notions or social constructs, correlation with elements of physical appearance, and so forth. Indeed, the AIM panels developed by my colleagues and me at DNAPrint, including Mark Shriver of Penn State University, are among the first ever to be studied in these ways. The statistical methods for estimating individual ancestry, based on what Fisher referred to in 1912 as inverse probability, were worked out for the first time only in the 1970s. Bayesian methods in the field of phylogenetics and individual ancestry estimation are even more recent, having been developed only within the past 10 years. It has been possible to construct panels of good AIMs (not those found in genes, but bona fide AIMs of neutral evolutionary character) for the first time only in the past decade, as the sequence for the human genome has been released.

    No multifactorial human phenotype has ever before been predicted from an appreciation of polymorphisms in human DNA. My colleagues and I were simply among the first to invest in these types of molecular resources and apply them for the purposes described in this book. As the field of molecular photofitting is completely new, having become enabled by the recent completion of the human genome project, it is just a matter of time before other panels and improved methods will be described. Perhaps future editions can incorporate a more diverse collection of panels and databases.

    I have done my best to provide a good theoretical background necessary to appreciate the methods and data discussed. I learned much of the theoretical material in the book (see Chapter 2) during my years in the field from such textbooks as The Genetics of Human Populations, by L.L. Cavalli-Sforza, and W.F. Bodmer; The Handbook of Statistical Genetics, by D.J. Balding, M. Bishop, and C. Cannings; and the various publications of R. Chakraborty. For the student, the theoretical material discussed in this book may serve as stimulation for further reading from a proper population genetics textbook and papers. For the forensics professional, the lay-description provided for these complex ideas may help in better understanding how the machinery of molecular photofitting works, and possibly obviate the need to go to a proper population genetics textbook (which to a nongeneticist may not be a pleasant experience).

    Parts of the book may seem redundant—for example in a later chapter, you might notice an explanation for an idea that was treated fully in an earlier chapter. When dealing with a new topic that requires background information the approach taken here is to recap the background information learned in an earlier chapter rather than to refer you to that earlier chapter. This also serves many readers better—rather than reading the text from cover to cover, most will probably scan quickly for sections of interest, and if the background information was not recapped in that section, the point could be lost.

    There are likely to be mistakes discovered in the first edition, which will be corrected in the next edition. We (my colleagues and I) would be grateful for your feedback. As with most books, this one is intended to stimulate thought, discussion, and ultimately activity in the field. It may be that 20 years from now, the field will have advanced very little due to funding limitations and various sources of resistance. On the other hand, it may be that 20 years from now molecular photofitting will be standard practice, and we will be doing amazing things, such as using computers to provide most of the information on a person’s driver’s license from DNA left at a crime scene, even creating artist’s renderings from DNA eyewitness testimony. Either way, we are honored to have the opportunity to share our work and interpretation of other relevant works with you, and we hope you find this book useful.

    Acknowledgments

    A very significant THANKS is given to Mark Shriver, who helped write Chapter 1, provided assistance writing the parts of Chapter 6 that covered his work, and edited the first four chapters. It was in collaboration with Mark that the panel of Ancestry Informative Markers (AIMs) discussed in this book was developed, and he was involved in the collection of many of the samples we used. In addition, he and his colleagues wrote the very first version of the MLE program we used, and we later optimized and expanded this program with his help. Much data from his papers appears within these pages and the book would not have been possible without his efforts.

    Matt Thomas managed the laboratory that produced most of the data described in this book, and helped develop many of the ideas, algorithms, and figures that are presented. Without his tireless work over the past six years, most of the data discussed within these pages would not have existed and this book would not have been possible. Zach Gaskin, Shannon Boyd, and Sarah Barrow produced most of the genotype data for the 71 AIM and 171 AIM panels and iris color work discussed herein. It was through collaboration with Nick Martin and colleagues at the Queensland Medical Research Institute in Brisbane, Australia that the hair color data and discussion were possible.

    Lastly, the most significant THANKS needs to be given to the investors of the company that funded much of the research in this book—DNAPrint Genomics, Inc. Even after countless rejected grant applications, average, every-day citizens invested in the commercial viability of the concepts we describe by buying DNAPrint stock. Although the value of DNAPrint continues to sag, and products based on the ideas and data presented here have not yet sold well in the forensic, academic, or clinical world, the field is a very new one and these investors need only look at the outcome of the Louisiana Multiagency Homicide Task Force Investigation to know that their investment has made a positive difference in the world. This book is partial evidence of the value of their investment, and I thank them for making this work possible.

    Chapter 1

    Forensic DNA Analysis: From Modest Beginnings to Molecular Photofitting, Genics, Genetics, Genomics, and the Pertinent Population Genetics Principles

    With an Introduction by Mark D. Shriver

    PART I: INTRODUCTION: BRIEF HISTORY OF DNA IN FORENSIC SCIENCES

    The forensic analysis of DNA is one of the clear successes resulting from our rapidly increasing understanding of human genetics. Perhaps much of this success is because this particular application of the molecular genetic revolution is ultimately pragmatic and because the genetic information required for efforts such as the Combined DNA Index System (CODIS) and The Innocence Project (www.innocenceproject.org) is relatively simple. Although the requirements of DNA in these instances, namely individualization, are indeed relatively simple, they are somewhat technical, especially for the reader unfamiliar with molecular methods or population genetics. They nonetheless provide an important framework for the bulk of the material presented in this book. Important as they are for the rest of our discussion, we provide only a brief summary of the standard forensic DNA methods in this chapter, because these are well documented in other recent texts (Budowle et al. 2000; Butler 2001; Rudin & Inman 2002).

    Modern forensic DNA analysis began with Variable Number of Tandem Repeats (VNTR), or minisatellite techniques. First discovered in 1985 by Sir Alex Jeffreys, these probes, when hybridized to Southern blot membranes (see Box 1-A), produced highly variable banding patterns that are known as DNA fingerprints (Jeffreys et al. 1985). Underlying these complex multibanded patterns are a number of forms (alleles) of genetic loci that simultaneously appear in a given individual. The particular combinations of alleles in a given individual are highly specific, yet each is visible because they share a common DNA sequence motif that is recognized by the multilocus molecular probe through complementary base pairing. These multilocus probes are clearly very individualizing, but problematic when it comes to quantifying results. Some statistics can be calculated on multilocus data, but certain critical calculations cannot be made unless individual-locus genotype data are available. In answer to this need, a series of single-locus VNTR probe systems were developed, and these became standard in U.S. forensic labs from the late 1980s through the early 1990s.

    Box 1-A

    The Southern Blot is named after Edwin Southern, who developed this important first method for the analysis of DNA in 1975. This method takes advantage of several fundamental properties of DNA in order to assay genetic variation, generally called polymorphism. The first step is to isolate high molecular weight DNA, a process known as genomic DNA extraction. Next, the DNA is digested with a restriction enzyme, which makes double-stranded cuts in the DNA at every position where there is a particular base pair sequence. For example, the restriction enzyme EcoRI, derived from the bacterium Escherichia coli strain RY13, has the recognition sequence GAATTC and will cut the DNA at every position where there is a perfect copy of this sequence. Importantly, sequences that are close to this sequence (e.g., GATTTC) will not be recognized and cut by the enzyme. The restriction digestion reduces the size of the genomic DNA in a systematic fashion. Restriction enzymes originally evolved in bacteria as a defense mechanism against foreign DNA; the bacterium’s own genomic complement is protected at these sequences through the action of other enzymes.

    After DNA extraction the DNA is generally a series of large fragments averaging 25,000 to 50,000 bp in length. Because of the immense size and complexity of the genome, the results of a restriction enzyme digestion are a huge mix of fragments from tens of base pairs to tens of thousands of base pairs. When these fragments are separated by size on agarose gels using the process known as electrophoresis, they form a heavy smear. Although it’s hard to tell by looking at these smears since all the fragments are running on top of each other, everyone has basically the same smear since all our DNA sequences are 99.9% identical. Places where the restriction patterns differ because of either changes in the sequence of the restriction sites (e.g., GAATTC → GATTTC) or the amount of DNA between two particular restriction sites are called Restriction Fragment Length Polymorphisms (RFLPs).
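
    The digestion and the resulting length polymorphism can be sketched in a few lines of Python. This is a toy illustration only: the two allele sequences are invented, only one strand is modeled, and the cut position within the site (EcoRI cuts between the G and the AATTC) is used simply to place the fragment boundaries.

```python
import re

ECORI_SITE = "GAATTC"   # recognition sequence given in the text
ECORI_CUT_OFFSET = 1    # EcoRI cuts after the first G (G^AATTC)

def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return fragment lengths after cutting at every exact copy of the site."""
    cut_positions = [m.start() + cut_offset for m in re.finditer(site, sequence)]
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]

# Hypothetical toy alleles (not real genomic sequence).
allele_a = "ATGC" * 3 + "GAATTC" + "CCGT" * 5 + "GAATTC" + "TTAG" * 2
# One base change (GAATTC -> GATTTC) destroys the first restriction site.
allele_b = allele_a.replace("GAATTC", "GATTTC", 1)

fragments_a = digest(allele_a)  # two sites, three fragments
fragments_b = digest(allele_b)  # one site, two fragments
print(fragments_a, fragments_b)
```

    The two alleles have the same total length but yield different fragment lengths, which is the difference a Southern blot reads out as an RFLP.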

    The key advancement of the Southern Blot was to facilitate the dissection of these restriction enzyme smears through the ability of DNA to denature (become single-stranded) and renature (go back to the double-stranded configuration), and to do so in a sequence-specific fashion such that only DNA fragments that have complementary sequences will hybridize or renature. The DNA in the gel is denatured using a highly basic solution and then transferred by capillary action, using stacks of paper towels, onto a thin membrane, usually charged nylon. After binding the digested DNA permanently to the membrane, we can scan it by annealing short fragments of single-stranded DNA, called probes, which are labeled in such a way that we can detect their presence. The probes will anneal with DNA at locations on the genomic smear to which they have complete, or near complete, complementarity, depending on the stringency of the hybridization and wash conditions. Since the probes are radiolabeled or chemiluminescently labeled, the result is a banding pattern in which the locations of particular sequences on the genome emerge as blobs called bands. The lengths of the fragments in the bands can be estimated as a function of the position to which they migrated on the gel relative to size standards, which are run in adjacent lanes.
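
    As a rough sketch of how a band's length might be read off against the size standards, the Python below interpolates between the two nearest ladder bands. It assumes that the logarithm of fragment length falls approximately linearly with migration distance between neighboring standards, a common working approximation rather than anything specified in the text; the ladder values and the function name are hypothetical.

```python
import math

def estimate_band_size(migration_mm, standards):
    """Estimate fragment length (bp) from migration distance.

    standards: (migration_mm, size_bp) pairs read from the size-standard lane.
    Interpolates log10(size) linearly between the two flanking standards.
    """
    ladder = sorted(standards)  # ascending migration distance
    for (d1, s1), (d2, s2) in zip(ladder, ladder[1:]):
        if d1 <= migration_mm <= d2:
            fraction = (migration_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the size standards")

# Hypothetical ladder: smaller fragments migrate farther down the gel.
ladder = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
band_size = estimate_band_size(15.0, ladder)  # roughly 3,160 bp
print(round(band_size))
```

    Bands outside the range of the standards raise an error rather than being extrapolated, since the approximation is least trustworthy there.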

    The single-locus forensic VNTR systems are highly informative, with each marker having tens to hundreds of alleles. At every locus each person has only two alleles, which together constitute the genotype, one received from the mother and one from the father. Given such a large number of alleles in the population, most genotypes are very rare. A standard analysis with single-locus VNTRs typically included six such single-locus VNTR markers, each run separately on a Southern blot gel. The data from the separate loci would then be combined into a single result expressing one of two outcomes:

    ■ Exclusion—the suspect and evidence samples do not match

    ■ When the genotypes match, a profile or match probability, which is an expression of the likelihood that the two samples matched by chance alone.

    Exclusions are pretty intuitive since the lack of genetic match between the samples eliminates any chance that the suspect could have donated the evidence (barring the very rare occurrences of somatic mutation, chimerism and mosaicism, each cell in our bodies has identical DNA). This of course presumes careful lab procedures and an intact and unquestioned chain of evidentiary custody. Given a match, profile probabilities are also quite intuitive, being expressions of the chances or likelihood that a particular genotype exists in a population. Profile probabilities are essentially a means to express the statistical power of a set of markers to demonstrate exclusion. For example, consider that both the suspect and biological evidence have blood type AB, the least common ABO genotype in most populations. There is no exclusion, but does that mean the suspect left the sample? Since about 4% of people have the AB genotype we say that the profile probability is 0.04 and that given no other information, the chances of having a match by chance alone are 1 in 25. Another way to read this profile probability is to say that 4% of the people match the person who left the sample.

    Maybe these are good betting odds in the casino, but in both science and in court, where human lives are at stake, more stringent criteria are required. For one thing, the frequency of 4% in the population does not necessarily mean there is a 1 in 25 chance that the suspect donated the evidence. When tests of such limited power were used, other forms of evidence that contribute to the prior probability the suspect donated the evidence would have to be taken into account. Generally, genetic markers are not the only evidence against the defendant and other pieces of information can be combined with the genetic data to comprise a preponderance of evidence. With DNA markers commonly used today, profile probabilities are much smaller than 4%, and thus the weight of the evidence is so great that convictions could be and sometimes are made solely on DNA results, without other evidence or prior probabilities taken into account.
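    The gap between a 4% profile probability and the probability that the suspect actually donated the evidence can be made concrete with Bayes' rule. The sketch below is illustrative only: the function name `posterior_donor_probability`, the simplifying assumption that the true donor always matches, and the two example priors are assumptions made here, not values from the text.

```python
def posterior_donor_probability(prior, match_probability):
    """Posterior probability that the suspect donated the sample, given a match.

    Assumption (for illustration): the true donor always matches, and a
    non-donor matches by chance with probability `match_probability`.
    """
    if not 0.0 < prior <= 1.0:
        raise ValueError("prior must be in the interval (0, 1]")
    numerator = prior  # P(match | donor) is taken to be 1
    denominator = numerator + (1.0 - prior) * match_probability
    return numerator / denominator


# Hypothetical priors: strong independent evidence (0.5) versus a suspect
# who is simply one of 1,000 equally plausible donors (0.001).
for prior in (0.5, 0.001):
    posterior = posterior_donor_probability(prior, 0.04)
    print(f"prior={prior}: posterior={posterior:.4f}")
```

    With the same 4% profile probability, the posterior depends heavily on the prior, which is why a marker of such limited power cannot stand alone.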

    Single-locus VNTRs were replaced by newer marker systems that became possible as a result of the Polymerase Chain Reaction (PCR), a process of amplifying DNA in vitro, which won a Nobel prize for its inventor, Kary Mullis. These newer markers are most commonly called Short Tandem Repeats (STRs) although they first were referred to as microsatellites since their repeat units are shorter than minisatellites. In many ways STRs are different from VNTRs. For example, STRs generally mutate one or two repeat units at a time and VNTRs mutate in steps of many repeats. There are a number of other differences and similarities in how these markers evolve and how they can be used but these are beyond the scope of this presentation, and interested readers should consult Goldstein and Schlotterer (1999).

    Table 1-1 presents a summary of some of the important characteristics of STRs and VNTRs. In terms of how the markers are used in forensic analyses, STRs are quite similar to VNTRs. The most significant difference is that with VNTRs, a process of allelic binning is required to interpret the genotype. Since the range in allele size at forensic VNTRs is large, alleles at a single locus can take the form of both very small and very large fragments, and so a variety of patterns could comprise a given genotype. Gel electrophoresis methods are limited in the resolution of fragments to about 5% of fragment length and as such, a gel that effectively resolves the smaller alleles at a particular locus will be ineffective at resolving with the greatest precision the largest alleles at this locus.
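    The 5% resolution limit can be illustrated with a small check. This is a sketch under stated assumptions: the function name `resolvable`, the rule that two fragments must differ by more than 5% of the larger fragment, and the example allele sizes are hypothetical and chosen only to show the effect.

```python
def resolvable(size_a_bp, size_b_bp, resolution=0.05):
    """Return True if two fragments differ by more than `resolution`
    (as a fraction) of the larger fragment's length."""
    larger = max(size_a_bp, size_b_bp)
    return abs(size_a_bp - size_b_bp) > resolution * larger


# Hypothetical alleles that differ by the same 32 bp at both ends of the range.
print(resolvable(500, 532))      # small alleles: 32 bp exceeds 5% of 532 bp
print(resolvable(10000, 10032))  # large alleles: 32 bp is far below 5% of 10,032 bp
```

    The same absolute size difference is distinguishable among the small alleles but not among the large ones, which is the situation that motivates binning.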

    Table 1-1

    Comparison between STRs and VNTRs in forensic DNA analysis.

    Southern blot gels are also susceptible to other phenomena (e.g., band smiling, overloading, and DNA degradation) that compromise the precision of allele size estimation. Protocols therefore were developed to score alleles with the highest levels of size precision possible with the understanding that the estimated size of a particular allele contained some degree of error. Allelic bins became the accepted solution to the question of estimating allele frequencies for profile probabilities and allelic identity (National Research Council 1996; Weir 1996). This is not as sloppy as it might seem, since the nature of the polymorphisms and molecular clock and maximum parsimony theories tell us that alleles of similar size are evolutionarily related (i.e., cousins of one another), but it certainly leaves some precision to be desired. The range in allele size for STRs, on the other hand, is more restricted, and because smaller sizes can be scored on sequencing gels accurate to one-base-pair size intervals, they can be scored without question as to the exact length of alleles.

    Although STR analysis using the PCR is very sensitive, allowing subnanogram amounts of genomic DNA to be analyzed, there are limitations that led to the development of a subcategory of STRs and increased interest in developing other types of markers, like Single Nucleotide Polymorphisms (SNPs). In particular, degraded remains, like those recovered from the site of the World Trade Center disaster, often are composed of highly fragmented DNA. For PCR to work effectively, the DNA templates spanning the region to be amplified must be present in the sample (that is, the templates must be full-length, and not degraded). The standard STR markers are 100 to 450 base pairs in length after amplification and this size has proven to be too large for highly degraded samples, where DNA can be sheared into small fragments. A special panel of smaller mini-STR markers has been defined, which can replace the standard panel for use on highly degraded samples (Butler et al. 2003).

    Given the target for SNPs is only one base pair, these markers are better suited than even the mini-STRs for amplification and analysis in highly degraded samples. SNPs can provide much of the same identity information as STRs, and are a source of information far beyond the simple exclusion and profile probabilities typically gleaned from STR data. This is because most genetic variation among humans is in the form of SNPs, and SNP variation likely underlies most functional variation. That is, SNPs represent the type of polymorphisms one tends to find within or near coding regions of human genes, and since genes cause phenotypes, they are therefore relevant for determining phenotype. In contrast, STRs are of relatively low prevalence and only a few STRs are known to have functional effects, for example, particular STR alleles that have expanded in size result in rare genetic diseases such as Huntington’s chorea and Fragile X syndrome. It is notable that we are referring here to the phenotypic variation among individuals that is genetic in origin. Organisms are actually combinations of their environments and their genomes, a point that is important for many traits, but less so for those that are highly genetic.

    Since SNPs are usually biallelic (having two alleles, e.g., C and T) there are only three genotypes (CC, CT, and TT). One situation in which STRs provide information that is difficult to get from SNP data is for mixed samples. Having three- or four-allele patterns at an STR or VNTR is clear evidence for more than one contributor since everybody has only one mother and only one father, and so two alleles are the maximum that should be seen in a sample left by one person. If a sample registers even all the possible alleles at a SNP, it is just a heterozygote (the CT genotype from the previous example) and it is not clear from the genotype alone whether it is a mixture of a CC and TT individual, two CT individuals, or a single CT individual.
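    The allele-counting argument for mixtures can be written down directly. In this minimal sketch, the function name `min_contributors` and the example allele calls are hypothetical. It assumes only what the text states: one diploid person contributes at most two alleles per locus.

```python
import math


def min_contributors(alleles_observed):
    """Smallest number of diploid contributors consistent with the
    distinct alleles observed at a single locus."""
    distinct = len(set(alleles_observed))
    if distinct == 0:
        raise ValueError("at least one allele must be observed")
    return math.ceil(distinct / 2)


# Hypothetical STR locus showing four alleles (repeat counts).
print(min_contributors([12, 14, 15, 17]))
# Biallelic SNP showing both alleles: one CT person is enough to explain it.
print(min_contributors(["C", "T"]))
```

    A four-allele STR pattern forces at least two contributors, whereas a SNP can never show more than two alleles and so never rules out a single heterozygous donor.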

    Showing all STR alleles in the form of an allelic ladder is a useful laboratory technique. Although some SNP analysis protocols can quantitatively assess the relative concentrations of the two alleles helping to recognize mixed samples, it is unlikely that SNPs can replace STRs in the case of a mixed sample when human identification is the goal. STRs are simply better in the case of sample mixtures. That being said, there are usually multiple evidentiary samples at a crime scene and it’s unusual for them all to be mixed and sometimes when they are, there are methods for resolving them. For example, in the case of vaginal swab analysis, the male and female DNA fractions can be separated by differential lysis of cells, male sperm cells being hardier than female epithelial cells (Budowle et al. 2000; Gyske & Ivanov 1996).

    THE STATISTICS OF FORENSIC DNA ANALYSES

    There are a few basic statistics that are fundamental to forensic identity analysis. Earlier we introduced the profile probability as an important measure of the meaningfulness of a genotype match. Calculating the profile probability requires that we are first able to ascertain the genotype of a person. An inability to do this is one of the main limitations for multilocus DNA fingerprinting systems. Even though profile matches using DNA fingerprinting probes are visually stunning and quite compelling as evidence of identity (see Figure 1-1), we do not know which of the multitude of bands on a multilocus gel are allelic; that is, which are the two (maternal and paternal) bands that together comprise the genotype.

    Figure 1-1 Examples of DNA fingerprinting profiles as obtained using the Southern blot method. Shown are: A) The first-ever Southern blot, which used a minisatellite probe. This blot was created by Alec Jeffreys and colleagues September 10, 1984. The first three lanes are of a child, the child’s mother, and the child’s father, respectively, and the remaining lanes are DNA samples from various other species including baboon, lemur, seal, cow, mouse, rat, frog (of order long since forgotten, labeled ?), and tobacco in the last lane. As the first experiment of its kind, the blot takes a somewhat messy appearance but the DNA fingerprint banding pattern for the child is clearly discernible as a subset of those of the combination of the mother and father. B) The first Southern blot ever used in an immigration case, using two multilocus DNA fingerprinting probes for detecting different minisatellite sequences. In this case, a mother (M) and her three undisputed children (U) are compared with a child of disputed paternity (B). The father’s DNA was unavailable but we can see that each band present for the child B that is not present in the mother M can be found in at least one of the other children. The likelihood that the child of disputed paternity is a child of the mother rather than being unrelated happened to be 5 × 10⁸. Both photographs supplied by Alec Jeffreys (2005).

    Once a genotype is determined, the next step is to determine the expected frequency of the genotype. How many people in the population should we find with the genotype we measured? The expected genotype frequency is dependent on the allele frequencies of the two alleles that make up a particular genotype. The relationship between the allele frequencies and the genotype frequencies was first recognized by Hardy and Weinberg independently in 1908 and so today is called the Hardy-Weinberg Equilibrium (HWE). HWE is the foundation for population genetics and explains how populations continue unchanged in allele frequencies (and thus phenotype) unless acted on by one of the four evolutionary forces (mutation, selection, gene flow (or admixture), and drift). For the AB heterozygote in the previous example, given the genotype is a heterozygote with the frequency of A in the population as 0.1 and the frequency of B as 0.2, we calculate the expected genotype frequency as 2 × 0.1 × 0.2 = 0.04 or 4%. Box 1-B gives more detail on the derivation of the HWE relationship. It is important to recognize that not only can the four evolutionary forces affect HWE, but so can the mating structure and the levels of ancestry stratification in the population. These effects are particularly important for markers of the type that we focus on in much of this book, the Ancestry Informative Markers (AIMs).
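    The heterozygote calculation above generalizes to any genotype. As a minimal sketch, the function below returns the expected frequency under Hardy-Weinberg proportions; the function name `hwe_genotype_frequency` and the dictionary layout are assumptions made for illustration, and the allele frequencies 0.1 and 0.2 are the ones that yield the 4% figure.

```python
def hwe_genotype_frequency(allele_freqs, genotype):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    `allele_freqs` maps allele name -> population frequency;
    `genotype` is a pair of allele names.
    """
    first, second = genotype
    p, q = allele_freqs[first], allele_freqs[second]
    # Homozygotes occur at p^2; heterozygotes can arise two ways, hence 2pq.
    return p * p if first == second else 2 * p * q


freqs = {"A": 0.1, "B": 0.2}
print(round(hwe_genotype_frequency(freqs, ("A", "B")), 4))  # heterozygote AB
print(round(hwe_genotype_frequency(freqs, ("A", "A")), 4))  # homozygote AA
```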

    Box 1-B

    The Hardy-Weinberg Equilibrium formula describes the relationship between observed allele frequencies and expected genotype frequencies in a population. If the population is panmictic (i.e., randomly mating with no population substructure) and of infinite size (so that genetic drift is absent) and there is no admixture, mutation, or natural selection at the locus in question, then given allele frequency for the C allele is p and for T allele q,

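       $p^2 + 2pq + q^2 = 1$   (1-1)

    where

       $f_{CC} = p^2$, the expected frequency of the CC genotype   (1-2)

       $f_{CT} = 2pq$, the expected frequency of the CT genotype   (1-3)

       $f_{TT} = q^2$, the expected frequency of the TT genotype   (1-4)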

    To estimate the frequency of the C allele from the data, we can use allele counting where NCC, NCT, NTT, and NTOT are the counts of the numbers of individuals observed for each of the three genotypes and the combined total number in the sample, respectively.

       $p = \frac{2N_{CC} + N_{CT}}{2N_{TOT}}$   (1-5)

       $q = \frac{2N_{TT} + N_{CT}}{2N_{TOT}} = 1 - p$   (1-6)

    The χ²-distribution can be used to calculate the significance of deviations from HWE when the number of observations per cell exceeds five. There are a number of alternative methods for computing the significance when there are smaller numbers of observations, as often is the case with larger numbers of alleles, because some of them are of low frequency and, even for large populations of samples, not all alleles are represented.
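    Allele counting and the χ² test for a biallelic locus can be sketched in a few lines. The function name `hwe_chi_square` and the genotype counts are hypothetical. The test is assumed to have one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency), for which the p-value can be computed with `math.erfc` from the standard library.

```python
import math


def hwe_chi_square(n_cc, n_ct, n_tt):
    """Estimate the C allele frequency by allele counting and test the
    genotype counts against Hardy-Weinberg expectations (1 df)."""
    n_tot = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n_tot)  # frequency of C
    q = 1.0 - p                          # frequency of T
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_cc, n_ct, n_tt)
    expected = (p * p * n_tot, 2 * p * q * n_tot, q * q * n_tot)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


# Hypothetical sample of 100 individuals.
p, chi2, p_value = hwe_chi_square(30, 50, 20)
print(f"p(C)={p:.2f}, chi2={chi2:.4f}, p-value={p_value:.3f}")
```

    Each expected count in this example is well above five, so the χ² approximation is appropriate; with sparse cells an exact method would be the safer choice.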

    In calculating profile probabilities we usually are combining genotype data from a number of loci across the genome into a single summary. If the markers are unlinked (either located on different chromosomes or far apart on the same chromosome), they are statistically independent and so the individual locus profile probabilities can be multiplied together to estimate the multilocus profile probability. However statistical independence of unlinked markers may not always be the case as population structure caused by assortative mating and ancestry stratification can lead to correlations among unlinked markers. Like the deviations from HWE mentioned earlier, which result from population stratification, these unlinked marker correlations exist only for markers that measure ancestry information (i.e., AIMs). Most genetic markers have very little information for ancestry and so this is generally not an important concern.
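    The product rule described above amounts to a single multiplication across loci. In this sketch the six per-locus probabilities are invented for illustration (six loci mirrors the typical single-locus VNTR analysis mentioned earlier), and the function name `multilocus_profile_probability` is hypothetical. The calculation is valid only under the independence assumption discussed in the text.

```python
from math import prod


def multilocus_profile_probability(locus_probabilities):
    """Multiply per-locus profile probabilities, assuming the loci are
    statistically independent (unlinked, no population structure)."""
    return prod(locus_probabilities)


# Hypothetical per-locus genotype frequencies for a six-locus profile.
hypothetical_loci = [0.04, 0.01, 0.02, 0.005, 0.03, 0.015]
combined = multilocus_profile_probability(hypothetical_loci)
print(f"{combined:.2e} (about 1 in {1 / combined:,.0f})")
```

    Even modestly rare genotypes at each locus combine into a very small multilocus probability, which is the source of the statistical power of multilocus profiles.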

    In the early 1990s there was substantial debate over the importance of population structure in multiplying across loci to calculate the profile probabilities. The conclusion was that although the effects of stratification were minimal, forensic scientists should err on the conservative side, implementing the ceiling principle in lieu of a straight product rule calculation. The ceiling principle and other methods to adjust for low levels of population stratification are covered in detail in the NRC report (National Research Council 1996).

    An important development in DNA forensics in the United States was the establishment of the Combined DNA Index System (CODIS) database. CODIS is a national database of DNA profiles from convicted felon and evidentiary samples. The vast majority of the genotype information in CODIS is from STR marker systems, although it was designed to allow for other marker systems (VNTRs and mtDNA) as well. The main purpose of CODIS is to allow criminal investigators to search for matches between convicted felons and evidentiary samples from unsolved cases. Another aim of CODIS was to allow investigators to link different crime scenes for which the same DNA had been donated. Concerns of privacy protections and the logistics were carefully considered in the establishment of the CODIS system (see http://www.fbi.gov/hq/lab/codis/ and Inman & Rudin (2002) for more information). CODIS has proven immensely successful in assisting investigations. Quoting from the CODIS web site:

    As of February 2006, CODIS has produced over 30,000 hits assisting in more than 31,700 investigations.

    As of this date there are a total of 3,072,083 DNA profiles in CODIS; 130,877 are evidentiary and 2,941,206 are convicted offenders.

    A downside of the CODIS database is that, since it is composed of only previously convicted felons, most crime scene specimens do not provide a hit or match. It is for primarily these types of samples that molecular methods for photofitting are useful. As we will discuss throughout this book, certain elements of physical appearance can now or in the near future be gleaned from the DNA. The recently completed human genome project and its various and sundry databases and tools have accelerated DNA research to the point that it will be possible to paint a rough portrait of an individual from their DNA. However, it is notable that were there a stage in our societal development where everyone was sampled and included in a national database like CODIS, the types of ancestry and phenotype assays we describe in this book would be of little interest or use to forensic scientists. Everyone who could commit a crime, barring emigration from untyped jurisdictions, would already be in the database and therefore linking a crime scene with DNA to an individual would be trivial. The likelihood of such a scenario coming about in the near term in the United States seems very small and the wisdom questionable. For example, it would seem that with such a database DNA matches may take on even extra significance, to the exclusion of other types of evidence, and many might begin to feel as if they were a simple laboratory slip-up, or a corrupted computer file, or program away from facing an aggravated first-degree murder charge.

    However, even if such a national database were to be established and everyone entering the United States legally could be compelled to provide a cheek swab, the rate of illegal immigration to the United States would have to be very small for there to no longer be any forensic utility of the methods we propose. Apart from forensics there are other important reasons to pursue molecular photofitting research. Even if molecular photofitting tests are no longer needed forensically because of an exhaustive CODIS database, it is clear that molecular photofitting research will facilitate many important advances in our understandings of human evolution and human genetics, and will serve as model systems for developing efficient methods for studying other phenotypes, like complex diseases.

    THE NATURE OF HUMAN GENETIC VARIATION

    In order to effectively present the content of this book, it is important to consider a few of the terms used in discussing genetic variation and the presumptions underlying them. We can say, somewhat paradoxically, that as more genetic markers are combined into an analysis, the results become less and less genetic. This is true whether we are meaning that the analysis in question is being done by a scientifically minded observer or by the cells of a developing organism. Genetic refers to the process of inheritance first described by Gregor Mendel, who observed that characters were the result of the segregation of factors, one coming from each parent (Mendel 1865). Today we call these factors alleles, and know that they are different forms of genes. Such different forms of a gene are ultimately the result of variability in the DNA sequence comprising the gene, sometimes affecting the sequence of amino acids for the protein encoded by the gene, and sometimes the ways in which the genes are transcribed into mRNA and then translated into protein. These variations have distinct effects on the phenotype (physical character or trait), some as described by Mendel being either recessive or dominant effects.

    Many traits are like this; the genetic paradigm has been used with tremendous success over the past 100 years. As of April 2006, the molecular basis for 2256 phenotypes transmitted in a monogenic (single gene) Mendelian fashion has been described (Online Mendelian Inheritance in Man database, OMIM: http://www.ncbi.nlm.nih.gov/Omim/mimstats.html). All these traits have distinct genetic effects, meaning they can be observed to segregate in families consistently enough to be mapped onto specific chromosome regions and then the particular genes in those regions identified. Most of these traits that have been mapped to date give rise to disease. It is often the case, however, that not everyone with the risk-allele is affected with the disease; this phenomenon is known as incomplete penetrance. Incomplete penetrance is sometimes the result of modifying loci that interact with the risk-allele, altering either the chances of showing the disease or its severity (formally a related concept called variable expressivity). Despite these caveats, genetic traits are those that are inherited in families and show distinct and dichotomous phenotypes (e.g., normal and affected).

    However, variability in many traits is not the result of one gene (with or without modifying genes), but is instead the result of variation in a number of genes. These traits are called polygenic and do not segregate by Mendelian rules, but instead appear to undergo mixing. Charles Darwin, Francis Galton, and others described this as blending inheritance. Of course, genes still are underlying phenotypic variability in polygenic traits (e.g., skin pigmentation and stature), and it is the combined effect of these multiple genes that results in the continuous nature of trait variation. Thus, polygenic traits are properly genic, the result of gene action, but not genetic in the formal sense of being inherited simply. This distinction may seem merely semantic, but there are important conceptual implications inherent in these terms and these distinctions are important for the population genomic paradigm on which modern studies of genetic variation and evolution are founded.

    POPULATION GENETICS AND POPULATION GENOMICS

    The differences between genics, genetics, and genomics are key points in the new paradigm of population genomics. Population genetics is different from Mendelian genetics in that it is primarily concerned with the behavior of genetic markers and trait-affecting alleles in populations, not in families. Population genetics is, fundamentally, the study of evolutionary genetics that arose out of the new synthesis in which evolutionary theory was reinterpreted in the light of the rediscovery of Mendel’s work in 1900. Mendelian inheritance is thus part of the foundation of population genetics, akin to an axiom in mathematics, but rarely the object of direct attention. Human population genetics research generally focuses on estimating population parameters that help us model the demographic histories of populations. Questions such as which populations are most closely related, whether particular migrations occurred over short or long periods of time, how admixture has affected contemporary populations, when and how severe were population bottlenecks (or reductions in the size of the breeding population) in the past, are the focus of these efforts.

    Much human population genetic research has been carried out using mitochondrial DNA (mtDNA) and nonrecombining Y-chromosomal (NRY) markers. Since the genes contained in mtDNA and the NRY are present in a single (unpaired) form in each cell of the body, they are passed on intact from parent to offspring without undergoing recombination. Thus, all variants are physically connected forever, regardless of the number of base pairs separating them. Since the mtDNA and NRY are very large and highly variable, they are remarkably informative regarding human demographic history. Additionally, since the mtDNA is always inherited through the maternal lineage and the NRY through the paternal lineage, these markers can be used to test hypotheses about differences between female and male migratory patterns and other aspects of mate pairing (Jobling et al. 2004). But despite being very informative, mtDNA and NRY studies do not provide more than one measured locus and at least in individuals, there is no way to calculate confidence intervals about the point estimates of the parameters that are being studied (such as estimates of ancestry, or likelihood to express a disease phenotype). Since there is no recombination within the mtDNA and NRY, and since these contain a number of genes, adaptation in any of the genes will affect all the genetic variation and an assumption of selectively neutral evolution will not hold.

    Another branch of human population genetics has focused on averaging together as many unlinked genetic markers as possible to estimate population parameters. Unlinked genetic markers are found on the autosomal chromosomes (chromosomes 1–22) and the X chromosome, and conclusions are drawn considering collections of markers together. The same types of questions that are investigated with mtDNA and NRY markers are addressed with many-marker averages, but one procedural difference between the two methods is that for mtDNA and NRY studies, many more individuals are needed in the population samples compared to studies of large numbers of independently segregating markers. For example, consider two population genetics papers appearing in Science a few years ago: the paper by Ke and colleagues (2001) focused on NRY analysis in 12,000 Asian men, and the paper by Rosenberg and colleagues (2002) used average population samples of 20 individuals typed for 377 microsatellite markers (a paper that we will discuss in more detail later). Both of these papers were able to draw interesting conclusions, one using a single genetic marker system but very many persons, the other using many markers, but fewer people.

    Population genomics is a new branch of population genetics, which is specifically focused on a consideration of both the averages across many markers as well as specific loci individually. A concise summary of this new perspective on genetic variation has been described as the process of simultaneously sampling numerous variable loci within a genome and the inference of locus-specific effects from the sample distributions. In other words, population genomics incorporates a model of genome evolution that allows for the analysis of the unique, locus-specific patterns. These patterns result from genetic adaptation in the context of the genomewide averages, and are represented largely by selectively neutral loci affected primarily by demographic history. In essence, population genomics focuses stereoscopically on both the forest and the trees; we are interested simultaneously in the evolution of both individual genes and the populations carrying them.

    Briefly, there are four primary evolutionary forces: mutation, genetic drift, admixture, and selection. Mutation (i.e., a change in the sequence, number, or position of nucleic acids along the strand of DNA) is the ultimate source of all genetic variation and occurs more or less at random across the genome. Mutation events are unique, occurring in just one chromosome at one particular genomic location and at one point in time, although some nucleotide positions are hypermutable and undergo changes repeatedly. Therefore, although each mutation is unique, each variable position may have had multiple origins. SNPs, for example, are particular nucleotide positions now variable as the result of a base-pair substitution in the recent past that has spread to become prevalent enough that we recognize it as a variant in a reasonable sample of the modern-day population. STRs change in repeat number and thus allele size largely through the process of slipped-strand mispairing, a shift of the template and nascent strands during DNA replication. Other types of mutation include unequal recombination, thought to be a primary force in changing VNTR repeat lengths and in causing some insertion/deletion polymorphisms, and retrotransposition, the process by which Short Interspersed Elements (SINEs) like Alus get copied and then inserted into new genomic positions. Mutation is important as an evolutionary force over the long term, such as when making comparisons among species. However, in terms of variation within a species, we are generally more concerned with the frequencies of alleles that already exist, and so the other three evolutionary forces are more important.

    Genetic drift occurs as the result of segregation of parental alleles in the generation of the current population cohort. In a population of infinite size, there is no genetic drift; that is, all chromosomes present in the parental generation are transmitted to the next. But in the quite finite population sizes in which actual persons and all other organisms live, the magnitude of the effect of drift on the genetic variation in the population is inversely proportional to the size of the mating population, increasing as population size decreases. Over time, genetic drift leads both to the loss of some alleles and fixation (i.e., a particular variant becomes the only allele in a population) of others, and to genetic differentiation among populations that have become reproductively isolated (i.e., populations drift apart after separation). Both the time since the separation and the sizes of the daughter populations determine the levels of differentiation. As we are most interested in real populations, we also need to recognize that populations are rarely ever completely isolated.
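    The dependence of drift on population size can be explored with a small simulation. The sketch below uses binomial sampling of 2N chromosomes each generation (the standard Wright-Fisher idealization, which the text does not name and which is adopted here as an assumption). The function name `wright_fisher`, the population sizes, and the seed are all illustrative.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=0):
    """Simulate the frequency of one allele under drift alone.

    Each generation, 2 * pop_size chromosomes are drawn at random from the
    previous generation's allele pool. Returns the frequency trajectory.
    """
    rng = random.Random(seed)
    n_chrom = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        copies = sum(1 for _ in range(n_chrom) if rng.random() < freq)
        freq = copies / n_chrom
        trajectory.append(freq)
    return trajectory


# Same starting frequency, small versus larger breeding population.
for pop_size in (10, 1000):
    path = wright_fisher(pop_size, start_freq=0.5, generations=50, seed=1)
    print(pop_size, [round(f, 2) for f in path[::10]])
```

    Trajectories from the small population typically wander much farther from 0.5, and often reach loss (0) or fixation (1), than those from the large population. Once a frequency reaches 0 or 1 it stays there, since drift alone cannot reintroduce a lost allele.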

    Admixture is the process of gene flow or gene migration of alleles from one population to another that occurs when individuals from reproductively isolated populations interbreed. There is a range of possible scenarios by which admixture could happen. The two extreme models of admixture are Hybrid Isolation (HI), where admixture happens in one generation resulting in the formation of a new population that is then isolated from both of the parental populations, and the Continuous Gene Flow (CGF) model, where admixture happens slowly over a long period of time with one population (or both) exchanging migrants (see Figure 1-2). Admixture counteracts the effects of genetic drift and differential selection, making populations more similar. Admixture also creates Admixture Linkage Disequilibrium (ALD), which can be used for mapping the genes determining variation in traits that distinguish populations. Linkage Disequilibrium (LD) is the nonrandom association of alleles at different loci and an important source of statistical power for genetic association testing. This is because we typically cannot survey every base of DNA for samples as part of a research project and connections between points along the DNA enable us to identify regions and genes of interest through surveys of markers along the chromosomes (like milepost signs on a highway). Additionally, admixture creates a useful type of genetic structure within populations that might best be called Admixture Stratification (AS).

    Figure 1-2 Shown are A) Hybrid Isolation (HI) and B) Continuous Gene Flow (CGF) models of admixture. The equations indicate the amount of Admixture Linkage Disequilibrium that is formed by each of these models. Abbreviations: t = generation; θ = recombination fraction between loci; D_0 = amount of LD present in the admixed population immediately after admixture; δ_A, δ_B = allele frequency differentials between parental populations A and B, respectively; α = the contribution of population 2 in each generation; 1 – α = the contribution of the admixed population in each generation; δ_{A,t} and δ_{B,t} = the allele frequency differentials at generation t; D_{t-1} = the amount of LD in the previous generation. From Long, Figure 1, Genetics, 1991, vol. 127, p. 198–207. Reprinted with permission.
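    The figure's equations are not reproduced in the text, but the qualitative behavior of the Hybrid Isolation model can be sketched. The code assumes the commonly used geometric decay for a single admixture pulse, D_t = D_0 (1 − θ)^t, written with the symbols defined in the caption. Both that form and the function name `ald_hybrid_isolation` are assumptions made here for illustration, and the Continuous Gene Flow recursion is not attempted.

```python
def ald_hybrid_isolation(d0, theta, generations):
    """LD remaining after `generations` of random mating following a single
    admixture event, assuming D_t = D_0 * (1 - theta) ** t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return d0 * (1.0 - theta) ** generations


# Hypothetical starting LD of 0.1 for unlinked (0.5) and tightly linked (0.01) loci.
for theta in (0.5, 0.01):
    decay = [round(ald_hybrid_isolation(0.1, theta, t), 4) for t in (0, 1, 5, 10)]
    print(theta, decay)
```

    Under this assumption, LD between unlinked markers halves every generation, whereas LD between tightly linked markers persists for many generations, which is what makes ALD usable for mapping.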

    Admixture stratification can be defined as variation in individual ancestry within an admixed population such that, in the simplest case of a dihybrid population—one with two parental populations—some members of the population have more genomic ancestry from one parental population and others have more genomic ancestry from the other parental population. AS has been shown to be an important source of information in testing the extent to which variation in a trait across populations is due to genetic differences between the parental populations. Because there is extensive AS in most admixed populations it is possible to do experiments that test for correlations between phenotypes (common traits or disease traits) and estimates of genomic ancestry (see Chapter 5).
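
As a rough illustration of the kind of test described above, the sketch below computes a Pearson correlation between individual ancestry estimates and a quantitative trait. It is not the procedure developed in Chapter 5; the function name `pearson_r` and every number in the two lists are hypothetical values invented purely for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the proportion of ancestry from one parental
# population for six individuals, and a made-up trait measurement.
ancestry = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90]
trait = [2.1, 2.4, 3.0, 3.1, 3.9, 4.2]

r = pearson_r(ancestry, trait)

if __name__ == "__main__":
    print(f"correlation between ancestry and trait: {r:.3f}")
```

A strong correlation in real data would only suggest that the trait differs between the parental populations for genetic reasons; environmental factors that track ancestry would still need to be ruled out.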

    Natural selection is the process of allele frequency change due to differential fitness (survival and reproduction) of individuals of different genotype. Natural selection contrasts with the effects of genetic drift and admixture insofar as only the genomic region around the gene that is under selection is changing, whereas genetic drift and admixture both affect markers across the whole genome. It is notable that, given the random nature of the effects of genetic drift, a substantial amount of variation in the level of divergence is expected even though the evolutionary force of drift is applied evenly to all markers across the genome. In other words, drift alone will lead to a very wide range in outcomes across the genome because of the combined effects of many random events. This wide variance means that alleles that are very different in frequency between populations are not necessarily this way because of the action of natural or sexual selection. Drift will cause some neutral alleles to have changed in frequency across populations, but for the most part, alleles in the far tail of allele frequency distributions, the AIMs, may well be enriched for the action of selection.
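
The point that drift alone produces a wide range of outcomes can be illustrated with a small simulation. This is a minimal sketch assuming a simple Wright-Fisher model of independent neutral loci in a single population; the function name, population size, number of generations, and starting frequency are arbitrary choices for illustration, not values from the text.

```python
import random


def drift_final_frequencies(n_loci, n_copies, generations, p0, seed=0):
    """Simulate neutral drift independently at n_loci loci.

    Each generation, n_copies allele copies are drawn at random from
    the previous generation's allele frequency (Wright-Fisher sampling).
    Returns the final allele frequency at each locus.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_loci):
        p = p0
        for _ in range(generations):
            count = sum(1 for _ in range(n_copies) if rng.random() < p)
            p = count / n_copies
        finals.append(p)
    return finals


# Every locus starts at the same frequency and experiences the same
# strength of drift, yet the final frequencies differ from locus to locus.
freqs = drift_final_frequencies(n_loci=100, n_copies=100, generations=30, p0=0.5)

if __name__ == "__main__":
    print("lowest final frequency: ", min(freqs))
    print("highest final frequency:", max(freqs))
```

Running the same simulation for two populations and comparing them locus by locus would show that some neutral markers end up with large frequency differences by chance alone, which is why a large difference at a single marker is not by itself evidence of selection.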

    Natural selection that makes populations different at particular genes is often referred to as directional selection or positive selection. Not all forms of natural selection lead to high levels of allele frequency differentiation among populations. Another major form of natural selection, which does the opposite, is balancing selection. Balancing selection is also called overdominant selection because it acts to favor the heterozygote. If each population is subject to balancing selection, the alleles will remain at similar frequencies over very long periods of time, making for lower levels of genetic differentiation than would be expected if genetic drift alone were acting. By considering this range of evolutionary forces in both their global and regional (geographic and genomic) effects, we can use the natural human biodiversity to explore our evolutionary histories and physiologies in a way that has simply not been possible before. For instance, we can now address questions not only regarding whether this particular mountain dwelling population is related to the populations on one side of the mountain
