WE STAND AT THE THRESHOLD OF A NEW CENTURY with the whole human genome stretched out before us. It is, at once, a public and intensely private record. Written in each person's DNA is a shared history of the evolution of our species and a personal portent of both the health and disabilities we may encounter as individuals. High-throughput analyses using bioinformatic tools and DNA arrays are uncovering an exponentially increasing number of genes implicated in human diseases. Each gene identified in a known disease pathway automatically becomes a validated target for therapeutic development. Pharmaceutical companies are racing to convert this new knowledge into blockbuster drugs while the medical profession ponders how it will keep up with the flood of new diagnostic and treatment options. The implications extend far beyond medicine, as scientists seek genes behind the behaviors (and misbehaviors) that make us uniquely human.
The border between science fact and fiction will blur as we move further into the genome age. A tenfold increase in the information capacity of photolithographic DNA arrays could condense the entire human genome onto a chip the size of a postage stamp. Perhaps not everyone will be able to afford to carry a complete copy of their genome in a medical alert locket, but rapid scans of hundreds of medically and behaviorally important genes will almost certainly become part of standard medical care in developed countries. What will it be like when we have a precise catalog of all the good, bad, and middling genes, and the wherewithal to determine who has which? On a personal level, will a genome-wide scan take on the aspect of genetic tarot, predicting the future course of our lives? In the face of such knowledge, will society continue to acquiesce to those who prefer to let nature take its course, or will we gravitate toward a prescribed definition of the "right" genetic stuff? This is not the first time that we have asked such questions.
At the turn of the last century, science and society faced a similar rush to understand and exploit human genes. Eugenics was the name of the effort to apply principles of Mendelian genetics to improve the human species. The eugenics movement began benignly in England with positive efforts by families to improve their own heredity. It took a negative turn in the United States, as well as in Scandinavia, where flawed data became the basis for laws to sterilize individuals and restrict immigration by ethnic groups deemed unfit. These misguided attempts at eugenic social engineering formed part of the basis
CHAPTER 8
Harry Laughlin and Charles Davenport Outside the Eugenics Record Office, 1912
(Courtesy of the Harry H. Laughlin Archives, Truman State University; http://www.eugenicsarchive.org.)
of the Nazi "final solution" to achieve racial purity, which resulted in the murder of more than 10 million Jews, gypsies, and other groups considered unfit. We will thus begin the story of human genetics with the cautionary tale of its conjoined birth with eugenics, focusing on its development in the United States. Partly due to the stigma of association with the Holocaust and partly due to the difficulty of conducting rigorous experimental studies, human genetics languished as a sort of scientific backwater after World War II, until molecular genetics provided new methods to track gene inheritance. After discussing the applications of genetics to the practical problems of human disease and identity, we will link these to the emerging story of how humans evolved and populated the earth.
Francis Galton
(Copyright The Galton Collection, University College London; http://www.eugenicsarchive.org.)
In 1910, Davenport obtained funding to establish a Eugenics Record Office (ERO) on property adjacent to the Station for Experimental Evolution. A series of ERO bulletins, including Davenport's Trait Book and How to Make a Eugenical Family Study, helped to standardize methods and nomenclature for constructing pedigrees to track traits through successive generations. Constructing a pedigree entailed three important elements: (1) finding extended families that express the trait under study, (2) scoring each family member for the presence or absence of the trait, and (3) attempting to discern one of three basic modes of Mendelian inheritance: dominant, recessive, or sex-limited (X-linked). Eugenicists fared well on the first element, because large families were common in the first decades of the 20th century. However, scoring traits was a difficult problem, especially when eugenicists attempted to measure complex traits (such as intelligence or musical ability) and mental illnesses (such as schizophrenia or manic depression). In general, eugenicists were lax in defining the criteria for measuring many of the traits they studied. This led them to conclude that many real and imagined traits (including alcoholism, feeblemindedness, pauperism, social dependency, shiftlessness, nomadism, and lack of moral control) were single-gene defects inherited in a simple Mendelian fashion.
Much eugenical information was submitted voluntarily on questionnaires. Some families were proud to make known their pedigrees of intellectual or artistic achievement, whereas others sought advice on the eugenical fitness of proposed marriages. The circus performers on midways of nearby Coney Island offered eugenics researchers a trove of unusual physical trait differences, including giantism, dwarfism, polydactyly, and hypertrichosis. Notably, Davenport's correspondence with an albino circus family resulted in the first Mendelian study of albinism, published in the Journal of Human Heredity.
In addition to interviewing living family members, eugenics workers also used data from insane asylums, prisons, orphanages, and homes for the blind. Surveys filled out by superintendents were used to calculate the ethnic makeup of societal dependents and the costs of maintaining them in public institutions. With the mobilization for World War I, tens of thousands of men inducted for the draft provided a ready source of anthropometric and intelligence data. Notably, the Army Alpha and Beta Intelligence Tests, developed by Robert Yerkes of Harvard University, supposedly measured the innate intelligence of army recruits. African-American and foreign-born recruits were much more likely to do poorly on the Yerkes tests, because the tests mostly measured knowledge of white American culture and language.
Richard Dugdale, of the Executive Committee of the New York Prison Association, brought the concept of degenerate inheritance to eugenics in The Jukes (1877), a pedigree study of a clan of 700 petty criminals, prostitutes, and paupers living in the Hudson River Valley north of New York City. Dugdale held the Lamarckian view that the environment induces heritable changes in human traits. He compassionately concluded that the Jukes' situation could be corrected by providing them improved living conditions, schools, and job opportunities. However, this interpretation was discredited by American eugenicists, who embraced Mendel's genetics and Weismann's theory of the germ plasm. Together, these formed an interpretation that human traits are determined by genes which are passed from generation to generation without any interaction with the environment. Thus, when the ERO's field worker Arthur Estabrook reevaluated the Jukes in 1915, he found continued degeneration and placed the blame squarely on bad genes and the people who carried them.
Davenport's study of naval officers amusingly illustrates the extent to which eugenicists sought genetic explanations of human behavior to the exclusion of environmental influences. After analyzing the pedigrees of notable seamen, including Admiral Lord Nelson, John Paul Jones, and David Farragut, Davenport concluded that they shared several heritable traits. Among these was thalassophilia, love of the sea, which he determined was a sex-limited trait, because it was found only in men. Davenport failed to consider the equally likely explanations that sons of naval officers often grew up in environments dominated by boats and tales of the sea or that women were prohibited from seafaring occupations throughout the 19th and early 20th centuries.
American eugenicists were overwhelmingly white, of northern and western European extraction, and members of the educated middle and upper classes. They looked with disdain on the new immigrants, many of whom settled in lower Manhattan. The plight of many of these immigrants (packed into tenements, plagued by tuberculosis and crime, and reduced to virtual serfdom in sweatshops) was sympathetically chronicled by Jacob Riis and the muckraking journalists. But to eugenicists, the immigrants' lot had little to do with poverty or lack of opportunity and had everything to do with their bad genes, which eugenicists feared would quickly pollute the national germ plasm.
The eugenics movement provided a scientific rationale for growing anti-immigration sentiments in American society. Labor organizations fed on fears that working-class Americans would be displaced from their jobs by an oversupply of cheap immigrant labor, while anti-Communist factions stirred up fears of the "red tide" entering the United States from Russia and eastern Europe. As expert agent for the Committee on Immigration and Naturalization of the U.S. House of Representatives, ERO Superintendent Harry Laughlin became the anti-immigration movement's most persuasive lobbyist in the early 1920s. During three separate testimonies, he presented data that purported to show that southern and eastern European countries were exporting genetic defectives to the United States who had disproportionately high rates of mental illness, crime, and social dependency. The resulting Immigration Restriction Act of 1924 cut immigration to 165,000 per year and restricted immigrants from each country according to their proportion in the U.S. population in 1890, a time prior to the major waves of immigration from southern and eastern Europe. This had the desired effect of reducing southern and eastern European immigrants to less than 15,000 per year. Immigration did not regain prerestriction levels until the late 1980s.
Of all the legislation enacted during the first four decades of the 20th century, sterilization laws adopted by 30 states most clearly bear the stamp of the eugenics lobby. Although the earliest sterilization law, passed in Indiana in 1907, was aimed at convicts and sex offenders, feebleminded persons became the major targets for eugenic sterilization. This owed much to Henry Goddard's influential book, The Kallikaks (1912). This compellingly told study of the descendants of Martin Kallikak (the name is fictitious) was, in effect, a controlled experiment in positive and negative eugenics. Martin's marriage to a normal woman produced a normal lineage (from the Greek kallos, for goodness and beauty). However, as a young militiaman in the Revolutionary War, he had an illicit union with an attractive but feebleminded barmaid, producing a second, bad lineage (kakos, for bad). Thus, the primary intent of eugenic sterilization was to curb the supposed promiscuous tendencies of the feebleminded, who threatened to perpetuate their kind and to contaminate good lineages, as surely as in the case of Martin Kallikak.
Many of the early sterilization laws were legally flawed and did not meet the challenge of state court tests. To address this problem, Laughlin designed a model eugenics law that was reviewed by legal experts. Virginia's use of the model law was tested in Buck v. Bell, heard before the Supreme Court in 1927. Oliver Wendell Holmes, Jr., delivered the Court's decision upholding the legality of eugenic sterilization, which included the infamous phrase, "Three generations of imbeciles are enough." Carrie Buck, the subject of the case, had given birth to an illegitimate daughter and been institutionalized in the Virginia Colony for the Epileptic and the Feebleminded. Carrie was judged to be feebleminded and promiscuous. Arthur Estabrook examined Carrie's infant daughter Vivian and found her not
quite normal. It is impossible to judge whether Carrie was feebleminded by the standards of her time, but the child that Carrie bore out of wedlock was the result of her rape by the nephew of her foster parents. Clearly, Vivian was no imbecile. Later scholarship turned up Vivian's first-grade report, showing that she was a solid B student and received an A in deportment. Carrie was the first person sterilized under Virginia's law. Buck v. Bell was never overturned, and sterilization of the mentally ill continued into the 1970s, by which time about 60,000 Americans had been sterilized, most without their consent or the consent of a legal guardian.
Wilhelm Weinberg
(Reprinted, with permission, from Stern C. 1962. Wilhelm Weinberg, 1862-1937. Genetics 47: 1-5; © Genetics Society of America.)
Godfrey Hardy
(Courtesy of Trinity College, Cambridge, England.)
order, presented a particular quandary. Although geneticists almost universally agreed that the feebleminded should be prevented from breeding, the Hardy-Weinberg equation showed that sterilization of affected individuals would never appreciably reduce the incidence of the disorder. Only a hideously massive program of sterilizing the vast reservoir of heterozygous carriers predicted by the equation would have any hope of significantly reducing the incidence of mental illness. Despite this, feeblemindedness was thought to be so rampant that many geneticists believed reproductive control could still prevent the birth of tens of thousands of affected individuals per generation.
Although he was a founding member of the board of the ERO, Thomas Hunt Morgan resigned after several years. He criticized the movement in the 1925 edition of his popular textbook, Evolution and Genetics, warning against the wholesale application of genetics to mental traits, and against comparing whole races as superior or inferior. He offered this advice: "...until we know how much the environment is responsible for, I am inclined to think that the student of human heredity will do well to recommend more enlightenment on the social causes of deficiencies...in the present deplorable state of our ignorance as to the causes of mental differences." In 1928, Johns Hopkins geneticist Raymond Pearl charged that most eugenics preaching was contrary to the best established facts of genetical science. A visiting committee of the Carnegie Institution in 1935 concluded that the body of work collected at the ERO was without scientific merit and recommended that it end its sponsorship of programs in sterilization, race betterment, and immigration restriction. Thus, the negative emphasis of American eugenics was completely discredited among scientists by the mid-1930s. Growing public knowledge of Germany's radical program of race hygiene led to a wholesale abandonment of popular eugenics.
The ERO was closed in December 1939. In the meantime, eugenics was gathering steam in Germany. Laughlin's model sterilization law was the basis for the Nazis' own law in 1933, and his contributions to German eugenics were recognized by an honorary degree from the University of Heidelberg in 1936. Over the next several years, some 400,000 people, mainly in mental institutions, were sterilized. In 1939, euthanasia replaced sterilization as a solution for mental illness, and the lives of nearly 100,000 patients were ended "mercifully" with lethal gas. Overt euthanasia of mental patients ceased in 1941, when physicians with experience in euthanasia were reassigned to concentration camps in Poland, where they were needed to apply the "final solution" for Nazi racial purity.
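The Hardy-Weinberg argument against sterilization described above can be checked with a short calculation. The sketch below is only an illustration: it assumes, as the eugenicists did (and as is not actually the case for mental traits), a disorder caused by a single recessive allele, with random mating, no new mutation, and every affected individual prevented from reproducing. The starting allele frequency of 0.1 and all function names are hypothetical choices, not figures from the historical record.

```python
# Hypothetical illustration: selection against a recessive trait under
# Hardy-Weinberg assumptions (single gene, random mating, no new mutation).

def genotype_frequencies(q):
    """Return (normal homozygotes, carriers, affected) for recessive allele frequency q."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q

def next_generation(q):
    """Allele frequency after every affected (aa) individual is kept from reproducing."""
    p = 1.0 - q
    # Only AA and Aa parents contribute; carriers pass on the allele half the time.
    return (p * q) / (p * p + 2.0 * p * q)

def frequency_after(q0, generations):
    """Iterate the selection step for the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_generation(q)
    return q

if __name__ == "__main__":
    q0 = 0.1  # assumed starting allele frequency (1% of births affected)
    _, carriers, affected = genotype_frequencies(q0)
    print(f"carriers per affected individual: {carriers / affected:.0f}")
    for n in (0, 1, 5, 10):
        q = frequency_after(q0, n)
        print(f"generation {n:2d}: affected fraction = {q * q:.4f}")
```

Under these assumptions the allele frequency follows q_n = q_0 / (1 + n q_0), so merely halving it from 0.1 takes ten generations, because most copies of the allele sit in unaffected heterozygous carriers.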
simplified by the fly's rapid generation time and many offspring per generation. Furthermore, the lineage of each individual is known at the outset of a breeding experiment. Members of the experimental pool are most often physically and genetically identical, differing from one another only by one or, at most, several traits. This genetic homogeneity allows a specific trait or mutation to be observed against an essentially neutral background.
Human genetics differs from classical genetics in that the system under study cannot be easily manipulated. Although arranged marriages still take place in some cultures, people, for the most part, choose their own spouses and are generally opposed to being selectively mated. Thus, human geneticists must be content to work with the existing genetic makeup of related family members. In addition, they seldom have the luxury of following a single well-defined trait through successive generations; rather, they often must deal with a perplexing syndrome of variable traits. Most human populations tend to be outbred, meaning that they are physically and genetically heterogeneous. Thus, it is more difficult to identify genes, especially those with variable phenotypes, against this heterogeneous background.
Certain human populations, however, whose members are genetically isolated by geography or customs have a degree of genetic homogeneity. The relatively closed gene pools of the Icelandic people, the Old Order Amish, and the Mormons, combined with their habit of keeping meticulous genealogical records and having relatively large families, have made them amenable to genetic analysis. Customs prohibiting alcohol consumption among the Amish and Mormons make easier the analysis of mental and behavioral disorders, such as manic depression and schizophrenia, whose symptoms may be masked by alcohol or drug abuse.
The problems of human genetics were only solved as time eroded memories of Nazi eugenics, and when restriction enzymes and polymerase chain reaction (PCR) provided markers whose presence or absence can be scored with great certainty. It is worth remembering that during the entire reign of eugenics, DNA had not yet been shown to be the molecule of heredity, and nothing was known about the physical basis of mutation and gene variation.
With an understanding of gene variation has come a deeper understanding of disease complexity. Twin studies strongly indicate that genes have a dominant role in all aspects of human health and behavior. However, just as Hermann Muller observed in fruit flies, human diseases do not always exhibit a simple gene-to-character relation. An identical mutation may produce different physical symptoms (phenotypes) in different people. Conversely, different mutations may produce similar phenotypes in different people. As George W. Beadle and Edward Tatum found in Neurospora, mutations in any of several enzymes can have the same end effect of altering or knocking out a biochemical pathway.
At one end of the spectrum are simple, genetically homogeneous diseases, such as sickle cell anemia and cystic fibrosis, for which affected individuals share common mutations and highly similar symptoms. In the middle are diseases, such as β-thalassemia and neurofibromatosis 1, in which a variety of types of mutations in a single gene produce variable symptoms. At the other end of the spectrum are complex, genetically heterogeneous diseases, such as asthma and bipolar disorder, in which mutations in a number of genes, in combination with environmental factors, likely account for extremely variable symptoms.
[Figure: RFLP diagnosis of sickle cell anemia by Southern blot. MstII (recognition sequence CCTNAGG) cuts the normal β-globin gene at three sites (1, 2, 3), producing two restriction fragments of 1150 and 200 bp; site 2 reads CCTGAGG. The sickle cell mutation (CCTGTGG) results in loss of MstII site 2, so MstII cuts the mutated β-globin gene only at sites 1 and 3, producing a single larger restriction fragment of 1350 bp. A radioactively labeled probe spans the region of the sickle cell mutation, from site 1 to site 2. Southern blot lanes are labeled Unaffected, Carrier Parent, Carrier Parent, Affected Child, and Fetus.]
amino acid). In 1956, Vernon Ingram and John Hunt independently sequenced the Hb and HbS proteins, finding that a glutamic acid at position 6 in Hb is replaced with valine in HbS. From this information, they used a genetic code table (showing that glutamic acid = GAG and valine = GTG) to predict that an A-to-T point mutation in the sixth codon is responsible for sickle cell disease. The availability of protein and predicted DNA sequence facilitated the cloning of the α- and β-globin genes from a human genomic library in the early 1980s. (The methods used by Philip Leder to clone the β-globin gene are discussed in Chapter 5.) The combination of Southern blot and restriction enzyme analysis made DNA diagnoses possible for the causative lesions of many hemoglobinopathies.
In constructing early restriction maps of cloned human DNA, it became obvious that a point mutation can change a restriction enzyme recognition site, producing different-sized fragments, termed a restriction-fragment-length polymorphism (RFLP). These were the first DNA polymorphisms ("poly" for many and "morph" for form) that could be readily detected. As discussed in Chapter 6, RFLPs also were the major type of marker employed in the early physical and linkage maps of the human chromosomes. Used in a local region of a chromosome, an RFLP also might detect the causative mutations of disease. Initially, RFLPs were detected by Southern blot analysis, using a radioactive probe that hybridizes to the polymorphic region.
The mutation responsible for sickle cell anemia was first detected by RFLP analysis in 1978 by Yuet Wai Kan and Andrea-Marie Dozy at the University of California, San Francisco. They used the restriction enzyme MstII, which recognizes the sequence CCTNAGG (where N equals any nucleotide). The A-to-T mutation results in the loss of an MstII recognition site that spans the region of the sixth codon of the β-globin gene.
Thus, the DNA from normal homozygous individuals, heterozygous carriers of the sickle cell trait, and homozygous sickle cell patients produces different restriction fragments when cut with MstII.
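The logic of the MstII test can be sketched in a few lines of Python. This is a minimal illustration rather than a laboratory protocol: the recognition sequence CCTNAGG, the two site sequences (CCTGAGG and CCTGTGG), and the fragment sizes (1150, 200, and 1350 bp) come from the text and its figure, while the function names and the "A"/"S" allele labels are hypothetical.

```python
import re

# MstII recognition sequence from the text: CCTNAGG, where N is any nucleotide.
MSTII_SITE = re.compile(r"CCT[ACGT]AGG")

def has_mstii_site(seq):
    """True if the DNA sequence contains at least one MstII recognition site."""
    return MSTII_SITE.search(seq.upper()) is not None

# Restriction fragments (bp) produced in this region for each allele.
FRAGMENTS = {
    "A": (1150, 200),  # normal allele: site 2 is present
    "S": (1350,),      # sickle allele: site 2 is lost
}

def expected_fragments(allele1, allele2):
    """Distinct fragment sizes produced by a genotype, largest first.

    Which of these fragments appears on a blot depends on where the
    probe hybridizes; this function only lists what the digest produces.
    """
    return sorted(set(FRAGMENTS[allele1]) | set(FRAGMENTS[allele2]), reverse=True)

if __name__ == "__main__":
    print(has_mstii_site("CCTGAGG"))  # normal sequence at site 2
    print(has_mstii_site("CCTGTGG"))  # sickle cell sequence at site 2
    for genotype in (("A", "A"), ("A", "S"), ("S", "S")):
        print(genotype, expected_fragments(*genotype))
```

A heterozygous carrier yields the fragments of both alleles, which is why the three genotypes can be told apart from a single digest.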
becomes obvious when one considers that patients suffering from these diseases require long-term treatment lasting a minimum of 5 to 10 years. The risk of virus contamination is a most important consideration in any therapeutic product purified from mammalian cells. Simian virus 40 (SV40), which has proved so important in cancer research, was first isolated as a contaminant in poliovirus vaccine produced in monkey cells. Although there is no evidence of illness as a result of SV40-contaminated poliovirus vaccines, supplies of both human growth hormone and clotting factors have at one time or another been infected with life-threatening pathogens. Prior to the identification of human immunodeficiency virus type 1 (HIV-1) and the development of virtually foolproof screening procedures, patients with hemophilia had a significant risk of contracting AIDS (acquired immune deficiency syndrome) from transfusion of contaminated clotting factors, as well as whole blood. During the window of time between the onset of the AIDS pandemic and the development of effective methods to screen for HIV and disable it in blood products, in 1983-1984, it is estimated that half of all hemophiliacs developed AIDS. According to 2001 statistics from the Centers for Disease Control, a total of 5234 American hemophiliacs have died of AIDS.
Therapeutic proteins isolated from animals, including porcine or bovine insulin, differ in amino acid makeup from the human protein they replace. The biological activity of an animal substitute may differ slightly from the native human protein, or it may elicit an immune response. Some diabetics had allergic reactions to porcine or bovine insulin, although this may have been due to impurities in the preparations and not necessarily differences in the amino acid sequence.
The development of new genetic tools made possible the cloning and production of a number of genes for medically important proteins, including insulin, clotting factors, tissue plasminogen activator, interleukin, interferon, erythropoietin, and colony stimulating factors. The Boyer-Cohen experiment (Chapter 4) showed that recombinant DNA methods can be used to transfer essentially any gene into E. coli, where the encoded protein may be expressed. This established a new paradigm of using cultured cells to produce therapeutic proteins to treat human metabolic disorders. Producing therapeutic proteins from cloned human genes inside Escherichia coli hosts eliminates the risk of virus contamination and allergic sensitivity. Mammalian viruses cannot reproduce inside E. coli and hence cannot be co-isolated with the protein from a bacterial culture. The protein harvested from the bacterial culture has been expressed from a human coding region and is identical (or very nearly so) to the native protein. Thus, diabetics sensitive to bovine or porcine insulin do not have an adverse reaction to human insulin of recombinant origin.
Product | Trade name/Company | Approved for
Recombinant human insulin | Humulin/Eli Lilly & Co. | Diabetes mellitus
Recombinant human growth hormone | Protropin/Genentech, Inc. | Growth hormone deficiency in children
Recombinant interferon-α | Intron A/Schering-Plough | Hairy cell leukemia; Genital warts; Kaposi's sarcoma; Hepatitis C; Hepatitis B
Recombinant hepatitis B vaccine | Recombivax HB/Merck & Co. | Hepatitis B prevention
Epoetin alfa | EPOGEN/Amgen Ltd. | Anemia of chronic renal failure
Recombinant interferon-γ | Actimmune/Genentech, Inc. | Chronic granulomatous disease
Colony-stimulating factor (CSF) | Leukine/Immunex Corp. | Bone marrow transplantation
Recombinant anti-hemophiliac factor | Recombinate rAHF/Baxter Healthcare | Hemophilia A
Recombinant DNase I | Pulmozyme/Genentech, Inc. | Cystic fibrosis
Recombinant coagulation factor IX | AlphaNine SD/Alpha Therapeutic Corp. | Christmas disease (Hemophilia B)
The mature insulin molecule consists of two polypeptide chains, an A chain of 21 amino acids and a B chain of 30 amino acids, which are held together by disulfide linkages. However, this active insulin results from sequential modifications in two precursor molecules: preproinsulin and proinsulin. The gene for insulin consists of two coding exons separated by a single intron. Following splicing of the pre-mRNA, a functional transcript is translated into a large polypeptide called preproinsulin. The molecule includes a 24-amino-acid signal peptide at its amino terminus, a feature of many secreted proteins needed for their proper transport through the cytoplasm. The signal peptide, which is the first part of the preproinsulin molecule produced, anchors the free-floating ribosome to the endoplasmic reticulum (ER) and is subsequently clipped off as the molecule passes through the ER membrane. The result is a molecule of 84 amino acids called proinsulin, whose looped shape is maintained by cross-linking disulfide bonds. Proinsulin makes its way to the Golgi apparatus, where a converting enzyme removes 33 amino acids from the middle of the connecting loop (the C chain), leaving the remaining A and B chains held together by the disulfide linkages. This yields active insulin, which is stored in a secretory granule for eventual release into the bloodstream.
The human genomic sequence could not be used directly to produce active insulin in E. coli, because the bacterium lacks the enzyme systems needed (1) to splice out the intron sequence to produce a mature mRNA, (2) to remove the signal sequence from preproinsulin, and (3) to remove the C chain from proinsulin. Although other methods have been used, the strategy first employed in 1979 by Eli Lilly & Co. to produce recombinant human insulin neatly sidesteps these constraints by simply omitting the above sequences. The nucleotide sequences coding for the A and B chains of active insulin were chemically synthesized and cloned into separate expression plasmids. A bacterial strain containing each expression vector produces a fusion bacterial/human polypeptide, which is harvested and subsequently treated with cyanogen bromide to remove the bacterial amino acids. Cyanogen bromide cleaves the fusion polypeptide at the methionine residue that begins the human sequence. This treatment, which also alters tryptophan, is only useful because, by happenstance, neither the A nor B insulin chain includes tryptophan or additional methionine residues.
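The point about cyanogen bromide can be illustrated with a short sketch. It rests on several assumptions that are not from this chapter: the one-letter amino acid sequences of the human insulin chains are taken from standard references, the bacterial leader is a made-up placeholder, and cleavage is modeled simply as a split after every methionine (M), ignoring the chemistry of the reaction.

```python
# Human insulin chains in one-letter code (assumed from standard references,
# not given in this chapter).
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

def cnbr_cleave(polypeptide):
    """Model cyanogen bromide cleavage as a split after every methionine (M)."""
    fragments, current = [], ""
    for residue in polypeptide:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

def survives_cnbr(chain):
    """A chain comes through intact only if it has no methionine (M) or tryptophan (W)."""
    return "M" not in chain and "W" not in chain

if __name__ == "__main__":
    leader = "XXXXXXXX"  # hypothetical placeholder for the bacterial amino acids
    for name, chain in (("A", A_CHAIN), ("B", B_CHAIN)):
        fusion = leader + "M" + chain
        released = cnbr_cleave(fusion)[-1]
        print(name, survives_cnbr(chain), released == chain)
```

A protein with an internal methionine would be cut into pieces by the same treatment, so this fusion-and-cleavage shortcut does not generalize to every therapeutic protein.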
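[Figure: Insulin production by cell synthesis and by recombinant-DNA synthesis. Cell synthesis panel labels: preproinsulin (signal peptide, B chain, C chain, A chain), signal peptidase, proinsulin, Golgi, insulin. Recombinant-DNA synthesis panel labels: A-chain plasmid, B-chain plasmid, tryptophan peptide, MET, A chain, B chain, cyanogen bromide, insulin.]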
Purified A and B chains are then mixed in equal portions and incubated under conditions that form the disulfide linkages.
Human growth hormone (HGH), a polypeptide of 191 amino acids, is also produced in recombinant E. coli. The coding sequence for the first 24 amino acids of the expressed gene is synthesized chemically, whereas amino acids 25 through 191 are derived from a cDNA copy of HGH mRNA isolated from pituitary cells. The recombinant HGH differs by one amino acid from normal HGH because E. coli is unable to remove the initiator methionine residue that is removed posttranslationally in human cells.
University, elucidated how interferon initiates a signal transduction pathway through which immune cells are primed to recognize and degrade the RNA of infecting viruses. However, interferon proved to be relatively toxic at pharmacological doses and generally failed to live up to its "wonder drug" hype. Although interferon has not proven to be the magic bullet that some had hoped for, its several forms have proven broadly useful in treating a number of diseases. Interferon-α is the most effective treatment available for hepatitis B and C, which infect hundreds of millions of people worldwide, and is also used in treating leukemias. Interferon-β is the most effective treatment for multiple sclerosis, although it is still uncertain how it functions in controlling this disease. Interferon-γ is used to treat osteopetrosis and chronic granulomatous disease.
The term "DNA fingerprinting" was coined to allude to the traditional use of fingerprints as a unique means of human identification. Whereas classic fingerprinting analyzes a phenotypic trait, DNA typing directly analyzes genotypic information. When properly conducted, DNA-based testing can provide positive evidence of a person's identity. In contrast, the phenotypes detected by blood grouping and leukocyte antigen testing are shared by sufficiently large numbers of individuals that they are not, in the strictest sense, tests of identity. Rather, they are exclusionary tests that can only prove that forensic evidence does not match a suspect or that persons are not related.
All that is required for DNA fingerprinting is a small tissue sample from which DNA can be extracted. This can be blood or cheek cell samples in a paternity case, a semen sample from a rape victim, dried blood from fabric, skin fragments from under the fingernails of a victim after a struggle, or even several hairs (with the attached roots) combed from a crime scene. Ted Kaczynski, the Unabomber, was definitively linked to the case when his DNA type matched the one obtained from cells left when he licked a stamp used on a letter. Using the best available techniques, a DNA type can be obtained from cells in fingerprints on a glass or other hard surface. The time is approaching when a criminal will not be able to afford to leave even a single cell at a crime scene.
tandem repeated units ranging in size from 9 to 80 bp. The number of repeats at a particular locus was variable between homologous chromosomes, hence the acronym VNTR (variable number of tandem repeats). Working at the University of Leicester, Jeffreys found two core sequences that are common to a set of VNTRs associated with the myoglobin gene locus. Assaying for them by Southern blotting produced a DNA "fingerprint" that was a composite of VNTRs at multiple loci, leading to the term "multilocus probes." In his analysis, radioactive probes hybridize to restriction fragments that have partial homology with the core sequence, typically detecting 20 to 30 interpretable bands. These distinctive banding patterns are inherited in a Mendelian fashion, with half of the bands derived from the mother and half from the father.
The Ghana Immigration Case (1985) provided the first practical test of DNA fingerprinting. The case involved Christiana Sarbah and her teenage son Andrew,
who immigrated to England after living for some time with his father in Ghana. Although depositions and other information showed that Christiana and Andrew were almost certainly related, the British Home Office ordered that Andrew be deported in the absence of definitive evidence to prove Christiana's parentage. Jeffreys agreed to assist with the appeal case, believing it would be an ideal test of the DNA fingerprint technology he had recently developed. He used his myoglobin VNTR probes to produce DNA profiles from blood samples from Christiana, Andrew, and three siblings: David, Joyce, and Diana. Because of the lack of the father's blood sample, Jeffreys reconstructed the father's fingerprint from bands present in the three undisputed children, but absent in Christiana. About half of Andrew's bands matched bands in the father's compilation and the remaining bands were all present in Christiana's fingerprint. The possibility of this happening by chance is greater than one in a trillion. The Home Office accepted the DNA fingerprint evidence and allowed Andrew to stay in England. Jeffreys' probes essentially analyzed a number of VNTR polymorphisms simultaneously. The multiple bands created by the multilocus system proved difficult to analyze and standardize. The system was prone to produce artifact bands whenever a restriction enzyme failed to cut entirely, and it could be difficult to determine whether a sample had digested to completion. Furthermore, the number and frequency of alleles were never rigorously worked out, making it impossible to accurately determine the relative rarity of one fingerprint over another. Jeffreys' multilocus probes were supplanted by single-locus probes, which identify a polymorphism that occurs at a single location on one chromosome. The majority of RFLPs that had been discovered through the mid 1980s were point mutations that destroy or create a restriction enzyme recognition site.
This type of RFLP has only two alleles and three genotypes (+/+, +/-, and -/-). Thus, gene mappers and forensic biologists alike sought out more variable polymorphisms as they became increasingly available in the late 1980s. Beginning in 1987, Yusuke Nakamura, Ray White, and others at the University of Utah began a systematic search for single-locus VNTRs, ultimately providing more than 100 useful polymorphic loci scattered throughout the genome. Probes for these VNTRs became widely used in gene mapping and DNA fingerprinting. Each probe hybridizes to a unique hypervariable region of the genome and generates a pattern consisting of one or two bands from an individual's DNA, depending on whether the individual is homozygous or heterozygous at that locus. Used alone, a single-locus probe only detects one or two differences; however, cocktails of several probes came into use for forensic purposes. Because each probe identifies a discrete locus, the frequency of each allele can be determined in population studies and the Hardy-Weinberg equation used to calculate the occurrence probability of each genotype. The VNTR loci that became most useful in forensics were those with 10 or more alleles and with a high degree of heterozygosity in many human populations, thus maximizing the ability to discriminate between two individuals.
Consider the case of D1S80, a VNTR on chromosome 1 in which a 16-nucleotide unit is repeated from 14 to more than 41 times, creating 29 different alleles. By studying the occurrence of alleles in human populations, one can calculate the probability of an individual's DNA fingerprint having this or that combination of DNA bands, or the probability of two DNA samples matching each other. Adding a second VNTR increases the ability to distinguish between individuals. Different VNTRs used in identity testing have been chosen on different chromosomes. This way, one can be assured that each VNTR is unlinked from the others, that is, that each VNTR is inherited independently. If they are unlinked, then the probability of any two bands being inherited together is the product of their individual occurrences. By the mid 1990s, most forensic laboratories were producing types with five to eight unlinked markers. Although it was initially challenged in the courts, due to its extreme sensitivity and potential for contamination, PCR eventually supplanted Southern blotting in forensic analysis. PCR made possible extremely rapid protocols that required very small amounts of template and obviated the use of radioactivity. D1S80 was among the first loci to be adapted for use in a forensic PCR kit. Briefly, a small sample of blood or other cells is lysed by boiling, the cell debris is removed by centrifugation, and the PCR reagents are added directly to the crude extract.
Allele    Frequency       Hardy-Weinberg   Calculation          D1S80 probability   Locus 2 probability   Combined probability
18, 31    0.263, 0.058    2pq              2 (0.263 x 0.058)    0.0305              0.0050                0.000153
24, 37    0.318, 0.003    2pq              2 (0.318 x 0.003)    0.0002              0.0035                0.0000007
18, 18    0.263, 0.263    p^2              (0.263 x 0.263)      0.0692              0.0075                0.000519
28, 31    0.050, 0.058    2pq              2 (0.050 x 0.058)    0.0058              0.0025                0.0000145
18, 25    0.263, 0.055    2pq              2 (0.263 x 0.055)    0.0289              0.0045                0.000013
17, 24    0.013, 0.318    2pq              2 (0.013 x 0.318)    0.0008              0.0065                0.0000052

[Figure: D1S80 gel with lanes labeled C, 1, 2, 3, 4, and 5.]
Following the appropriate number of synthesis cycles, the amplified DNA is separated by electrophoresis in a polyacrylamide gel, stained with ethidium bromide or silver, and visualized directly.
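The Hardy-Weinberg and product-rule steps used in the D1S80 table can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the allele frequencies and the locus 2 value are copied from the table, while the function names are hypothetical and chosen only for this example.

```python
# Allele frequencies for D1S80 as listed in the table (allele -> frequency).
D1S80_FREQ = {17: 0.013, 18: 0.263, 24: 0.318, 25: 0.055,
              28: 0.050, 31: 0.058, 37: 0.003}


def genotype_probability(allele_a, allele_b, freq):
    """Hardy-Weinberg genotype frequency: p^2 (homozygote) or 2pq (heterozygote)."""
    p, q = freq[allele_a], freq[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def combined_probability(per_locus):
    """Product rule: multiply genotype probabilities of unlinked loci."""
    result = 1.0
    for prob in per_locus:
        result *= prob
    return result


# First table row: a person heterozygous for alleles 18 and 31.
d1s80 = genotype_probability(18, 31, D1S80_FREQ)
# 0.0050 is the locus 2 probability given in the same table row.
combined = combined_probability([d1s80, 0.0050])
print(round(d1s80, 4), round(combined, 6))
```

The product rule is valid only because the loci are assumed to be unlinked, which is why markers on different chromosomes are chosen.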
[Figure: STR electropherograms in the blue, black, and gray dye channels for the victim, the suspect, and the blood stain from the crime scene. Horizontal axis ticks run from 100 to 360; allele numbers are labeled beneath the peaks.]
Alleles Detected

Locus         Victim      Suspect     Blood stain from crime scene
Amelogenin    XY          XY          XY
D3S1358       14, 15      14, 15      14, 15
vWA           18, 20      15, 18      15, 18
FGA           24          21, 22      21, 22
D8S1179       13, 16      13, 14      13, 14
D21S11        28, 30.2    30          30
D18S51        14, 15      14, 15      14, 15
D5S818        10, 11      13          13
D13S317       8, 11       11          11
D7S820        8, 11       10          10
D16S539       9, 11       9, 12       9, 12
TH01          7, 9        6, 9        6, 9
TPOX          9, 11       8, 11       8, 11
CSF1PO        10, 12      9, 12       9, 12
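Comparing the three profiles in the table of detected alleles amounts to a locus-by-locus equality check. The sketch below transcribes the table into a dictionary and lists the loci at which two samples differ; the data structure and the function name `mismatched_loci` are assumptions made for illustration, not part of any forensic software.

```python
# Alleles per locus as (victim, suspect, blood stain), transcribed from the table.
ALLELES = {
    "Amelogenin": ("XY", "XY", "XY"),
    "D3S1358": ("14,15", "14,15", "14,15"),
    "vWA": ("18,20", "15,18", "15,18"),
    "FGA": ("24", "21,22", "21,22"),
    "D8S1179": ("13,16", "13,14", "13,14"),
    "D21S11": ("28,30.2", "30", "30"),
    "D18S51": ("14,15", "14,15", "14,15"),
    "D5S818": ("10,11", "13", "13"),
    "D13S317": ("8,11", "11", "11"),
    "D7S820": ("8,11", "10", "10"),
    "D16S539": ("9,11", "9,12", "9,12"),
    "TH01": ("7,9", "6,9", "6,9"),
    "TPOX": ("9,11", "8,11", "8,11"),
    "CSF1PO": ("10,12", "9,12", "9,12"),
}
SAMPLES = ("victim", "suspect", "blood_stain")


def mismatched_loci(sample_a, sample_b):
    """Return the loci at which the two named samples carry different alleles."""
    i, j = SAMPLES.index(sample_a), SAMPLES.index(sample_b)
    return [locus for locus, calls in ALLELES.items() if calls[i] != calls[j]]


print(mismatched_loci("suspect", "blood_stain"))       # no differing loci
print(len(mismatched_loci("victim", "blood_stain")))   # number of differing loci
```

A single reproducible mismatch is enough to exclude a source, whereas a match at every locus must still be weighed with population frequencies before it is reported as an identification.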
STR alleles are perfectly suited to fluorescent detection on a DNA sequencer, allowing forensic scientists to use each of the four dye labels (red, green, blue, and yellow) as a separate channel. Since each STR polymorphism typically produces a tight range of common alleles, three to four STRs with differing ranges in allele size can be labeled with the same dye and detected in a single channel. With this step, multiplex and megaplex polymorphism analyses gained a mechanization and reproducibility akin to hospital metabolite testing. In 1997, the Federal Bureau of Investigation (FBI) recommended that a 13-marker panel of STRs, plus an XY marker, become the standard in criminal investigations. With this number of independently inherited polymorphisms, the probability of even the most common combination is on the order of one in tens of billions. Thus, modern DNA testing has the capability of uniquely identifying each and every person alive today. As of June 2002, the FBI's Combined DNA Index System (CODIS) contained 1,013,746 DNA profiles, including 977,895 profiles of convicted offenders.
[Figure: map of selected disease genes on the human chromosomes. Surviving chromosome labels: 1, 2, 3, 4, 8, 12, 17, 18, 19, 20, 21, 22, X, and Y. Gene and disease labels, in the order extracted (the placement of each label on its chromosome is lost): GBA, Gaucher disease; HPC1, prostate cancer; GLC1A, glaucoma; PS2 (AD4), Alzheimer's disease; PAX3, Waardenburg syndrome; GCK, diabetes; Pendrin, Pendred syndrome; CFTR, cystic fibrosis; asthma; DTD, diastrophic dysplasia; EPM2A, epilepsy; OB, obesity; MYC, Burkitt lymphoma; CX26, autosomal recessive neurosensory deafness; BRCA2, breast cancer; RB1, retinoblastoma; ATP7B, Wilson disease; SNRPN, Prader-Willi syndrome; UBE3A, Angelman syndrome; FBN1, Marfan syndrome; HEXA, Tay-Sachs disease; Crohn's disease; p53, tumor suppressor protein; CMT1A, Charcot-Marie-Tooth syndrome; BRCA1, breast cancer; Jak3, severe combined immunodeficiency; APOE, atherosclerosis; DM, myotonic dystrophy; ADA1, severe combined immunodeficiency; PIG A, paroxysmal nocturnal hemoglobinuria; SOD1, amyotrophic lateral sclerosis; APS1, autoimmune polyglandular syndrome; ATP7A, Menkes syndrome; DMD, Duchenne muscular dystrophy; DGS, DiGeorge syndrome; BCR, chronic myeloid leukemia; SGLT1, glucose-galactose malabsorption; NF2, neurofibromatosis; IL2RG, X-linked severe combined immunodeficiency (SCID); TNFSF5, immunodeficiency with hyper-IgM; FMR1, Fragile X syndrome; MeCP2, Rett syndrome; ALD, adrenoleukodystrophy; HEMA, hemophilia A.]
[Figure: recombination between homologous chromosomes during meiosis, in three panels. Labels: mutated gene, normal gene, DNA markers, linked markers, spindle apparatus.]
1. Homologous chromosomes, each composed of two sister chromatids, pair (synapse) during prophase of the first meiotic division.
2. Recombination occurs when chromatids cross over, exchanging DNA fragments. Linked markers remain with the original chromatid, whereas unlinked markers become separated from it.
3. During anaphase, the recombined chromatids separate into two different daughter cells (in a subsequent meiotic division, the sister chromatids will segregate into separate haploid sex cells).
Accurate DNA diagnosis becomes feasible once a marker has been located within 5 cM of the disease gene. At this distance, the recombination frequency (and probability that the marker and gene become unlinked) is 5%. Conversely, the coinheritance of the marker and the disease gene, as well as the accuracy of diagnosis, is 95%. The addition of a flanking marker within 5 cM on the other side of the gene theoretically increases the accuracy of predictions to 99.75%. (The chance of both markers becoming unlinked is 0.05 x 0.05 = 0.0025.) DNA diagnosis relies on linking one allele of a polymorphic marker to the inheritance of a disease phenotype. Because the alleles present at the polymorphic locus may differ from family to family, it is often necessary to follow a linked polymorphism through the pedigree of the family under study. This establishes which particular polymorphic allele is associated with the disease
state in that particular family. It is also necessary to identify heterozygous carriers of the disease gene in whom one polymorphic allele segregates with the disease gene and a different polymorphic allele segregates with the normal gene. In general, the closer the marker to the disease locus, the more accurate the diagnosis. A marker and a gene in extremely close proximity may be in linkage disequilibrium, in which case they are always coinherited. Such markers, including those actually located within the disease gene, can provide accurate diagnosis without a family history. (The sickle cell polymorphism discussed earlier falls into this category.) However, markers located within a very large disease locus may not even be in linkage disequilibrium with the gene itself. For example, markers located at the 5' end of the dystrophin gene have a recombination frequency of 5%, an apparent distance of 5 cM, or about 6 million nucleotides! Although many DNA diagnoses originally relied on Southern blotting, most new diagnostic tests rely almost exclusively on PCR. Southern analysis typically takes 24 hours or more to complete, whereas a PCR analysis can be completed in several hours.
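The accuracy figures quoted for linked markers (95% for one marker at 5 cM, 99.75% for two flanking markers) follow from treating each recombination event as independent. The short sketch below reproduces that arithmetic; it assumes, as the passage does, that 1 cM corresponds to a 1% recombination frequency, and the function name is hypothetical.

```python
def diagnostic_accuracy(recombination_fractions):
    """Accuracy of a marker-based diagnosis.

    The prediction fails only if every marker recombines away from the
    disease gene, so the error is the product of the individual
    recombination fractions (events assumed independent).
    """
    error = 1.0
    for fraction in recombination_fractions:
        error *= fraction
    return 1.0 - error


one_marker = diagnostic_accuracy([0.05])           # single marker at 5 cM
two_markers = diagnostic_accuracy([0.05, 0.05])    # flanking markers, 5 cM each side
print(f"{one_marker:.4f} {two_markers:.4f}")
```

Over longer distances the relationship between map distance and recombination frequency stops being linear, so this shortcut is reasonable only for closely linked markers.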
continuously cultured. This is accomplished by fusing white cells in the blood with immortal cancer cells or by transforming them with an immortalizing oncogene from a tumor virus. In either case, the cultured white blood cells provide an easy source of DNA needed for the next, and most laborious, step. James Gusella radioactively labeled at random a number of cloned DNA fragments from a human genomic library. These were then used to probe Southern blots of DNA from HD patients and unaffected family members. Gusella was looking for a probe that identifies a polymorphism whose appearance parallels the pattern of inheritance of the disease. The assumption was that all members of the extended Lake Maracaibo family inherited the disease from a single common ancestor. Thus, if a polymorphic marker is tightly linked to the disease gene, it should be coinherited by all of the affected individuals. Luck was with Gusella: A linked marker was identified with the 12th probe tested. Using this probe, he demonstrated that the HD gene is located near the telomere of the short arm of chromosome 4. Gusella's initial good fortune did not continue, and it took almost 4 years to identify a tightly linked marker, this one also located on the centromere side of the disease locus. Theoretically, he should have been able to find other linked markers located some distance from these original markers. The intervening distance would be occupied by the disease gene, thus providing flanking markers on either side. This ultimately proved impossible, because the HD gene lies very near the telomere. HD patients turned out to have different profiles of genetic markers, indicating independent origins of the disease. Using a complex analysis of haplotype linkage disequilibrium between markers in different pedigrees, the Huntington's Disease Collaborative Research Group finally cloned the HD gene in 1993, 10 years after its initial mapping.
Affected members in all 75 HD families used in the study showed a length polymorphism in the coding region of the huntingtin gene caused by a CAG repeat. The expansion of nucleotide triplets had been previously identified in Fragile X mental retardation and has a role in several other disorders. The normal huntingtin gene has 6 to 35 CAG repeats; the mutated version in HD patients has 36 to 180 repeats. The number of repeats correlates with age when symptoms appear: Individuals with 36 to 41 repeats may never have symptoms, whereas those with more than 50 repeats develop symptoms before age 20. Since the repeat occurs in a coding exon, each additional repeat adds another unit of the amino acid glutamine to the expressed huntingtin protein. This alters the three-dimensional structure of the huntingtin protein, changing its interactions with other cell proteins.

[Table column "Protein": huntingtin, FMR1, Ataxin-1, Ataxin-2, Ataxin-3, Ataxin-P/Q Ca++ channel, Ataxin-7, PPP2R2B, Androgen receptor, Atrophin-1, TATA-binding protein, KCNN3, POLG1.]

[Figure: pedigree with genotype labels 12 and 22 above a Southern blot; bands at 10,000 bp and 8,000 bp are labeled Allele 1 and Allele 2.]
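The repeat ranges given for the huntingtin gene lend themselves to a small worked example. The sketch below counts the longest uninterrupted run of CAG triplets in a sequence and applies the thresholds stated in the text; the test sequence is synthetic, the function names are hypothetical, and the label for 42 to 50 repeats is left generic because the text does not characterize that range.

```python
import re


def longest_cag_run(sequence):
    """Number of repeats in the longest uninterrupted run of CAG triplets."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(repeats):
    """Apply the ranges quoted in the text (not a clinical classification)."""
    if repeats <= 35:
        return "normal range"
    if repeats <= 41:
        return "expanded; symptoms may never appear"
    if repeats > 50:
        return "expanded; symptoms expected before age 20"
    return "expanded"


# Synthetic example: 40 CAG repeats flanked by arbitrary codons.
example = "ATG" + "CAG" * 40 + "CCGCCA"
count = longest_cag_run(example)
print(count, classify_repeat_count(count))
```

In practice, repeat length is measured by sizing a PCR product across the repeat rather than by scanning sequence text, so this is only a way to make the thresholds concrete.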
damage. DMD patients show many different deletions of exons of the gene, which effectively knock out production of any functional dystrophin. In the milder form of Becker muscular dystrophy, deletions in the dystrophin gene produce a semifunctional dystrophin protein. Diagnosis by Southern blot analysis, using a cDNA of the dystrophin mRNA, could detect various exon deletions. This was replaced by multiplex PCR, where multiple sets of primers are used to amplify ten of the commonly deleted exons in a single PCR experiment.
encompasses the disease locus. The closest linked marker is used as a probe to isolate its corresponding genomic clone. Following restriction mapping of the clone, a restriction fragment is isolated from the end of the clone closest to the disease locus. This fragment is used to reprobe the library to identify an overlapping clone. The endmost fragment of this clone is then used to reprobe the library, and another overlapping clone is isolated. Through such a succession of overlapping clones, one walks along the chromosome region spanning the disease locus, eventually reaching the flanking marker on the other side. The overlapping fragments are then assembled to produce a map of the disease locus. Researchers screened ten genomic libraries and isolated the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which spans approximately 250,000 nucleotides. The coding exons of CFTR predict a protein of 1480 amino acids. The CFTR protein is involved in the transport of sodium chloride and water in and out of the epithelial cells that line the lungs and the digestive system. As in sickle cell anemia, the primary genetic lesion in CF is a specific mutation affecting a single amino acid. Approximately 70% of CF patients show a 3-bp deletion, named deltaF508, that results in loss of a single phenylalanine residue at amino acid position 508 of the CFTR polypeptide. With this mutation, the cell excretes too much salt and too little water, resulting in a sticky mucus that clogs the lungs and extra salt in the patient's sweat.
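How a 3-bp deletion can remove exactly one amino acid without shifting the reading frame is easier to see with a concrete sequence. The sketch below uses a 15-nucleotide stretch commonly quoted for CFTR codons 506 to 510; treat that sequence as an illustrative assumption to be checked against a reference sequence, and note that the codon table covers only the codons needed here.

```python
# Partial codon table: only the codons that occur in this example.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna):
    """Translate an in-frame DNA string using the partial codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Assumed wild-type sequence for codons 506-510 (Ile Ile Phe Gly Val).
wild_type = "ATCATCTTTGGTGTT"

# Delete three consecutive nucleotides (CTT) spanning codons 507 and 508.
deleted = wild_type[5:8]
mutant = wild_type[:5] + wild_type[8:]

print(deleted, translate(wild_type), translate(mutant))
```

Because the deletion length is a multiple of three, the downstream codons are read unchanged; the phenylalanine is lost while the neighboring isoleucine is preserved by a synonymous codon.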
[Figure: chromosome walking. A probe from the 5' flanking marker is used to identify an overlapping fragment from a genomic library. Probes from the 3' ends of cloned fragments are used to identify successive overlapping cloned fragments. Chromosome walking continues until a clone is identified that contains the 3' flanking marker. Labels: genomic DNA fragment, probe.]
PHARMACOGENOMICS
Throughout the second half of the 20th century, major pharmaceutical companies amassed libraries containing hundreds of thousands of chemical compounds. These numbers have increased dramatically with the advent of combinatorial chemistry, which builds up compounds from simple chemical components, analogous to the way DNA probes are built up on a photolithographic DNA chip (Chapter 6). Each of these compounds is a potential pharmaceutical that can fight disease by altering the activity of a gene or its corresponding protein. For example, a number of compounds have been developed against histamines and other molecules involved in allergic reactions. Pharmaceutical development, however, has been hampered by a relative lack of metabolic targets against which companies can test their huge compound libraries. The availability of the human genome sequence promises to solve this problem by presenting drug developers with a trove of new targets. Using the human genome sequence to inform drug discovery is termed pharmacogenomics. Each gene that is definitively linked to a disease becomes a validated target for drug discovery. Knowledge of mutations in that gene, and the corresponding changes in the three-dimensional structures of the encoded protein, allows one to develop strategies for screening compound libraries. Rational drug design carries this concept a step further by using the target protein's own structure to predict the properties of small molecules that can bind to an active site or otherwise modulate the protein's activity. This was the triumph of Gleevec, the first anticancer drug developed using detailed knowledge of protein kinase receptors (Chapter 7). As more genes, and therefore proteins, are identified in a disease pathway, treatments increasingly can be tailored to specific defects in a metabolic or signal transduction pathway. Defects in different proteins in the same pathway may cause the same disease or symptoms. (Recall Beadle and Tatum's experiment, where mutations in different genes produced the same metabolic phenotype.) Thus, the same apparent disease may present different drug targets, depending on which gene in a pathway is mutated. For example, a mutation in the cytoplasmic domain of the EGF (epidermal growth factor) receptor is blocked by Gleevec; however, a different drug would be needed to counter a mutation in the EGF extracellular domain.
[Figure: genome scan plotting Lod score against position (cM, 0 to 3500) across chromosomes 1 to 22 and X.]
bipolar disorder. To further complicate matters, each disorder has a range of severity and expression that may make it difficult to standardize diagnosis to the point that all researchers would evaluate the same pedigree in an identical manner. The goal of population studies, like family studies, is to link particular DNA polymorphisms to a disease phenotype. The LOD (logarithm of the odds) score is the key statistical method used to establish linkage in family and population studies. On the basis of an observed recombination frequency between a marker and a putative disease locus, the LOD score is the base-10 logarithm of the ratio of the probability (odds) of a pedigree occurring at that linkage value to the probability (odds) of no linkage. A LOD score of 3, which is generally considered the threshold for possible linkage in a complex disorder, means that linkage is 1000 times more likely than no linkage. Because the scale is logarithmic, like the Richter earthquake scale, each one-unit increase in LOD score represents a tenfold increase in the odds. Thus, a LOD score of 3 represents 10 times stronger support for an association between a marker and a locus than does a LOD score of 2. The LOD score is extremely sensitive to changes in data analysis and laboratory errors. The change in diagnosis of a single person in an extended pedigree may be enough to lessen the association between the marker and a phenotype and, thus, weaken statistical linkage. Despite these problems, a number of genome scans and family studies during the past 10 years have reported linkage for schizophrenia with loci on chromosomes 1, 6, 8, 10, 13, 15, and 22. The case is similar
for bipolar disorder, where linkage has been reported on chromosomes 4, 12, 13, 18, 21, and 22. Although the linkage reported in any single study is modest, the fact that the same regions have turned up again and again in different pedigrees is consistent with the hypothesis that multiple susceptibility genes contribute to an individual's overall risk of schizophrenia and bipolar disorder. Many researchers believe that isolated populations offer great promise in the search for genes behind complex disorders. Since they preserve only a fraction of human diversity, isolated populations present a relatively homogeneous genetic background against which it may be easier to identify genes for common and heterogeneous disorders. Many behavioral studies have focused on the Amish, among whom alcohol and drug abuse is rarely a confounding problem. The islanders of Tristan da Cunha, in the middle of the Atlantic Ocean, are interesting for their extremely high rates of asthma. A polymorphism that is informative (coinherited with the disease locus) in one population may not be informative in a different population. Thus, as we saw in the case of the huntingtin gene, individual linked markers may fail to establish linkage if a disease has arisen separately in different populations. The causative lesion of many disorders also varies, with some groups having unique, or private, mutations not seen in other groups. The Old Order Amish and Mennonites provide an object lesson in founder effect. Members of these two religious sects, known as the plain people, live in agricultural communities where they eschew most modern technology and dress in simple clothing without adornment or buttons. To this day, they disdain motor vehicles in favor of horse and buggy. Lancaster County, Pennsylvania, remains a homeland for both groups, each of which is predominantly derived from fewer than 100 individuals who settled there in the 1700s.
Like many groups with small founding populations, they have concentrated mutations for some otherwise rare metabolic disorders. For example, maple syrup urine disease (MSUD) affects about 1/250,000 children in the general population but strikes about 1/400 Amish and Mennonite children. Methylcrotonyl-CoA carboxylase (MCC) deficiency, a related but usually less severe disorder of the breakdown of the amino acid leucine, provides an example of a private mutation. This disease has an overall frequency of about 1/50,000 in Caucasian populations, but it reaches a frequency of about 1/1500 among the Old Order Amish and Mennonites of Lancaster County. Affected Amish children have a G-to-C missense mutation at position 295 of the -subunit of the MCC gene, whereas Mennonites have a frameshift mutation caused by a T insertion at position 518. Thus, although they live nearby and share a closely related religion, lifestyle, and ethnic background, each of these groups has inherited a different point mutation responsible for a rare disorder.
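The LOD score described earlier in this section can be made concrete with the simplest textbook case. The sketch below assumes phase-known meioses that can each be scored as recombinant or nonrecombinant, which is a strong simplification of real pedigree analysis; the function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known, fully informative meioses.

    theta is the assumed recombination fraction between marker and disease
    locus; theta = 0.5 corresponds to no linkage.
    """
    nonrecombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** nonrecombinants)
    likelihood_unlinked = 0.5 ** meioses
    return math.log10(likelihood_linked / likelihood_unlinked)


# Ten informative meioses with no recombinants, evaluated at theta = 0.
lod = lod_score(recombinants=0, meioses=10, theta=0.0)
odds = 10 ** lod
print(round(lod, 2), round(odds))
```

Ten recombination-free meioses are just enough to cross the conventional threshold of 3, which illustrates why rescoring a single individual as a recombinant can undo an apparent linkage.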
Single-nucleotide Polymorphisms
Imagine the complexity of searching for any of several potential genes involved in a heterogeneous disorder in which different genes or gene combinations may produce similar phenotypes in different populations. The various genes are likely to be associated with different markers in different population groups. Association studies using single-nucleotide polymorphisms (SNPs) hold potential in solving the problem of linking markers to the genes involved in complex disorders. Although the term SNP burst on the scene in the late 1990s, SNPs are nothing but point mutations. To put this into perspective, there is about 1 nucleotide
difference per 1200 nucleotides in two comparable chromosomes. This translates into about 3 million single-base differences and 100,000 amino acid differences between any two people. However, most single-nucleotide mutations are rare in a population; to be generally useful in gene scans, an SNP must have a population frequency of at least 1%. Because SNPs are the most frequent type of polymorphism, there are potentially hundreds of useful SNP markers in a region of linkage disequilibrium that is associated with a disease gene. A region of linkage disequilibrium is termed a haploblock, because it is inherited, without recombination, like the haploid mitochondrial DNA (mtDNA) or the Y chromosome. A set of SNPs, or other markers, within the haploblock is inherited together as a haplotype. Different populations have accumulated different SNPs within the haploblock. Thus, affected individuals from different populations may share certain markers within the haploblock, whereas other markers will be unique in certain populations. Just as one may find a consensus sequence for promoter regions and intron/exon splice junctions, haplotypes can be identified that represent a consensus of SNPs that are coinherited with the disease gene across many populations. Although no individual SNP is likely to have great predictive value, the combinatorial effect of an SNP haplotype can be a powerful tool in linkage studies. Thus, in 2002, the National Human Genome Research Institute announced a 100-million-dollar project to establish a haplotype map of the human genome, potentially containing 200,000 haploblocks. Many researchers are confident that once the human genome map is heavily populated with SNPs, disease genes can be identified in heterogeneous populations of unrelated individuals. For example, a sample could be drawn from a database of all individuals who suffer from severe asthma, irrespective of their population group.
An equivalent control sample is then drawn of healthy people. Each patient and control is SNP typed across his or her whole genome or across a specific candidate region. A haplotype is constructed for each person, using SNPs that occur in haploblock regions of the genome. Then, computer algorithms search for a consensus haplotype that is associated with the disease locus. Since haplotypes may encompass tens or hundreds of SNPs, this type of association analysis is much more complex than determining linkage with one marker at a time. It is not clear how saturated with SNPs the genome map must be before pure association analysis of this type will become possible, but it may be as few as 1 million SNPs.
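The case-control comparison outlined above reduces, for one candidate haplotype, to a 2 x 2 table of counts. The sketch below tallies a haplotype in cases and controls and computes an odds ratio and a chi-square statistic; the haplotype strings and all counts are invented for illustration, and a real association scan would also have to correct for testing many haplotypes at once.

```python
from collections import Counter


def haplotype_counts(haplotypes, target):
    """Return (number carrying target, number not carrying target)."""
    with_target = Counter(haplotypes)[target]
    return with_target, len(haplotypes) - with_target


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table: cases (a with, b without), controls (c, d)."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: each entry is one person's haplotype across four SNPs.
cases = ["ACGT"] * 60 + ["ATGT"] * 40
controls = ["ACGT"] * 30 + ["ATGT"] * 70

a, b = haplotype_counts(cases, "ACGT")
c, d = haplotype_counts(controls, "ACGT")
print(odds_ratio(a, b, c, d), round(chi_square_2x2(a, b, c, d), 2))
```

An odds ratio well above 1 with a large chi-square value suggests that the haplotype is enriched among affected individuals, which is the signal an association scan searches for across the genome.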
Pharmacogenetics
Everyone at one time must have taken pause at the paradox of a physician asking us if we are allergic to a particular drug. After all, the doctor should be the one to inform us of a potential problem. Unfortunately, trial and error is the only way to determine a response to most drugs: it takes an allergic reaction to know if we are allergic! SNPs offer the potential of predicting a negative response before a drug is taken. Thus, the endgame of genetic medicine is pharmacogenetics, predicting drug response and tailoring treatment to each person's genetic makeup. However, before we enter this era of personalized medicine, experts today believe we must pass through a period of population medicine, where drugs are targeted according to a generalized profile of the population group that most closely matches the patient.
Although it is very much in vogue today, the term pharmacogenetics was first coined in 1959 by Friedrich Vogel. This was based on earlier evidence that drug responses are inherited and vary between population groups. Notably, African American soldiers serving in Italy during World War II suffered adverse effects, including hemolysis, from the antimalaria drug primaquine. This was correlated with glucose-6-phosphate dehydrogenase (G6PD) deficiency, which, ironically, provides some protection against malaria. Drug response is largely mediated by so-called metabolic enzymes in the liver, the cytochrome P450 monooxygenases (CYP450s), which detoxify compounds and metabolize many drugs into their bioactive forms. People who are extensive metabolizers efficiently convert a given drug to its active form and/or metabolize it at a rate that provides the desired therapeutic effect. Poor metabolizers fail to convert enough of the drug to its active form or metabolize it at a rate that fails to produce a therapeutic effect. Toxic metabolizers convert the drug into a toxic product or metabolize it so slowly that it accumulates to toxic levels. In the late 1970s, Robert Smith of St. Mary's Hospital, London, noticed an unusually high incidence of side effects, including an unusual fainting response, among patients prescribed the antihypertension drug debrisoquine. He found that about 8% of Caucasians (but less than 2% of Black and Asian populations) are poor metabolizers, handling debrisoquine 10 to 200 times less efficiently than extensive metabolizers. Michel Eichelbaum, of the University of Bonn, found similar disparities in the metabolism of sparteine, an antiarrhythmic. This led to the realization that both drugs are metabolized by the CYP2D6 enzyme and that poor metabolizers inherit a defective CYP2D6 enzyme. Subsequent work revealed that CYP2D6 is involved in deficient responses to at least 40 common drugs, including codeine, dextromethorphan, beta-blockers, monoamine oxidase inhibitors, tricyclic antidepressants, antipsychotics, neuroleptics, and fluoxetine (Prozac). Cloning and sequencing of the CYP2D6 gene in 1988 showed that poor metabolizers have polymorphisms that produce splicing errors or amino acid substitutions. Recent research showed that several SNP haplotype pairs in CYP2D6 predict response to the anti-asthma drug albuterol, with striking differences in haplotype distribution between population groups. Screening for relevant CYP450 polymorphisms would be a perfect application for gene chips and would be a logical first step in the development of pharmacogenetics.
As these early people wandered, their DNA accumulated mutations. Some provided advantages that allowed these pioneers to adapt to new homes and ways of living. Most were nonessential. Mutations are the grist of evolution, producing gene and protein variations that have allowed humans to adapt to a variety of environments and to become the most far-ranging mammal on the planet. The same mutational processes that generated human diversity (point mutations, insertions/deletions, transpositions, and chromosome rearrangements) also generated disease. It may be hard to see from our current vantage point, but the entire industrial revolution has occupied only about 0.1% of our 150,000-year history as a species. The cradles of western civilization, classical Greece and Rome, take us back into only 2% of our history. The earliest city-states of Mesopotamia, Babylonia, Assyria, and China take us back only 4% of the way into our past. At 6%, we reach the watershed of agriculture, which changed forever the way humans live and work. After language, the domestication of plants and animals is the single greatest civilizing factor in human history. Increased production and performance of domesticated organisms made possible urbanization and task specialization in human society. Thus, the labor of fewer and fewer farmers produced enough food and clothing materials to satisfy the needs of growing numbers of nonfarmers (artisans, engineers, scribes, and merchants), freeing them to develop other elements of culture. Reaching back the remaining 93% of our history, to the dawn of the human species, we lived only as hunter-gatherers. The fastest evolving part of our genome, the mitochondrial control region, accumulates about one new mutation every 20,000 years. Mutations are five- to tenfold less frequent in most regions of the nuclear chromosomes. Thus, virtually every gene in our genome is, on average, only one event away from our hunter-gatherer heritage.
This leads to two far-reaching conclusions that substantially broaden our understanding of evolutionary processes and the origin of human disease: (1) Throughout most of human history, the hunter-gatherer group was a basic population unit upon which evolution acted. (2) Our basic anatomy, physiology, and many aspects of behavior are essentially identical to those of the hunter-gatherers who ranged through the ancient landscapes of Africa, Europe, Asia, Australia, and the Americas. It may be difficult for many people today to conceive of what is meant, in a genetic sense, by a human population. This is because, over the past quarter century, people have become extremely mobile. Airplanes and four-wheel vehicles have made it possible to travel virtually anywhere in the inhabited world within a day or two. Major urban centers have become cosmopolitan, with mixes of people representing many races and cultures. Even so, today there are still regions of the world where people are born, reproduce, and die all in the same village. This is essentially the classical definition of a human population: a group of people who, by reason of geography, language, or culture, preferentially mate with one another. Unique human populations (for example, the Saami of Finland, the Ainu of Japan, the Nanuit of Alaska, the Yanomami of Brazil, the Pygmies of Central Africa, and the Bushmen of Southern Africa) have preserved unique cultures and languages. Their genomes preserve the genetic residue of a time when all
CHAPTER 8
human beings lived in smaller and more cohesive groups. Small populations are subject to the founder effect, inbreeding, and genetic drift (a random fluctuation of nonessential alleles). Over millennia, these effects join with selection to concentrate particular gene variations within different population groups. Gene variations come into equilibrium when a population grows to several thousand individuals.
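The dependence of drift on population size can be made concrete with a standard textbook result that the chapter itself does not state: in an idealized, randomly mating population of constant size N, with no mutation, selection, or migration, expected heterozygosity shrinks by a factor of (1 - 1/(2N)) each generation. The sketch below, with names invented for illustration, applies that formula to groups of different sizes.

```python
def expected_heterozygosity(h0, population_size, generations):
    """Expected heterozygosity after drift alone in an idealized population.

    Assumes constant size, random mating, and no mutation, selection,
    or migration (assumptions of this sketch, not claims from the text).
    """
    return h0 * (1 - 1 / (2 * population_size)) ** generations


for n in (50, 500, 5000):
    remaining = expected_heterozygosity(1.0, n, 100)
    print(f"N = {n:>4}: {remaining:.1%} of starting variation left after 100 generations")
```

Under these assumptions a band of 50 loses most of its starting variation within 100 generations, while a population of several thousand loses very little, which is consistent with the idea that drift fades as populations grow.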
[Figure: UV intensity and skin color plotted by latitude (80 degrees north to 80 degrees south); some areas are marked "No data."]
Recent modeling of worldwide UVB radiation, based on satellite mapping of the Earth's ozone layer, shows a correlation between levels of UVB that are sufficient for vitamin D synthesis and skin pigmentation. Thus, dark-pigmented skin is found in the tropics where there is sufficient UVB to synthesize vitamin D year-round. Lighter skin, but with the ability to tan, is found in the subtropical and temperate regions, which have at least 1 month of insufficient UVB radiation. Very light skin that burns easily is found north of 45 degrees, where there is insufficient UVB year-round. In 2000, Nina Jablonski and George Chaplin, of the California Academy of Sciences, offered a more complete explanation for the evolution of dark-pigmented skin among early hominids in Africa. They proposed that melanin protects the body's stores of the B vitamin folate, which is essential for reproduction and embryonic development. This conclusion came from the synthesis of several lines of research: (1) Exposure to sunlight rapidly reduces folate levels in the blood. (2) Treating male rodents with folate inhibitors impairs sperm development and induces infertility. (3) Folate deficiency during pregnancy, including reduction apparently induced by overuse of tanning beds, increases risk of neural tube defects in infants. Thus, as early hominids spent more time hunting and gathering on the open savanna, those with darker skin would have had greater reproductive success and produced more healthy offspring.
in the Rift Valley of Africa. Early members of our own genus, Homo erectus, arose in the same region about 2.5 million years ago. These archaic hominids migrated out of Africa approximately 1.8 million years ago to found populations in Europe, the Middle East, and southern Asia. The earliest fossils of modern humans, or artifacts made by them, have been found in southern and eastern Africa, dating to about 140,000 years ago. Remains of modern humans dating to 100,000 years ago have been found in the Middle East, to 60,000 years in Asia and Australia, and to 45,000 years in Europe. By "modern humans," we mean members of our own species Homo sapiens, who share with us important anatomical features (skull shape and size) and behavioral attributes (use of blades, bone tools, pigments, burial goods, representational art, long-distance trade, and varied environmental resources). These humans subsequently spread to Micronesia, Polynesia, and the New World (North and South America). How modern humans emerged is a matter of debate between proponents of two opposing theories. Supporters of the multiregional theory contend that modern human populations developed independently from archaic hominid (Homo erectus) populations in Africa, Europe, and Asia. Early modern groups evolved in parallel with one another and exchanged members to give rise to modern population groups. Supporters of the displacement theory, commonly known as "out of Africa," contend that all modern human populations are derived from one or several modern population groups that left Africa beginning about 70,000 years ago. These founding groups migrated throughout the Old World, displacing any surviving archaic hominids. Thus, scientists all agree that our earliest hominid relatives arose in Africa, but disagree on when the direct ancestors of living humans left Africa to populate the globe.
Because of its high mutation rate, the mitochondrial control region evolves more quickly than other chromosome regions; it has a faster molecular clock. The fast mutation rate means that lineages diversify rapidly, amplifying differences between populations. However, rapid mutation also introduces the confounding problem of back mutation, where the same nucleotide mutates more than once, returning it to its original state. Multiple mutations at the same position also cause an underestimation of the total number of mutation events. Thus, the number of observed differences between human and chimp sequences is less than one would expect to have occurred in the 6 to 7 million years since the lineages diverged. The chance of back or multiple mutations is much smaller over the period during which modern humans have arisen. So, the number of observed mutations among living humans is very close to the actual number that has accumulated since we arose as a species. Mitochondrial DNA (mtDNA) offers another important advantage in reconstructing human evolution: With very few documented exceptions, the mitochondrial chromosome is inherited exclusively from the mother. This is because mitochondria are inherited from the cytoplasm of the mother's large egg cell. Any paternal mitochondria that may enter the ovum at the moment of conception are identified by different ubiquitin proteins expressed on their surface and destroyed. The lack of paternal chromosomes with which to recombine greatly simplifies the analysis of mitochondrial inheritance. The mitochondrial genome is inherited intact over thousands of generations, without the confounding effect of crossover with a paternal chromosome. Because the mitochondrial genome is haploid, having only a contribution from the mother, mtDNA types are termed haplotypes ("half-types").
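The undercounting caused by back and multiple mutations can be illustrated with one common correction, the Jukes-Cantor model. The text does not name a correction method; Jukes-Cantor is brought in here only as a simple example, and it assumes that every nucleotide is equally likely to change into any other. The function name is invented for this sketch.

```python
import math


def jukes_cantor_distance(observed_fraction):
    """Estimate substitutions per site from the fraction of sites that differ.

    Uses the Jukes-Cantor formula d = -(3/4) * ln(1 - (4/3) * p), which
    assumes equal rates for all substitutions (an assumption of this sketch).
    """
    if not 0 <= observed_fraction < 0.75:
        raise ValueError("observed fraction must be at least 0 and below 0.75")
    return -0.75 * math.log(1 - (4 / 3) * observed_fraction)


for p in (0.01, 0.10, 0.30):
    d = jukes_cantor_distance(p)
    print(f"observed {p:.2f} differences per site -> estimated {d:.4f} substitutions per site")
```

For closely related sequences the corrected and observed values are nearly the same, while for distant comparisons the estimate climbs well above the observed count. That matches the contrast drawn above between comparisons among living humans and comparisons between humans and chimps.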
Allan Wilson
(Courtesy of Cold Spring Harbor Laboratory Archives.)
A. Six allelic sequences
Sequence 1)  agctggctga a tgctatctgc g tcgcgcgaaa t aacgtcagca a ttcgttacat c tctctagggc
Sequence 2)  agctggctga a tgctatctgc c tcgcgcgaaa t aacgtcagca a ttcgttacat t tctctagggc
Sequence 3)  agctggctga a tgctatctgc g tcgcgcgaaa c aacgtcagca a ttcgttacat t tctctagggc
Sequence 4)  agctggctga a tgctatctgc g tcgcgcgaaa t aacgtcagca a ttcgttacat t tctctagggc
Sequence 5)  agctggctga g tgctatctgc c tcgcgcgaaa t aacgtcagca a ttcgttacat t tctctagggc
Sequence 6)  agctggctga a tgctatctgc g tcgcgcgaaa t aacgtcagca g ttcgttacat c tctctagggc

B. The six haplotypes for the five SNPs in the allelic sequences above.
Position      11   22   33   44   55
Sequence 1)   a    g    t    a    c
Sequence 2)   a    c    t    a    t
Sequence 3)   a    g    c    a    t
Sequence 4)   a    g    t    a    t
Sequence 5)   g    c    t    a    t
Sequence 6)   a    g    t    g    c

C. Two possible trees for the evolutionary relationships among the haplotypes above, assuming that mutations occur sequentially.
[Tree diagrams connecting haplotypes AGTAC (1), ACTAT (2), AGCAT (3), AGTAT (4), GCTAT (5), and AGTGC (6).]
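The reduction from allelic sequences (panel A) to haplotypes (panel B) can be sketched in a few lines. The code below rebuilds the six sequences from their shared ten-nucleotide flanks and their SNP alleles, finds the variable positions by comparing columns, and lists the pairs of haplotypes that differ by a single mutation. All function names are invented for this illustration, and the one-step pairs are offered as raw material for drawing a tree, not as a reconstruction of the trees in panel C.

```python
from itertools import combinations

# Ten-nucleotide flanks shared by all six sequences in panel A.
FLANKS = ("agctggctga", "tgctatctgc", "tcgcgcgaaa",
          "aacgtcagca", "ttcgttacat", "tctctagggc")
# Alleles carried by each sequence at the five variable sites (panel B).
SNP_ALLELES = {1: "agtac", 2: "actat", 3: "agcat",
               4: "agtat", 5: "gctat", 6: "agtgc"}


def build_sequence(alleles):
    """Interleave the shared flanks with one sequence's five SNP alleles."""
    parts = [flank + allele for flank, allele in zip(FLANKS, alleles)]
    parts.append(FLANKS[-1])
    return "".join(parts)


def variable_positions(sequences):
    """Return 1-based positions where the aligned sequences are not identical."""
    return [i + 1 for i, column in enumerate(zip(*sequences))
            if len(set(column)) > 1]


def haplotype(sequence, positions):
    """Keep only the nucleotides found at the variable positions."""
    return "".join(sequence[p - 1] for p in positions).upper()


def differences(a, b):
    """Count positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


sequences = {label: build_sequence(a) for label, a in SNP_ALLELES.items()}
positions = variable_positions(sequences.values())
haplotypes = {label: haplotype(seq, positions) for label, seq in sequences.items()}
one_step_pairs = [(a, b) for a, b in combinations(sorted(haplotypes), 2)
                  if differences(haplotypes[a], haplotypes[b]) == 1]

print("SNP positions:", positions)
print("Haplotypes:", haplotypes)
print("Pairs one mutation apart:", one_step_pairs)
```

Because each one-step pair represents a single sequential mutation, linking those pairs gives the skeleton of a tree; which haplotype is placed at the root is a separate choice.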
Using Haplotypes to Create Trees Showing Evolutionary Relationships

... evolutionary history. The accumulated DNA evidence has confirmed Wilson's original story. Several lines of evidence point to the emergence of modern humans in Africa about 150,000 years ago. The greatest amount of DNA variation occurs in Africa, suggesting that African populations have been accumulating mutations for the longest period of time. Europeans have only about half the variation of African groups, suggesting that they are only about half as old. Most Asian and European variations are a subset of variations found in African populations, suggesting that Asian and European populations are derived from an African source. The deepest roots of a tree diagram of human variation contain only Africans. Ancient alleles have not been found in non-African populations. Interestingly, comparisons of mitochondrial and Y chromosome polymorphisms suggest that men and women have had different roles in the peopling of
A
Greece       GGTACCACCCAAGTATTGACTCACC   25
Japan        GGTACCACCCAAGTATTGACTCACC
Greece       CATCAACAACCGCTATGTATTTCGT   50
Japan        CATCAACAACCGCTATGTATTTCGT
Greece       ACATTACTGCCAGCCACCATGAATA   75
Japan        ACATTACTGCCAGCCACCATGAATA
Greece       TTGTACGGTACCATAAATACTTGAC   100
Japan        TTGTACGGTACCATAAATACTTGAC
Greece       CACCTGTAGTACATAAAAACCCAAT   125
Japan        CACCTGTAGTACATAAAAACCCAAT
Greece       CCACATCAAAACCCCCTCCCCATGC   150
Japan        CCACATCAAAACCCCCCCCCCGCGC
Greece       TTACAAGCAAGTACAGCAATCAACC   175
Japan        TTACAAGCAAGTACAGCAATCAACC
Greece       CTCAACTATCACACATCAACTGCAA   200
Japan        TTCAGCTATCACACATCAACTGCAA
Greece       CTCCAAAGCCACCCCTCACCCACTA   225
Japan        CTCCAAAGCCACCCCTCACCCACTA
Greece       GGATACCAACAAACCTACCCACCCT   250
Japan        GGATATCAACAAACCTACCCACCCT
Greece       TAACAGTACATAGTACATAAAGCCA   275
Japan        TAACAGTACATAGTACATAAAGCCA

B
Greece       CCAAGTATTGACTCACCCATCAACA   25
Neandertal   CCAAGTATTGACTCACCCATCAGCA
Greece       ACCGCTATGTATTTCGTACATTACT   50
Neandertal   ACCGCTATGTATCTCGTACATTACT
Greece       GCCAGCCACCATGAATATTGTACGG   75
Neandertal   GTTAGTTACCATGAATATTGTACAG
Greece       TACCATAAATACTTGACCACCTGTA   100
Neandertal   TACCATAATTACTTGACTACCTGCA
Greece       GTACATAAAAACCCAATCCACATCA   125
Neandertal   GTACATAAAAACCTAATCCACATCA
Greece       AAACCCCCTCCCCATGCTTACAAGC   150
Neandertal   AACCCCCCCCCCCATGCTTACAAGC
Greece       AAGTACAGCAATCAACCCTCAACTA   175
Neandertal   AAGCACAGCAATCAACCTTCAACTG
Greece       TCACACATCAACTGCAACTCCAAAG   200
Neandertal   TCATACATCAACTACAACTCCAAAG
Greece       CCACCCCT-CACCCACTAGGATACC   225
Neandertal   ACGCCCTTACACCCACTAGGATATC
Greece       AACAAACCTACCCACCCTTAACAGT   250
Neandertal   AACAAACCTACCCACCCTTGACAGT
Greece       ACATAGTACATAAAGCCATTTACCG   275
Neandertal   ACATAGCACATAAAGTCATTTACCG

C
Chimpanzee   TTCTTTCATGGGGAAGCAAATTTAA   25
Greece       TTCTTTCATGGGGAAGCAGATTTGG
Chimpanzee   GTACCACCTAAGTACTGGCTCATTC   50
Greece       GTACCACCCAAGTATTGACTCACCC
Chimpanzee   ATTA-CAACCGCTATGTATTTCGTA   75
Greece       ATCAACAACCGCTATGTATTTCGTA
Chimpanzee   CATTACTGCCAGCCACCATGAATAT   100
Greece       CATTACTGCCAGCCACCATGAATAT
Chimpanzee   TGTACAGTACCATAATCACCCAACC   125
Greece       TGTACGGTACCATAAATACTTGACC
Chimpanzee   ACCTATAGCACATAAAATCCACCTC   150
Greece       ACCTGTAGTACATAAAAACCCAATC
Chimpanzee   -ACATTAAAACCTTCACCCCATGCT   175
Greece       CACATCAAAACCCCCTCCCCATGCT
Chimpanzee   TACAAGCACGCACAACAATCAACCC   200
Greece       TACAAGCAAGTACAGCAATCAACCC
Chimpanzee   CCAACTATCGAACATAAAACACAAC   225
Greece       TCAACTATCACACATCAACTGCAAC
Chimpanzee   TCCAACGACACTTCTCCCCCACCCT   250
Greece       TCCAAAGCCACCCCTCACCCACTAG
Chimpanzee   AATACCAACAAACCTACCCTCCCTT   275
Greece       GATACCAACAAACCTACCCACCCTT

D
Neandertal   CCAAGTATTGACTCACCCATCAGCA   25
Neandertal   CCAAGTATTGACTCACCCATCAGCA
Neandertal   ACCGCTATGTATCTCGTACATTACT   50
Neandertal   ACCGCTATGTATTTCGTACATTACT
Neandertal   GTTAGTTACCATGAATATTGTACAG   75
Neandertal   GCCAGCCACCATGAATATTGTACAG
Neandertal   TACCATAATTACTTGACTACCTGCA   100
Neandertal   TACCATAATTACTTGACTACCTGCA
Neandertal   GTACATAAAAACCTAATCCACATCA   125
Neandertal   GTACATAAAAACCTAATCCACATCA
Neandertal   AACCCCCCCCCCCATGCTTACAAGC   150
Neandertal   ACCCCCCCCCCCCATGCTTACAAGC
Neandertal   AAGCACAGCAATCAACCTTCAACTG   175
Neandertal   AAGCACAGCAATCAACCTTCAACTG
Neandertal   TCATACATCAACTACAACTCCAAAG   200
Neandertal   TCATACATCAACTACAACTCCAAAG
Neandertal   ACGCCCTTACACCCACTAGGATATC   225
Neandertal   ACGCCCTTACACCCACTAGGATATC
Neandertal   AACAAACCTACCCACCCTTGACAGT   250
Neandertal   AACAAACCTACCCACCCTTGACAGT
Neandertal   ACATAGCACATAAAGTCATTTACCG   275
Neandertal   ACATAGCACATAAAGTCATTTACCG
the planet and in the mixing of genes among population groups. Generally, mtDNA types show a gradation (or cline) of allele frequencies from one geographic region to another. This is the signature of gene flow, the slow and steady exchange of genes between adjacent populations, which occurs in many cultures when women leave their families to live in their husbands' villages. Y polymorphisms, on the other hand, show discontinuities between adjacent regions, suggesting that men have not moved freely between local groups. However, related Y chromosome types do leave the signature of migrations and war campaigns that abruptly transplant genes over long distances. Thus, members of the Black South African tribe, the Lemba, have the telltale signature of Cohanim Jewry displaced from the Middle East. The most common Y chromosomes in cosmopolitan southern Japan clearly were transplanted from Korea in the last several hundred years, but the Ainu of the northern islands of Japan have an ancient affinity with Tibetans.
If one takes the mutation rate of the mitochondrial control region to be about one mutation per 20,000 years, then 7 mutations equals 140,000 years to a convergence on a common ancestor, in accordance with the earliest fossil record of modern humans. The 27-mutation difference between living humans and Neandertal suggests that our lineage converges on (or diverges from) a common ancestor about 550,000 years ago. This added still more DNA evidence in favor of the Out of Africa model.
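The clock calculation above is simple enough to sketch directly. The function names below are invented for illustration, the rate is the round figure used in the text, and the two 25-nucleotide strings are one stretch of the Greece and Japan control region sequences shown in the alignment figure. The sketch follows the text in multiplying the number of differences by the years per mutation; a fuller treatment would also correct for multiple mutations at the same site.

```python
YEARS_PER_MUTATION = 20_000  # control-region clock assumed in the text


def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def years_to_common_ancestor(differences, years_per_mutation=YEARS_PER_MUTATION):
    """Convert a mutation count into a time estimate, as the text does."""
    return differences * years_per_mutation


# One 25-nucleotide stretch of the Greece and Japan sequences.
greece = "CTCAACTATCACACATCAACTGCAA"
japan = "TTCAGCTATCACACATCAACTGCAA"
print("differences in this stretch:", count_differences(greece, japan))

for label, mutations in (("among living humans", 7), ("human vs. Neandertal", 27)):
    print(f"{label}: {mutations} mutations -> {years_to_common_ancestor(mutations):,} years")
```

With these inputs, 27 mutations works out to 540,000 years, which the text rounds to about 550,000.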
Bushy Evolution
Until about 30 years ago, the single-species hypothesis dictated that only a single human ancestor lived on the earth at any point in prehistory. Each new Australopithecine or hominid-like fossil was immediately considered one of a direct succession of ancestors of modern humans. This created the image of an evolutionary tree with a long, straight trunk and essentially no branches until one reached racially and ethnically diverse modern humans. Thus, when racial segregation was still a way of life in the United States and elsewhere around the world, straight-line evolution provided scientifically minded people the comfort of believing in a shared evolutionary past, but put some distance between modern human groups who considered themselves different from each other. This followed from a concept of gradual evolution, in which features that define a new stage are essentially properties of the preceding stage. In other words, there was a certain preordained sense in the way evolution proceeded. However, as more and more distinctive fossils emerged, it became impossible to fit them all into one straight path toward modern humans. Some had to be evolutionary dead ends and not a part of our direct ancestry. The concept of a bushy human pedigree, with short branches due to frequent extinctions, was
Bushy Evolution
Current theories of hominid evolution show a bushy lineage, rather than a straight line. This evolutionary scheme is based on the work of Donald Johanson. (Redrawn, with permission, from the Institute of Human Origins, Arizona State University.)
[Figure labels: genera Ardipithecus, Australopithecus, Paranthropus, Kenyanthropus, and Homo; species Ar. ramidus, A. anamensis, A. afarensis, A. africanus, P. robustus, K. platyops (?), K.(?) rudolfensis, H. heidelbergensis, H. neanderthalensis, and H. sapiens; the time axis ends at the present day.]
predicted by the model of punctuated evolution articulated in the early 1970s by Niles Eldredge of the American Museum of Natural History and Stephen Jay Gould of Harvard University. According to this model, evolution proceeds in abrupt fits and starts, as adaptive changes arise in small, dispersed populations. Svante Pääbo's demonstration that the mtDNA variation of Neandertal lies outside the variation of modern humans provided DNA evidence to support the bushy tree concept. It makes better sense of the fossil evidence, and also of the fact that most genetic variation accumulated during a time in human history when hunter-gatherer populations averaged perhaps 50 persons. A bushy evolutionary tree is exactly what one would expect from selection acting upon private sets of mutations that arise in small, relatively segregated groups of hunter-gatherers. It is most useful to consider that mutation and selection are separate events, often widely separated by time. Thus, neutral mutations anticipate future circumstances, environmental or behavioral, when they may provide a selective advantage. If most mutations did not confer an immediate selective advantage on the hunter-gatherer, then most new mutations would survive or be extinguished in the group according to the whims of genetic drift. In this way, different hunter-gatherer groups accumulated different sets of mutations. On occasion, one or more of the accumulated mutations would provide a selective advantage to one population or another, and groups exchanged alleles through intermarriage. However, as climate, circumstances, or the luck of genetic drift changed, different lineages would become extinct, each leaving behind a fossil record of its distinctive features.
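The fate of a new neutral mutation in a band of about 50 people can be explored with a small simulation. This is a minimal sketch under an idealized Wright-Fisher model (constant group size, random mating, non-overlapping generations, no selection or migration), none of which the text specifies; the function names and the seed are arbitrary choices made for illustration.

```python
import random


def drift_until_absorbed(population_size, rng):
    """Follow one new neutral mutation until it is lost or fixed.

    Returns (fixed, generations). The mutation starts as a single copy
    among the 2N gene copies of a diploid group of size N.
    """
    gene_copies = 2 * population_size
    count = 1
    generations = 0
    while 0 < count < gene_copies:
        frequency = count / gene_copies
        # Each gene copy in the next generation is drawn at random.
        count = sum(rng.random() < frequency for _ in range(gene_copies))
        generations += 1
    return count == gene_copies, generations


def fraction_fixed(population_size, trials, seed=1):
    """Fraction of independent new mutations that drift all the way to fixation."""
    rng = random.Random(seed)
    fixed = sum(drift_until_absorbed(population_size, rng)[0] for _ in range(trials))
    return fixed / trials


print("fraction fixed in groups of 50:", fraction_fixed(50, 300))
```

In this idealized model, theory puts the chance of fixation for a new neutral mutation at 1/(2N), or about 1% in a group of 50, so nearly all new mutations are soon lost. The few that spread differ from group to group, which is one way to picture the private sets of mutations described above.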
[Figure: histograms of pairwise sequence comparisons for several populations, including Nigerian chimpanzees and Western chimpanzees. x-axis: % sequence divergence; y-axis: % of total pairwise comparisons.]
years. Pollen records suggest that much of Southeast Asia was deforested following the eruption, and significant changes are also recorded in the pollen profile of Grand Pile, in France. Greenland ice cores show that the eruption of Toba was followed by 1000 years of the lowest oxygen isotope ratios of the last glacial period, indicating the lowest temperatures of the last 100,000 years. Thus, it is not difficult to believe that the eruption of Toba produced several years of volcanic winter, followed by 1000 years of unrelenting cold. This surely would have decimated human populations outside of the scattered refuges in tropical Africa.
The fossil record shows that the modern humans who first reached the Middle East were replaced after the Toba eruption by cold-tolerant Neandertals, illustrating that adaptations are relative to environmental factors. Interestingly, Neandertals seem to have gone through similar population bottlenecks during their 250,000 or so years on earth. Mitochondrial control region sequences have been obtained from two additional Neandertal specimens from Croatia and the Caucasus. Added to the German sample, these represent about half the range of Neandertal. Comparisons of these samples suggest that Neandertal had only about the same (limited) level of diversity as modern human populations, despite the fact that this species existed nearly twice as long as Homo sapiens has so far.
nated in the Fertile Crescent and spread westward with agriculture, at a rate of about 1 km per year. Farming arose about 10,000 years ago and roughly marks the boundary between the Paleolithic (Old Stone Age) and Neolithic (New Stone Age). Because the east-west component accounted for the greatest variance across many genes, they concluded that Neolithic genes had essentially replaced Paleolithic genes. In their quest for new agricultural lands, farmers moved inexorably northwest through Europe, mixing with and eventually displacing the hunter-gatherer populations they encountered. The advent of mtDNA typing of ancient remains and living Europeans challenged Cavalli-Sforza, Menozzi, and Piazza's demic diffusion model of the wave-like expansion of Neolithic farmers. The first data came from Ötzi the Iceman, a 5000-year-old mummy found frozen in the Tyrolean Alps in 1991. An international team obtained mitochondrial control region sequences from tissue samples in 1994. Ötzi's mtDNA type, and others differing from his by only a single nucleotide, proved common among people alive today of European ancestry. Ötzi's DNA looks modern, because he is fully modern. Five thousand years is less than a single mutation from the present, even by the fast mitochondrial clock. Bryan Sykes of Oxford University extended the human mitochondrial lineage directly back to the Paleolithic, when he analyzed DNA from a human tooth from the Cheddar Gorge in the south of England. Excavated from a limestone cave, the tooth was carbon dated to 12,000 years ago, approximately 6,000 years before farming reached England. Even so, as with Ötzi the Iceman, Cheddar Man's identical DNA type and many others differing from his by only a single nucleotide are common among Europeans alive today. Clearly, Cheddar Man was a hunter-gatherer, yet his DNA has survived into the present time. So there could not have been a virtually complete replacement of Paleolithic hunter-gatherers by Neolithic farmers.
Analysis of mtDNA from living Europeans by Sykes and Antonio Torroni (of the University of Rome) identified seven major mitochondrial haplogroups.
Age (years)    Origin                % of Modern Europeans
45,000         Greece                11
25,000         Southern Russia       6
20,000         Southern France       46
17,000         Northern Spain        5
17,000         Central Italy         9
15,000         Northeastern Italy    6
10,000         Middle East           17
The fictitious names given by Bryan Sykes to the founders of the seven major European mitochondrial haplogroups are based on the alphabetic classification system of Antonio Torroni.
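The table can be tallied to show how much of the modern European mitochondrial pool predates farming. The sketch below treats any lineage older than 10,000 years, the date given above for the rise of farming, as Paleolithic; that cutoff and the variable names are assumptions made for this illustration.

```python
FARMING_BEGAN_YEARS_AGO = 10_000  # date for the rise of farming, from the text

# (age in years, origin, % of modern Europeans), copied from the table.
HAPLOGROUPS = [
    (45_000, "Greece", 11),
    (25_000, "Southern Russia", 6),
    (20_000, "Southern France", 46),
    (17_000, "Northern Spain", 5),
    (17_000, "Central Italy", 9),
    (15_000, "Northeastern Italy", 6),
    (10_000, "Middle East", 17),
]

# Lineages older than the rise of farming are counted as Paleolithic.
paleolithic_share = sum(share for age, _, share in HAPLOGROUPS
                        if age > FARMING_BEGAN_YEARS_AGO)
neolithic_share = sum(share for age, _, share in HAPLOGROUPS
                      if age <= FARMING_BEGAN_YEARS_AGO)

print(f"Paleolithic lineages: {paleolithic_share}% of modern Europeans")
print(f"Neolithic lineages: {neolithic_share}% of modern Europeans")
```

On these figures the six older lineages account for 83% of modern Europeans and the single youngest lineage for 17%.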
Sykes popularized the founders of these lineages as the "seven daughters of Eve." Six of these lineages, representing about 80% of Europeans alive today, are derived from Paleolithic stocks dating back before the advent of agriculture. Only about 20% of European haplotypes are young enough to represent the new genes of Neolithic farmers. This is not far different from the 28% of variance described by Cavalli-Sforza's first principal component. mtDNA has thus shown an unbroken continuity of inheritance from the Paleolithic hunter-gatherers clear through to suit-clad urban humans alive today. Although agriculture did eventually reach England and the rest of Europe, it came largely on its own. The culture of farming diffused, rather than the farmers themselves. This will almost certainly hold true for the diffusion of agriculture from other ancient centers in Africa, Asia, and the Americas. Neolithic farmers did not displace the Paleolithic hunters in the way that those hunters had replaced the Neandertals before them. The Paleolithic hunter-gatherers lacked nothing (genetically, physiologically, or behaviorally) that they needed to move into the modern age. These hunter-gatherers became farmers, and they became us.