Abstract Genome resources for apple (Malus × domestica Borkh.), the main fruit crop of temperate regions, have been developed over the past 10 years, culminating in the sequencing of the Golden Delicious genome. The apple genome sequence anchored to a high-density linkage map provides the apple community with new tools to identify genes and other functional elements that will enable the study of the evolution of plant genome structure, as well as facilitating genomic-assisted breeding. Transcriptomics, proteomics and metabolomics studies are greatly benefiting from the availability of an annotated genome. In this review, we report on the status of the apple genome and on current molecular and genetic tools available in apple that will improve the efficiency of the process of cultivar development; we discuss how an integrative omics approach could greatly [...] determine agronomically and economically favorable phenotypes; we review the databases and bioinformatics tools that are available to manage and exploit the large amounts of biolog-
Delicious genome (Velasco et al. 2010). Identification and annotation of genes in apple was first based on collections of expressed sequence tags (ESTs) from libraries covering a variety of genotypes, tissues and experimental conditions (Newcomb et al. 2006; Gasic et al. 2009). EST resources have allowed the efficient development of DNA-based markers (Chagné et al. 2008; Celton et al. 2009a), gene discovery (Park et al. 2006) and comparative genomics (Gasic et al. 2009). The first physical map of apple, and indeed of any tree species, was constructed from bacterial artificial chromosome clones (Han et al. 2007) covering approximately 927 Mb.

The availability of the apple whole-genome sequence is causing a rapid acceleration of apple genetics and genomic research by providing new tools to identify genes and other functional elements, to enable study of the evolution of plant genome structure, as well as the more efficient development of improved apple varieties. Diploid Golden Delicious was chosen for genome sequencing because it is a widely grown cultivar and has been extensively used in apple breeding programs worldwide (Noiton and Alspach 1996). Moreover, sequencing of a heterozygous cultivar allowed the identification of a large amount of variation, and the development of single nucleotide polymorphism (SNP) markers. Of special interest to biologists and breeders are polymorphisms in and around the coding regions, as these represent a substantial resource for molecular breeding programmes, through marker/trait association by quantitative trait loci (QTL) mapping. The reference genome sequence is enabling analysis of data from the current high-throughput re-sequencing of apple varieties and wild species by international teams.

Sequencing and assembly of the apple genome followed the whole genome shotgun (WGS) approach using a combination of Sanger dye primer sequencing of paired reads and 454 sequencing by synthesis (SBS) of paired and unpaired reads. The two techniques provided a genome coverage of 4.0× and 13×, respectively. Assembly of the genome was complicated by the high number of chromosomes (17), the content of highly repetitive DNA, the degree of heterozygosity of Golden Delicious and the repetitiveness of DNA sequences resulting from old and recent genome-wide duplications. Hence, existing software and strategies were not adequate for the assembly of this genome and a modified version of the assembly pipeline developed for grape genome sequencing (Velasco et al. 2007; Zharkikh et al. 2008) was used to deal with the high degree of heterozygosity in the Golden Delicious genome. The gene-centric approach to the genome assembly was adopted, using available apple ESTs in GenBank as starting points for assembly of corresponding gene sequences. The assembly started with almost unique sequences and progressed by adding sequences with a higher degree of repetitiveness. To avoid misassembly, a conservative approach was adopted, that only accepted a SNP rate lower than 1 in 500 bp (in the genome the average frequency of SNPs is 4.4 per kb). Frequently, this resulted in an independent assembly of individual haplotypes. In some cases, large insertions or deletions, affecting only one chromosome in a pair, resulted in genomic regions being assembled as partially overlapping contigs.

The total length of all contigs, 603.9 Mb, covers about 81.3 % of the apple genome. In the genome, repetitive elements correspond to 500.7 Mb (67 %). The unassembled part of the genome is basically all repetitive (138.4 Mb), and the estimated genome size is 742.3 Mb. Metacontigs were constructed as ordered and oriented groups of contigs linked with paired reads matching to non-repetitive parts of the contigs. Merging of individual contigs into metacontigs was controlled by accepting a maximum total average coverage of 20×. In total, 103,076 contigs were assembled into 1,629 metacontigs.

Sequencing of this heterozygous cultivar generated a resource of 3.3 million SNPs. Polymorphisms identified by assembling the two haplotypes of the Golden Delicious genome were used to develop markers for each metacontig, to correlate them with linkage groups (LGs). Anchoring of metacontigs (71.2 % of 598.3 Mbp) was based on a high-density map derived from six F1 populations totaling 720 individuals. The map comprises a total of 1,730 markers, including 196 simple sequence repeat (SSR) markers, 1,500 genomic-derived SNPs and 34 EST-derived SNPs. All genomic SNPs were developed from electronic SNPs of Golden Delicious genomic sequence identified in non-repetitive contigs. SNP markers were developed both from metacontigs that were not yet assigned a chromosomal location on the assembly, and also from anchored metacontigs containing just one mapped marker and therefore not orientated. The total length of the map was 1,354.9 centimorgan (cM), with an average interval of 0.78 cM between adjacent loci. In total, 17 LGs were assembled using TMAP software (Cartwright et al. 2007) at a minimum LOD of 10 and were identified based on SSR markers mapped to existing apple linkage maps. Sequenced markers that were well positioned on the genetic map were used to order and to orient metacontigs along the appropriate LGs. Cases where more than one marker was present on a metacontig enabled metacontig orientation, and on average 75 % of the anchored genome was correctly orientated, corresponding to at least 87.5 % of correct orientation of anchored contigs.

SNP-based markers were essential for improving the metacontig assembly. Many adjacent metacontigs that were not initially merged because of non-significant links between them were associated based on neighbouring genetic markers, and were hence successfully merged into a single larger metacontig. On the other hand, if a metacontig was associated with several markers from different LGs, or with
distant markers from the same LG, it was considered chimeric and was split into separate metacontigs by a semi-automated procedure (Velasco et al. 2007). This was especially important in the case of homeologous chromosomes, where the unique sequences are the most similar between corresponding chromosomes. The linkage map was, in general, in good agreement with the sequence resulting from the genomic assembly. A slightly different order of the two maps was observed for only 2.7 % of the markers. In the majority of such cases, these represent adjacent markers and do not affect the overall alignment of the maps.

A 1.5× dataset of 454 sequence from a dihaploid (DH) genotype, developed after a spontaneous duplication of a haploid individual selected from the progeny of a self derivative from Golden Delicious (Lespinasse et al. 1999), was used to provide additional information on allele assignment. As the DH data represent one of the two Golden Delicious haplotypes, with the exception of recombination events during the reproduction and generation of the DH strain (rare events at single gene level), the double haploid genotype was significant for accurate haplotype phase determination. Comparing WGS with DH reads allows the splitting of all WGS reads into two haploid sets. Available estimates show that about 80 % of 3.3 million identified SNPs are covered by DH reads.

The apple genome sequence, anchored to a high-density linkage map, supplies to the apple community an extraordinarily useful tool for genomic-assisted breeding. As many as 90.2 % of the genes have been located on the chromosomes; these include genes related to disease resistance, fruit quality, plant development, and reaction to environment and are a valuable resource for determining the control of a range of tree and fruit traits desired by breeders. In addition, comparison of the genetic and physical distances separating neighbouring markers in anchored contigs makes it possible to compare the recombination frequency at different genome positions. Variations in the correspondence between physical and genetic distance along chromosomes are well known and have already been well documented in several species (Tanksley et al. 1992; Paterson et al. 2009; Schmutz et al. 2010). Such an understanding of recombination pattern adds important information to planning of map-based cloning projects. A high-resolution genetic map is much easier to generate in presence of high recombination values, as in regions of suppressed recombination a larger progeny size is needed to recover the number of crossovers necessary to construct a detailed genetic map (Tanksley et al. 1992).

Trends of variation specific to particular regions of the chromosome arms have been observed. The telomeres, which are reached by both the sequence and the map for 14 of 34 telomeric ends of chromosomes, showed no obvious effect for some LGs, but there was a significant effect for other LGs, where higher recombination rates have been noted. Analysis of the distribution of selected genomic features with recombination frequency did not show evidence for higher recombination rates in gene-dense regions, as is often observed in chromosomes of other plants (Anderson and Stack 2002). In fact, gene and retrotransposon densities are negatively correlated, as has also been shown by Paterson et al. (2009) for sorghum.

Evidence of genome duplication and the origin of the Pyreae

Sequencing of the apple genome has revealed that large lengths of apple chromosomes are copied in other chromosomes. This duplication would explain why the apple, and closely related species belonging to the tribe Pyreae (i.e. pear, rowan and quince), have 17 chromosomes (x = 17, haploid chromosome number), while all other plants in the Rosaceae family (including peach, raspberry and strawberry) have between 7 and 9 chromosomes. The apple fruit or pome, which defines the swelling of the flower receptacle, is only found in the tribe Pyreae in the Rosaceae lineage and its appearance is first detected c. 50 million years ago (Mya) (Aldasoro et al. 2005; Wolfe and Wehr 1988). This indicates that the pome probably evolved after a Pyreae-specific genome-wide duplication. Many of the genes in the duplicated areas of the apple genome are related to fruit development and this larger number in comparison with other sequenced fruit may have enabled the distinctive features seen in the pome.

Since the apple genome revealed a nearly perfect duplication of extant chromosomes, it has been proposed that the current genome evolved from a progenitor species with nine chromosomes, which underwent a duplication followed by the loss of one chromosome. The most ancient fossil that is attributable to species similar to the modern Maloideae has been found in North America, where two species of the perennial herb Gillenia, with nine chromosomes and which could be close to the ancient progenitor, are still found today. This hypothesis is supported by molecular phylogenetic analyses based on various nuclear-encoded genes (Evans and Campbell 2002), and by the comparison between whole genome sequence data and that of various Rosaceae species (Velasco et al. 2010), confirming that a Gillenia-like taxon (x = 9) is the closest extant diploid genus to both homeologous genomes of apple. The fruit of Gillenia has an appearance similar to that of an apple core with surrounding flesh removed.

Remnants of older large-scale gene or genome-wide duplication were also unveiled and short blocks of genes revealing old polyploidy events were found on all chromosomes. Re-mapping those to the ancestral state demonstrates a triplicate structure among parts of apple chromosome that were found to be colinear with grape chromosomes that have been demonstrated as homeologous because of an ancient hexaploidy (Jaillon et al. 2007). Hence, the apple
genome provides strong evidence for an early hexaploid state, which is shared by most if not all eudicots (Tang et al. 2008; Van de Peer et al. 2009).

The apple genome differs from other sequenced plant genomes

The total number of genes predicted for the apple genome (in total 57,386, including some genes that may be present only in one of the two chromosomes of a pair), is the highest reported among plants so far, compared with Arabidopsis (27,228), poplar (45,654), papaya (28,027), Brachypodium (25,532), grape (33,514), rice (40,577), sorghum (34,496), cucumber (26,682), soybean (46,430) and maize (32,540). Putative apple-specific genes numbered 11,444. Of the predicted apple genes, 93 % were annotated and classified by an automatic annotation with subsequent manual revision. The remaining 7 % of gene models showed no homology to pre-existing sequences resident in comparative data sources and were considered apple-specific. The existence of more than one haplotype in the variety Golden Delicious may have contributed to increased gene number, as was also noted for grape (Velasco et al. 2007).

The question can be posed as to why apple has retained so many genes following the duplication event, and what special adaptive purpose they might serve. The within-lineage genome-wide duplication of apple has been dated >50 Mya (Velasco et al. 2010), coinciding with similar events in other plants, as well as mass extinctions of some species around the Cretaceous–Tertiary boundary. Polyploidy could have provided a mechanism for an escape from extinction, as has been suggested by Crow and Wagner (2006). As functional compensation by duplicate genes for a more severe phenotypic event tends to be preserved by natural selection for a longer time than a less severe effect (Hanada et al. 2009), it might be expected that a large number of genes could be retained after such a massive event. There are three potential evolutionary fates for duplicated genes: silencing of one copy by degenerative mutations (non-functionalization), acquisition of a novel, beneficial function by one copy (neo-functionalization), or both copies experiencing loss of sub-functions, leading to establishment of complementary functions (sub-functionalization; Lynch and Conery 2000). Retention of functional genes coding for proteins with a modified or new function would be a factor in the accelerated rate of evolution that often follows gene duplication in plants (Lynch and Conery 2000). It is noteworthy that functions of the groups of genes that are expanded in the apple genome compared with other sequenced genomes often relate directly to response to stress and adaptation to environment, as well as functions associated with wood formation and pome development.

The number of identified transcription factors was among the highest found in the sequenced plant genomes and the families C2H2, NAC and CCAAT were particularly highly represented in the apple genome compared with other plant genomes. Ethylene production is well known to be associated with response to both abiotic and biotic stress (Liu and Zhang 2004), along with NAC proteins, which also control flower and root development (Olsen et al. 2005), as well as plant cell wall composition during the process of wood development (Demura and Fukuda 2007; Zhong et al. 2008). Tellingly, transcription factors binding to CCAAT are associated with regulation of flowering and reproduction, a function that is critical to evolution and indeed to continued survival.

Nucleotide binding site (NBS) genes encoding resistance (R) proteins in Golden Delicious totalled 992, compared with 392, 402, 178, 341, 535, 238, 245 and 129, respectively in soybean, poplar, Arabidopsis, grape, rice, Brachypodium, sorghum and maize. The fraction of NBS-LRR (leucine-rich repeat) genes is significantly higher in Eurosids II (apple, poplar and grape) compared with Eurosids I (Arabidopsis). In monocotyledons, this class of genes dominates. In addition to NBS genes, the apple genome contains 320 LRR-kinase (LK) genes, compared with only 216 in Arabidopsis. The genomic distribution of NBS and LRR kinase (LK) resistance genes is not uniform, with apple chromosomes 2, 8, 11 and 15 containing almost twice as many resistance genes as the others.

Significant features of the pome as a fruit are its nutritious flesh with a flavour and aroma that is attractive to potential dispersers of its many seeds. There are large differences in the degree of duplication of specific classes of genes associated with conferring these distinctive characters and this was particularly evident for the enzymatic steps leading to anthocyanins and flavonoids, isoflavones/isoflavonones and terpenes that contribute to both colour and flavour. For the majority of apple gene families that contribute to phenylpropanoid production, the number of genes was almost always the largest among sequenced genomes. There was also expansion in numbers of genes related to metabolism of sorbitol, the principal sugar in apple fruit, as well as the sorbitol transporter PCSOT2, which is specific to apple fruit. In addition, the subclade StMADS11 of MADS-box genes in the AP1 clade includes two genes expressed in the pome, as well as an additional 15 other members.
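
The gene counts quoted in this section can be put on a common footing with a few lines of arithmetic. The sketch below is only an illustration and not an analysis from the studies cited: it expresses the NBS resistance genes of each species as a percentage of that species' predicted gene number, using only the figures given in the text. The normalisation itself and the function names (`nbs_share_percent`, `rank_by_nbs_share`) are assumptions introduced here.

```python
# Counts exactly as quoted in the text above; no other data are assumed.
PREDICTED_GENES = {
    "apple": 57386, "soybean": 46430, "poplar": 45654, "Arabidopsis": 27228,
    "grape": 33514, "rice": 40577, "Brachypodium": 25532,
    "sorghum": 34496, "maize": 32540,
}
NBS_GENES = {
    "apple": 992, "soybean": 392, "poplar": 402, "Arabidopsis": 178,
    "grape": 341, "rice": 535, "Brachypodium": 238,
    "sorghum": 245, "maize": 129,
}


def nbs_share_percent(species: str) -> float:
    """NBS genes as a percentage of all predicted genes (illustrative metric)."""
    return 100.0 * NBS_GENES[species] / PREDICTED_GENES[species]


def rank_by_nbs_share() -> list[tuple[str, float]]:
    """Species sorted from the highest to the lowest NBS share."""
    shares = {species: nbs_share_percent(species) for species in NBS_GENES}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for species, share in rank_by_nbs_share():
        print(f"{species:<13} {NBS_GENES[species]:>4} NBS genes  "
              f"{share:.2f} % of predicted genes")
```

On these figures apple has the largest NBS share (about 1.7 % of predicted genes) and maize the smallest, which suggests that the high absolute NBS count in apple is not simply a by-product of its large total gene number. The comparison is rough, since gene prediction methods differ between genome projects.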
Other omics

Transcriptome

The transcriptome is the repertoire of coding and non-coding RNA transcripts produced by a cell, organ or tissue, at any one time and as such is extremely dynamic, varying with stages of plant growth and development or in response to environmental stimuli. Genome-wide profiling, or transcriptomics, can increase our understanding of the genes and pathways involved in biological processes, as genes with similar temporal and/or spatial expression patterns are often functionally related and under the same genetic control. Malus transcriptomics studies profiling gene expression at varying stages of bud, flower and fruit development, or as a result of grafting onto different rootstocks, or in response to challenge by pathogens and environmental stresses, have employed a variety of techniques.

cDNA-amplified fragment length polymorphism

The cDNA-amplified fragment length polymorphism (AFLP) technique allows comparative, quantitative analysis of gene expression, with sensitivity and specificity permitting detection of low-abundance transcripts and subtle differences in transcriptional activity (Breyne et al. 2003). The technique was useful before the availability of whole genome sequence; however, its utility for high-throughput transcriptomics was limited by the labor-intensive nature of the subsequent characterisation of differentially expressed transcripts.

To investigate why apple scions grafted onto M.7 rootstocks are less susceptible to fire blight disease and less dwarfed than those grafted onto M.9, Jensen et al. (2003) used cDNA-AFLP to examine differential gene expression in shoot tip RNA from Gala scions grafted onto the two rootstocks. Thirty-six and 56 unique sequences showed higher amplification in Gala/M.7 and Gala/M.9, respectively. Although differentially expressed sequences represented genes involved in a range of functions, a notable rootstock effect was up-regulation of photosynthesis-related and sorbitol-6-dehydrogenase genes when on M.9 rootstock, whereas scions on M.7 showed up-regulation of a putative negative regulator of photosynthesis genes. This M.9-associated increase in photosynthetic activity might account for larger fruit on scions grafted onto M.9. Gala/M.9 scions also exhibited up-regulation of genes implicated in cell-cycle control, perhaps partly accounting for the differences in the degree of dwarfing of scions grafted onto the two rootstocks. Genes related to pathogen infection and abiotic stress were up-regulated predominantly in scions on M.7 rootstocks, consistent with increased tolerance to disease, drought and cold.

cDNA-AFLP analysis of abscising and non-abscising Golden Delicious fruitlet populations detected 131, 66 and 30 differentially expressed bands from cortex, peduncle and seed, respectively (Dal Cin et al. 2009). Annotation indicated that although 25 % of differentially expressed genes were of unknown function, 64 % were involved in transport, protein fate, metabolism, transcription, signal transduction, energy or binding, including genes involved in sugar metabolism and mobilisation, ethylene biosynthesis and auxin transport that have been implicated in abscission in other plant species (Ruiz et al. 2001; Taylor and Whitelaw 2001).

Using a genetically modified Gala line transformed with the HcrVf2 scab resistance gene, and sampling at 0 and 48 h after infection with Venturia inaequalis, the causal agent of apple scab, 34 differentially expressed genes were identified by cDNA-AFLP (Paris et al. 2009a). The sixteen annotated genes included resistance-related genes, hydrolases, a GDSL lipase and a thioredoxin-related protein. Similarly, assessment of expression in the HcrVf2 transgenic Gala line over a 48-h period post V. inaequalis infection, using cDNA-AFLP analysis by denaturing high-performance liquid chromatography and automated DNA fragment collection, showed 287 differential expression fragments, some of which were identified as genes implicated in pathogen recognition, and transcription factors as well as genes encoding signalling and effector proteins (Paris et al. 2009b).

Differentially expressed genes were also identified by cDNA-AFLP analysis after inoculation of resistant Geneva 41 (G.41) and susceptible Malling 26 (M.26) apple genotypes with Erwinia amylovora, the causal agent of fire blight disease (Baldo et al. 2010). Approximately 200 genes were found to be differentially expressed between the two genotypes at two post-infection times (2 and 48 h). Both genotypes showed gene induction and repression in response to the bacterial infection. Based on similarity to previously characterized genes, over half of the differentially expressed cDNA-AFLP fragments could be placed into functional categories, and included genes involved in photosynthesis, general metabolism, plant stress response, signalling pathways, energy, protein metabolism and transport. Bioinformatics analysis comparing the cDNA-AFLP sequences with sequences previously identified during Malus–E. amylovora interaction, interactions between Arabidopsis thaliana and virulent or avirulent Pseudomonas syringae, and the salicylic acid response in A. thaliana, indicated that 90 of the cDNA-AFLP sequences were specifically involved in the Malus–E. amylovora interactions. Subsequent quantitative RT-PCR analysis over a 60-h time course confirmed the cDNA-AFLP differential expression patterns in the two apple genotypes for 22 of 28 candidate resistance/susceptibility genes tested. This qRT-PCR
analysis identified genes that were: only activated in the resistant G.41 and were good fire blight resistance gene candidates; genes that were activated at different times in G.41 than in the susceptible M.26 and often repressed in M.26 between 12 and 48 h, and could be involved in the plant's response that contributes to symptom development and resistance; genes that were only repressed in M.26 and could be useful as susceptibility markers.

Suppression subtractive hybridization

Suppression subtractive hybridization (SSH), although not a quantitative approach to expression analysis, can detect low-abundance transcripts and has been used in Malus to examine differential expression between samples. SSH identified 262 and 218 differentially expressed clones preferentially expressed in the leaves of V. inaequalis-resistant Remo and susceptible Elstar, respectively (Degenhardt et al. 2005). Homology-based annotation identified putative functions for about half the clones, revealing up-regulation of plant defence-related genes and genes involved in the detoxification of reactive oxygen species in resistant Remo. A third of the clones preferentially expressed in Remo before pathogen challenge were metallothioneins, whereas in Elstar metallothionein transcript levels were only significant after challenge with V. inaequalis. The constitutive expression of defence-related genes in scab-resistant Remo, even in the absence of infection, might account, at least in part, for its resistance to V. inaequalis. In a study evaluating changes in the leaf transcriptome in response to infection with the bacterium that causes fire blight, Norelli et al. (2009) identified 468 differentially expressed clones between E. amylovora- and mock-infected Gale Gala over a 96-h period. A significant and rapid transcriptional response was seen within 2 h of inoculation, with up-regulation of genes associated with oxidative and osmotic stress and sorbitol transport, although the greater response to infection was an initial down-regulation of many genes, e.g. aquaporin and dehydration-responsive proteins. Later transcriptional responses to infection were also evident in additional defence and stress-related genes.

Frequency of Occurrence of EST

Single-pass sequencing of 151,687 clones from cDNA libraries, derived from mRNA extracted from 34 different tissues and treatments, provided extensive coverage of the Malus transcriptome (Newcomb et al. 2006). These EST sequences represented various commercial apple cultivars and rootstocks, and a variety of tissues, developmental time points and abiotic and biotic stress treatments and clustered at 95 % identity into 17,460 tentative consensus (TC) sequences and 25,478 singletons. A second EST sequencing effort, with an emphasis on flowering and fruiting tissues, and from tissues responding to pathogen infection, yielded an additional 182,241 EST sequences, which clustered into 23,442 TC sequences and 9,843 singletons. As of October 2010, GenBank dbEST has 336,358 Malus entries derived from over 120 different libraries. These resources have facilitated an alternative approach to genome-wide expression analysis, where the frequency of occurrence of ESTs representing a particular gene could be considered proportional to the expression level of that gene, providing the cDNA library was constructed in an unbiased manner. Park et al. (2006) analysed statistically 200,000 publicly available Malus × domestica Borkh. ESTs (clustered into 23,000 TC sequences and 21,000 singletons) to identify genes specifically or preferentially expressed in fruit. Many of the genes highly expressed in fruit were also highly expressed in non-fruit tissues, and often represented basic metabolic enzymes. However, significant numbers of genes were found to be more highly represented by ESTs in fruit-derived libraries than in non-fruit-derived libraries, including genes involved in the biosynthesis of lipid-derived volatile esters and fatty acid-derived C6 short-chain volatiles, of importance in conferring aroma and flavour. Other genes with higher EST representation in fruit-derived libraries corresponded to pathogen response and ethylene-signalling genes, whereas genes related to photosynthesis, the generation of precursor metabolites and energy were underrepresented by ESTs in fruit-derived libraries. Of 714 genes preferentially expressed in fruit tissues, approximately 80 % appeared temporally or spatially regulated during fruit development. Analysis of fruit cortex libraries, at late cell expansion phase through to ripe fruit, identified genes that were up-regulated during the early stages of ripening, including those involved in cytokinin modification, ethylene biosynthesis, free-radical scavenging, fruit astringency and flavour, as well as stress response. Genes with roles in cell wall modification (xyloglucan endotransglycosylase) and branched-chain amino acid biosynthesis (acetolactate synthase) were down-regulated as fruit ripened. This study also showed higher expression of acyl CoA synthetase, lipoxygenase, enoyl-CoA hydratases, malonyl-CoA:ACP transacylase and alcohol dehydrogenase in ripening fruit skin relative to cortex, consistent with the production of volatile esters from precursors derived from the catabolism of fatty acids via the LOX and beta-oxidation pathways (Knee and Hatfield 1981; Rowan et al. 1999).

The frequency of EST occurrence approach was also used to examine gene expression in response to abiotic stress (Wisniewski et al. 2009). Some 22,600 EST sequences were obtained from leaf, xylem, bark and root tissue cDNA libraries from non-stressed Royal Gala plants, as well as from plants exposed to low temperature or water deficit.
After clustering EST sequences from each library and represented on the array were differentially expressed in
performing homology-based searches, approximately 70 % buds, between summer and autumn. As buds entered a
of the clones could be assigned to functional categories. dormant state, the changes in gene expression were not
Analysis indicated up-regulation of defence/stress-related and energy-related genes in plants subjected to water or temperature stress and down-regulation of general metabolism-, photosynthesis- and transport-related genes. Low temperature also resulted in up-regulation of protein metabolism genes and cell growth and development genes, whereas water deficiency resulted in down-regulation of nucleic acid metabolism and signalling genes and up-regulation of heat shock proteins and dehydrins. Although both stress treatments showed a general down-regulation of photosynthesis-related genes, genes associated with the light reactions of photosynthesis, as well as rubisco small subunit and rubisco activase, were up-regulated in response to the stress treatments. In addition, gene expression in each of the various tissues responded differently to the two treatments.

There is no doubt that these Malus EST sequences have proved to be an invaluable resource for apple transcriptomics; however, their use for direct gene expression profiling has limitations. Rare transcripts from a particular tissue or treatment are seldom represented in the existing EST datasets. Capturing EST sequences of rare transcripts requires relatively deep sequencing of clones from a library, or normalization to balance out the relative abundance of all transcripts. A number of Malus libraries were normalized during construction, but these are of no use in determining relative levels of gene expression via an EST frequency of occurrence approach. The advent of next generation sequencing (NGS) is allowing rapid expansion of Malus EST data, at a fraction of the cost associated with previous EST sequencing efforts. Over 900,000 additional EST sequences have recently been acquired, using Roche/454 pyrosequencing of normalized libraries from 17 Royal Gala tissues, with the primary aim of aiding apple genome annotation (Gleave and Luo, unpublished).

Microarrays

The availability of significant amounts of EST data facilitated the development of Malus microarray platforms, providing the means to assess mRNA levels of thousands of genes in biological samples simultaneously. The first Malus microarray report used a 15,276-feature oligonucleotide array to examine variability of gene expression in spur buds of clonal, field-grown Sciros/Pacific Rose scions on MM.106 rootstocks (Pichler et al. 2007). Microarray analysis using RNA sampled from trees on 4 days during summer, with buds in the first stages of floral development, and 2 days during autumn, with buds in the transition to dormancy, showed the greatest variance in gene expression between the two seasons. Up to 15 % of the genes
[...] limited to down-regulation, as a significant number of genes were up-regulated during this transition period.

The same microarray platform was also used to analyze gene expression during fruit ripening (Schaffer et al. 2007) and fruit development (Janssen et al. 2008). Schaffer et al. (2007) analysed expression of 179 genes involved in the production of the ester, phenylpropanoid and terpene volatile compounds that contribute to fruit aroma. The fruit skin from an antisense ACC oxidase transgenic Royal Gala line, which produces no detectable ethylene and has no obvious aroma, was used to analyze expression of these genes in response to exogenous ethylene over 8 days post-treatment. Seventeen genes were identified as likely ethylene control points of the biochemical pathways contributing to aroma and in general encoded enzymes responsible for the initial committed step and final step of the straight-chain ester, branched-chain ester, sesquiterpene and phenylpropanoid pathways. An additional 941 genes that responded to ethylene, in either fruit skin or cortex, were identified, of which 728 were up-regulated, and these included genes involved in cell wall structure.

Microarray analysis of Royal Gala fruit development revealed almost 2,000 genes that showed significant changes in expression over a 146-day post-anthesis time course and demonstrated four major patterns of coordinated gene expression (Janssen et al. 2008). Functional classification of genes indicated that in early- and mid-developmental phases, as the fruit structure changed rapidly, genes associated with cellular organisation increased in expression. Three cell-cycle genes showed relatively high expression during early development and down-regulation at 35 days after anthesis (DAA), suggesting these genes play a role in the cell division processes that occur in the 30 days after pollination.

Throughout the mid-developmental stages, as fruit became less active metabolically, genes involved in cellular transport were up-regulated, whereas as fruit ripened, genes of the energy and metabolism categories were up-regulated, particularly genes involved in lipid, fatty acid and isoprenoid metabolism. As expected, a number of the genes related to starch metabolism were also temporally regulated during various stages of fruit development. In a separate analysis of gene expression during fruit development with a microarray of 3,484 cDNAs from young and mature Fuji fruit, 138 genes were found to be significantly more highly expressed in young fruit (21 DAA) than in mature fruit (175 DAA), leaf or floral tissue (Lee et al. 2007), with over half of these showing at least an eight-fold reduction during fruit development. Eighty-eight genes could be classified as being related to photosynthesis, protein synthesis, cell
proliferation and differentiation, cell enlargement, metabolism or stress response. Following on from their cDNA-AFLP study, Jensen et al. (2010) used an oligonucleotide microarray, representing 55,230 apple transcripts, to assess gene expression of Gala scions grafted onto different rootstocks. Although the majority of scion transcripts showed little variation in expression across all Gala/rootstock combinations, differential expression of 2,934 transcripts was observed as a result of grafting onto particular rootstocks, with each scion/rootstock combination exhibiting a unique pattern of differentially expressed transcripts. Functional classification of the differentially expressed genes revealed that a high proportion were stress response-related. This microarray approach revealed that at the transcript accumulation level, few scion genes are affected by the rootstock, consistent with the earlier findings using the cDNA-AFLP approach to examine rootstock effects on gene expression in scions (Jensen et al. 2003).

In general, the expression patterns detected by microarray analysis were confirmed by qRT-PCR, validating the use of these microarray platforms as a relatively efficient and accurate method for examining global changes in gene expression. However, microarrays do have their limitations, in that they have a limited dynamic range, lack the sensitivity required to detect subtle changes in expression and are a closed platform, only detecting changes in expression of genes that are represented on the array. The availability of an annotated apple whole genome sequence will enable the establishment of new Malus microarray platforms with a more complete representation of genes. However, next generation sequencing (NGS) offers new approaches to analyzing global changes in gene expression (Eveland et al. 2008; Wang et al. 2010) and is likely to become the method of choice for much of the high-throughput transcriptome analysis. NGS has already been applied to Malus transcriptomics, to examine the transcriptional changes in susceptible Golden Delicious infected with V. inaequalis (Celton et al. 2009b) and to compare gene expression in Honeycrisp and Golden Delicious during fruit development (Schaffer et al. 2010).

Proteomics

Since there is often little correlation between the abundance of an mRNA transcript and the amount of its corresponding protein, because of stability, posttranslational modification and degradation processes, determining the proteins expressed by a cell, tissue or organ, at a defined time, can provide an even greater understanding of a biological process than transcriptomics. Proteomic profiling of apple proteins has often proven difficult, because of apple tissues' relatively low protein content and high concentrations of pigments, carbohydrates, polyphenols and starch that interfere with protein extraction and subsequent protein separation and analysis. However, recent reports on modified protein extraction procedures have aided apple proteomics studies (Guarino et al. 2007; Wang et al. 2008).

An example of small-scale Malus proteomics is an evaluation of Holstein Cox leaf apoplast proteins before and after application of a non-pathogenic bacterium (Kurkcuoglu et al. 2004). Intercellular extraction of proteins, gel-based separation and isolation, and protein identification by de novo sequencing of trypsin-digestion products, using quadrupole/time of flight hybrid mass spectrometry, showed up-regulation of a number of pathogenesis-related proteins (e.g. β-1,3-glucanase) and down-regulation of a non-specific lipid transfer protein as a consequence of the bacterial application. A more substantial study focussed on the repertoire of proteins from the flesh of ripening Annurca fruit, and 2D-gel electrophoresis revealed 303, 425 and 470 distinct proteomic spots for three Annurca accessions, 203 of which were common among the accessions (Guarino et al. 2007). Mass spectrometry was used to identify 39 of the proteins and they were classified into functional categories including metabolism and energy, stress and ripening, and allergens. The majority of these proteins could be associated with particular biochemical pathways or responses that would be expected in ripening fruit. The abundance of fructose-1,6-bisphosphate aldolase, glyceraldehyde-3-phosphate dehydrogenase, triose-phosphate isomerase and enolase reflects the role of these enzymes in providing substrates for respiration, and organic acid and pigment synthesis. Consistent with the increase in respiration at the ripening stage, a number of additional proteins related to energy production (ATP-synthase D chain, adenine-phosphoribosyltransferase I, nucleoside-diphosphate kinase I, cytochrome C oxidase 6B subunit and a mitochondrial processing peptidase) were present at this stage. NAD-dependent malate dehydrogenase abundance probably relates to malic acid being a predominant organic acid of ripening fruit and the two glutamine synthetases identified are likely to play a role in NH3 mobilization during ripening. Other ripening-related proteins identified included [Mn] superoxide dismutase and a type 2 peroxiredoxin, both with a role in controlling the oxidative state, and other stress-related proteins, as well as a number of allergens, proteins involved in programmed cell death, and proteins whose homologs in other species respond to ethylene or auxin.

Cao et al. (2008) applied a 2D-gel electrophoresis, mass spectrometry proteomics approach to flower bud induction in Fuji, finding 283 significant changes in proteins in flower buds compared with leaf buds. Although many proteins were common to the whole flower bud developmental process, some specific changes were observed with the loss of 19 floral bud-specific proteins and the appearance of eight new floral bud-specific proteins as
floral bud development progressed from the sixth to ninth week after the spur ceased growing. Following on from their transcriptomics study of apple plants' response to water deficiency, Wisniewski et al. (2009) carried out a 2D difference gel electrophoresis (DiGE) proteomic analysis of the response of crabapple (Malus pumila) to water deficiency in relation to abscisic acid (ABA) treatment, which mediates a stress response, or to β-aminobutyric acid (BABA) treatment, which induces drought tolerance. Eight days after an initial watering, the ABA, BABA and water-only treatments showed 2, 8 and 30 % water loss from leaves, respectively. DiGE revealed that by 10 days, 138 proteins were differentially regulated by more than 1.5-fold in at least one of the treatments. A number of proteins showed up-regulation by ABA and BABA, with ABA-induction being immediate, whereas BABA-induction of the same proteins occurred only after the onset of water stress, supporting the concept that BABA induces abiotic stress resistance at least partly by potentiating an ABA-regulated pathway.

One limitation in proteomic studies in Malus has been the difficulty in protein identification, since a high proportion of the de novo sequenced peptides fail to match sequences available in public databases. In some of the studies described above, only 50 % of the peptide sequences could be matched to available DNA sequence data. The availability of an annotated apple whole genome sequence will facilitate the identification of the majority of de novo-sequenced peptides/proteins, and coupled with improvements in mass spectrometry instrumentation should greatly aid future proteomics studies.

Metabolomics

Metabolites are the end products of the cellular processes resulting from the transcriptional and proteomic activity, with the metabolome being the entire complement of these metabolites in a cell, tissue or organ. Initial metabolomics studies in Malus focused on defined metabolites such as primary sugars, acids and amino acids in fruit (Ackermann et al. 1992). High-performance liquid chromatography (HPLC) analysis of sugar contents during Glockenapfel fruit development and storage revealed fructose, glucose and sorbitol concentrations to be relatively constant until around harvest time, when there was a sudden increase in the amounts of fructose and glucose and a slight postharvest increase in sorbitol. In contrast, sucrose showed a steady increase until harvest. Malic and citric acid gradually decreased during fruit development, with a slight increase just before harvest, whereas amino acids dramatically decreased during the first 10 weeks of fruit development, followed by maintenance of fairly constant amounts. HPLC analysis has shown considerable variation in the sugar composition of juice between different apple cultivars, both in the total amount of sugar and the proportions of individual sugars (Fuleki et al. 1994). Glucose, fructose, raffinose and xylose showed the greatest variation and cold storage significantly influenced the concentrations of most sugars, with a general decrease in sucrose and increases in fructose, glucose and xylose. Other comparative studies of specific metabolite amounts between cultivars have included HPLC analysis of vitamin C content (Davey and Keulemans 2004) and polyphenolic compounds (Escarpa and Gonzalez 1998; Guyot et al. 2003; Tsao et al. 2003). Limited variation in vitamin C concentrations was observed among 31 apple cultivars at harvest, although a substantial variation was seen among cultivars following a period of cold storage.

Studies of fruit polyphenolic metabolites have also shown significant variation among cultivars. Procyanidins appeared to be the major phenolic compound in all apple cultivars, although caffeoylquinic acid and (−)-epicatechin were also present at relatively high concentrations. Fruit skin generally possessed higher amounts of phenolic compounds than fruit flesh, with quercetin glycosides almost exclusively present in skin, and cyanidin 3-galactoside found only in red skin, whereas dihydroxycinnamic acid esters, phloretin glycosides and flavan-3-ols were found in both flesh and skin. Chlorogenic acid was the major polyphenolic in the flesh of a number of cultivars, one exception being Granny Smith (Escarpa and Gonzalez 1998). Exposing fruit skin to the sun increased the amounts of cyanidin 3-galactoside and quercetin 3-glycoside, compared with shaded skin, whereas phloridzin, catechins and chlorogenic acid concentrations did not appear to be influenced by exposure to the sun (Awad et al. 2000). Fruit volatiles of progeny from a Royal Gala × Granny Smith cross, parents with distinctive flavours and volatile profiles, were analysed using headspace gas chromatography-mass spectrometry (GC-MS; Rowan et al. 2009). The volatile profiles were separated into two groups on the basis of the amounts of acetates, ethyl butanoate and alcohols. In sensory panel analysis, fruit containing butyl, 2-methylbutyl, pentyl and hexyl acetate esters were more similar in flavour to the Royal Gala parent, whereas fruit containing higher amounts of ethyl butanoate, butanol, 2-methylbutanol and hexanol were more similar to the Granny Smith parent. The Mendelian segregation of 2-methylbutyl acetate suggested control by a single major gene or a strong effect of a quantitative trait locus, which was subsequently genetically mapped to Royal Gala LG 2, followed by mapping of a candidate gene, MpAAT1 from the apple whole genome sequence, to the locus (Wiedow et al. 2010).

Improvements in analytical instrumentation and data handling have enabled metabolomic studies to expand. Rudell
et al. (2008) evaluated changes in primary and secondary metabolites in response to postharvest light and storage. Using gas chromatography-mass spectrometry (GC-MS) and HPLC-ultraviolet/visible-mass spectrometry, 264 metabolic components were distinguished in Granny Smith peel, of which 78 were identified. Amounts in peel of a number of phenylpropanoids, including cyanidin 3-galactoside and cyanidin 3-glucoside, responsible for red colour, were elevated in response to light. Phenylalanine, a precursor of the phenylpropanoid pathway, showed elevated amounts in response to light, although prolonged irradiation resulted in a decrease in this amino acid, coinciding with the onset of cyanidin 3-galactoside accumulation. Isoleucine, a precursor of many of the volatiles in apple fruit, followed a similar accumulation pattern to that of phenylalanine. Increasing the duration of exposure to light resulted in higher concentrations of the ethylene precursors, S-adenosylmethionine and 1-aminocyclopropane-1-carboxylic acid, as well as a number of the organic acids and sugars, although for some of these metabolites the changes occurred some time after light exposure, during fruit storage. Light exposure was also shown to decrease the amounts of malic, citramalic, galacturonic and mucic acid, and as these latter two pectic acids are products of cell wall degradation, it was suggested that light exposure might delay the softening process. Rudell and Mattheis (2009) then analysed apple peel metabolites in relation to the storage disorder superficial scald. Scald-susceptible Granny Smith fruit were irradiated postharvest, followed by a period of storage and an assessment of the light-treated area for scald severity. No scald was evident in areas of fruit exposed to light for at least 4 h, while areas of the same fruit not exposed to light also showed reduced scald. Metabolite profiling revealed enhanced concentrations of phenylpropanoids in both light-treated and non-light-treated portions of the same fruit, although idaein and chlorogenic acid were most elevated in exposed peel, whereas quercetin glycosides and (−)-epicatechin were most elevated in unexposed peel. A relationship was observed between enhanced phenylpropanoid content and reduced scald development in unexposed peel, particularly with respect to reynoutrin, avicularin and epicatechin. Peel exposed to light exhibited reduced amounts of alpha-farnesene and its oxidation product 2,6,10-trimethyldodeca-2,7(E),9(E),11-tetraen-6-ol. A greater increase in the amount of the isoprenoid antioxidant α-tocopherol was observed in unexposed peel than in exposed peel, and these elevated concentrations correlated with reduced scald development.

Advances in technology now allow not only quantitative profiling of metabolites extracted from a particular tissue, but also representational imaging of the distribution of a specific metabolite within a particular tissue. Zhang et al. (2007) used apple fruit in developing colloidal graphite as a matrix for laser desorption/ionization mass spectrometry (GALDI) and employed the technique for metabolite profiling of the fatty acids and flavonoids in fruit peel, and the acids and sugars in fruit core, as well as for imaging metabolite molecules in apple fruit slices. This metabolite imaging demonstrated relatively even distribution over the apple flesh of sucrose and malic and quinic acids, and accumulation of the long-chain fatty acid, linoleic acid, at the core line and sepal, petal, dorsal- and ventral-carpellary bundles. In general, the flavonoids, phloretin, epicatechin, quercetin and phloridzin, also accumulated in the bundles, but not in the core line or flesh.

The transcriptomics, proteomics and metabolomics studies in Malus, although very much at an early stage compared with those of a number of other crop and model plant species, are beginning to help to unravel the genetic and environmental basis of control of some key traits. Already these studies have made significant contributions to the understanding of fruit development and ripening, particularly in relation to the aroma, nutritional and flavour characteristics of specific apple cultivars, and the response to pathogen infection and abiotic stress. Using these -omics-based approaches to gain knowledge of the interrelationships between the Malus genome, transcriptome, proteome and metabolome will greatly enhance the understanding of biological processes and traits, and ultimately assist breeders to better exploit apple germplasm resources for new variety development.

Application to breeding

Exploring, cataloguing and exploiting naturally occurring genetic variation in apple

For most traits, ample genetic variation is available in apple, both within the cultivated germplasm and the closely related ancestor Malus sieversii. Additionally, the entire Malus genus (including approximately 55 species, Harris et al. 2002) is a reservoir of genetic variation thanks to the presence of potentially positive traits (e.g. resistance to biotic and abiotic stresses) in species which are, for the most part, cross-compatible with the cultivated species.

QTL analysis enables us to explore the genetic control of traits of agronomic interest and map the corresponding loci, which then become available for marker-assisted breeding (Holland 2007). QTL analysis is usually carried out following one of two general strategies, namely linkage mapping based on bi-parental populations or association mapping (also known as LD-based mapping). Linkage mapping relies on establishing a relationship between markers and target locus within experimental populations, while association mapping tests (at either candidate gene or genome-wide level) the correlation between genotypes and phenotypes using supposedly unrelated individuals.
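
The association mapping idea described above, testing the correlation between genotypes and phenotypes, can be illustrated with a minimal single-marker sketch. It assumes a biallelic SNP coded as allele dosage (0, 1 or 2 copies of one allele), a quantitative phenotype and unrelated individuals. The function name `marker_association` and the toy data are hypothetical and do not come from any of the studies cited here; no correction for population structure or multiple testing is included.

```python
from math import copysign, sqrt


def marker_association(dosages, phenotypes):
    """Regress a phenotype on allele dosage for one SNP.

    Returns (slope, r, t): the additive effect per allele copy, the Pearson
    correlation, and the t statistic with n - 2 degrees of freedom.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need two lists of equal length with >= 3 individuals")
    mean_g = sum(dosages) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype does not vary in this sample")
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    unexplained = 1.0 - r * r
    # Guard against a perfect fit, where the t statistic is unbounded.
    if unexplained > 1e-12:
        t = r * sqrt((n - 2) / unexplained)
    else:
        t = copysign(float("inf"), r)
    return slope, r, t


# Hypothetical toy data: six accessions, one SNP, one fruit trait.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_trait = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, r, t = marker_association(toy_dosages, toy_trait)
print(f"effect per allele copy = {slope:.2f}, r = {r:.3f}, t = {t:.2f}")
```

In a candidate gene or genome-wide scan, a test of this kind would be repeated for each SNP, and a real analysis would also need to account for relatedness and hidden population structure among the accessions.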

In apple, QTL analysis appears to be in its infancy, as a literature search (February 2012) with "apple" and "QTL" as keywords retrieved only 36 peer-reviewed linkage-based QTL studies, whereas such studies are in the range of several hundreds in crop species such as wheat, rice or maize (not shown). Apple genetic studies have so far been based on approximately 20 mapping populations, which occasionally share one parent and/or are pedigree-related. Additionally, only in a few instances was the same trait investigated in multiple genetic backgrounds, with the most targeted traits being fire blight resistance (seven studies, eight genetic backgrounds), fruit quality (five studies, four genetic backgrounds), scab resistance (four studies and genetic backgrounds) and plant architecture (four studies and genetic backgrounds). An association mapping study based on the candidate gene approach was carried out at two MADS genes, which were found to be associated with fruit flesh firmness (Cevik et al. 2010).

It is therefore obvious that only an extremely small portion of the apple genetic variability has been utilized for quantitative trait dissection, and that the QTLs (and the potentially useful alleles) identified so far represent a small subset of those potentially available for marker-assisted selection (MAS).

Transferability of QTL data

As also observed in other species, the correspondence among apple QTLs identified in different studies and genetic backgrounds is limited. For instance, Kenis et al. (2008) searched for transferability of fruit quality QTLs (including fruit weight, soluble solids content (measured as Brix), acidity, stiffness and others) identified in three different studies and mapping populations, Telamon × Braeburn (Kenis et al. 2008), Fiesta × Discovery (Liebhard et al. 2003) and Prima × Fiesta (King et al. 2000, 2001). Of the 45 QTLs identified in Telamon × Braeburn, only nine QTLs were identified in a second population. Of course, actual QTL transferability (i.e. the observed genetic effect due to the segregation of same or different alleles at the same locus) can only be approximated, because of the paucity of common markers between studies, differences in map informativeness and analytical approaches, and the statistical nature of the results. With these limitations in mind, the approach known as meta-QTL analysis (Goffinet and Gerber 2000; Veyrieras et al. 2007), which more formally tests whether QTLs from different experiments are the result of segregation at the same locus, could be utilized in the future.

A step further in evaluating the transferability of QTL results between genetic backgrounds, and therefore the potential usefulness of the favorable QTL alleles for MAS, is the molecular typing of the relevant QTL alleles from the initial QTL mapping experiment into related pedigrees. This approach was utilized by Khan et al. (2007) and by Stoeckli et al. (2009) to trace back in pedigrees the origin of marker alleles/haplotypes putatively linked to fire blight and rust mite resistance loci, respectively. In the first case, the cultivar Cox's Orange Pippin was recognized as the founder donor of the relevant QTL haplotype, and other pedigree-related cultivars carrying the same haplotype also showed a high degree of fire blight resistance. In the second example, the same approach failed to identify a marker-trait association across pedigrees, because of lack of information on rust mite resistance of the founder allele donor. Additionally, the marker-resistance association was not confirmed on a larger sample of accessions.

Of course, the limited QTL correspondence among studies, even when the same QTL alleles are in play, is by no means unexpected, as QTL effects are by definition strongly modulated by mostly unknown environmental factors and by epistatic interactions (G×E and G×G, respectively). The comprehensive integration of G×E and G×G effects into formal QTL results is just beginning to be developed (Cooper et al. 2009). In apple, examples of QTL digenic interactions involved in fire blight resistance and of multiple external factors influencing the stability of QTLs for powdery mildew resistance were presented by Calenge et al. (2005) and Calenge and Durel (2006).

These preliminary results underline the complexity of the genetic control of quantitative traits and the need to validate experimentally any marker-based phenotypic prediction originating from QTL studies.

The promise of genome-wide association (GWA) mapping in apple

The rapid progress of genotyping technologies and the understanding of the species' molecular diversity will soon make association mapping feasible in apple, both at the candidate gene/region and at GWA levels, with the latter much more appealing, as it does not rely on assumptions about gene functions. High SNP frequency is observed within cultivated apple (1 SNP/227 bp in Golden Delicious, or 1 SNP/52 bp across apple germplasm) (Velasco et al. 2010; Micheletti et al. 2010). While the Golden Delicious genome sequence already provides >10^6 SNPs (Velasco et al. 2010), an essentially unlimited number of SNPs has become available through other cultivar re-sequencing efforts. Indeed, approximately two million SNPs were recently detected using a set of 27 apple accessions re-sequenced using the Illumina Genome Analyser II and aligned to the assembly of heterozygous Golden Delicious. These new SNPs have been used to develop the International RosBREED SNP Consortium (IRSC) 8K SNP array v1 to genotype apple seedlings with dense panels of SNPs, in a single reaction using the Infinium II assay (Chagné et al.
2012). The identification of SNPs from a diverse source of (Janick et al. 1996), it is easy to predict that a large portion
accessions will limit the risk of partial informativeness and of apple molecular diversity is already within reach.
bias, which could have been encountered if Golden Delicious
SNPs only were used. High-throughput genotyping and genomic selection
Although the extent of linkage disequilibrium (LD) is
still incompletely known, preliminary information and esti- Reliable knowledge is already available in apple in order
mates from species of comparable diversity and domestica- to enable marker-assisted breeding at targeted loci con-
tion history (e.g. grape, Myles et al. 2010) indicate that it trolling pathogen resistance and fruit quality traits (see,
will probably be in the range of 10 kb or less (D. Micheletti for instance, Patocchi et al. 2009). Increasing marker
et al. 2010), providing association mapping with a resolu- density around loci defining traits that influence impor-
tion power near the size of a single gene space. tant breeding goals will improve efficiency of apple
The correct interpretation of the results of association cultivar development greatly by expanding the develop-
mapping tests, both at the genome-wide and at the candidate ment of marker sets for application of MAS for an
gene levels, should consider spurious associations caused by extensive range of input and output traits. Alternative
hidden population structure (Rafalski 2010). Such informa- sets of markers for MAS applications in different breed-
tion is lacking in cultivated apple, as genetic diversity stud- ing populations have also begun to be assembled.
ies based on different multiple-marker systems have so far A number of apple breeding programs (e.g. Durel et al.
being limited to relatively small or local collections and 1998; Kumar et al. 2010) are currently using best linear
have failed to show any consistent population structure in unbiased prediction (BLUP) of breeding values (BV) of
several different studies (see, for instance, Hokanson et al. individuals in order to make selections of next-generation
1998; Garkava-Gustavsson et al. 2008). This could be an parents or potential cultivars for further testing. Genome-
indication of underpowered molecular investigations (both wide selection (GWS), an alternative to phenotypic selec-
in terms of number of markers and accessions), of signifi- tion, involves estimation of the effects of chromosome seg-
cant differences in the accessions sampled between studies, ments (e.g. SNP alleles) in a training population (for which
or both. The application of the high density SNP-based there are both genotypic as well as phenotypic data), fol-
genotyping technologies for association mapping will soon lowed by prediction of genomic estimated BVs (GEBVs)
clarify this issue. for individuals in the selection population (that only have
To avoid the constraints of both biparental mapping genotypic data; Meuwissen et al. 2001). Genomic selection
(reduced number of alleles in play and low mapping reso- exploits linkage disequilibriumhence the marker density
lution) and association mapping (hidden population struc- must be sufficiently high enough to ensure that the majority
ture), approaches requiring the production of new multi- (if not all) QTL are in LD with a marker allele or haplotype.
meiosis and multi-parental types of experimental populations have been proposed (Churchill et al. 2004; Yu et al. 2008), which could be applied, at least conceptually, in apple genetics. An additional approach that has already been proposed for genetic mapping in apple exploits the identity of chromosome regions (derived by combining marker and pedigree information) for individuals from different experimental populations or breeding programmes, in order to improve the informativeness and the statistical power of the marker/trait association process (Bink et al. 2008).
In the next few years, the boundary between high-density SNP genotyping and genome re-sequencing will narrow until it vanishes. Genome scans with a hundred thousand SNPs would be equivalent to full genome re-sequencing, because the density of the SNPs would be higher than the LD decline, at least for the purposes of mapping genes or QTLs. At the same time, the cost and the hurdles of genome sequencing are decreasing so quickly that re-sequencing whole germplasm collections, or individuals from mapping populations, will soon be feasible (Weigel and Mott 2009; Huang et al. 2009). Given that the number of apple cultivars present in worldwide collections is in the range of thousands

Studies using the IRSC 8K SNP array v1 have indicated that the array is effective for use in genetic mapping (Chagné et al. 2012) and GWS (Kumar et al. 2011, 2012), although the resolution will be insufficient for association studies with unrelated germplasm. Genomic selection is a practical reality for apple breeders, thanks to cost-effective high-throughput genotyping services.
Genomic selection will speed up the introgression of novel traits (e.g. durable disease resistance, flavour and flesh colour) into commercial cultivars, thus enhancing genetic gain per unit time. There are only a few published studies (e.g., Bus et al. 2009; Volz et al. 2009) that have demonstrated successful application of genomic selection for introgression of disease resistance in apple. Now that the availability of SNPs and high-throughput genotyping technologies is no longer a barrier, the subsequent challenge is overcoming the phenotyping bottleneck that limits our ability to capitalize on sequencing and genotyping information. The phenotyping of large populations will enable a better understanding of integrated plant performance, and lead to the achievement of fast targeted breeding.
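The marker-density argument made in this section (that a scan with a hundred thousand SNPs approaches re-sequencing for mapping purposes) can be made concrete with simple arithmetic. The following is a minimal sketch, not a published method: it assumes SNPs are spread evenly over the genome, derives the genome size from the contig length (603.9 Mb) and coverage (81.3 %) quoted in this review, treats the 8K array as a nominal 8,000 SNPs, and uses a purely hypothetical LD extent of 10 kb for illustration. All function names are invented for this example.

```python
import math

# Figures quoted in this review for the Golden Delicious assembly.
CONTIG_TOTAL_MB = 603.9   # total length of all contigs
GENOME_COVERED = 0.813    # fraction of the genome those contigs cover


def genome_size_mb(contig_total_mb=CONTIG_TOTAL_MB, covered=GENOME_COVERED):
    """Estimate genome size from contig length and the covered fraction."""
    return contig_total_mb / covered


def mean_spacing_kb(n_snps, genome_mb=None):
    """Average distance between adjacent SNPs if they were evenly spread."""
    if genome_mb is None:
        genome_mb = genome_size_mb()
    return genome_mb * 1000.0 / n_snps


def snps_needed(ld_extent_kb, genome_mb=None):
    """Smallest evenly spaced panel whose spacing does not exceed ld_extent_kb."""
    if genome_mb is None:
        genome_mb = genome_size_mb()
    return math.ceil(genome_mb * 1000.0 / ld_extent_kb)


if __name__ == "__main__":
    HYPOTHETICAL_LD_EXTENT_KB = 10.0  # illustrative only, not a measured value
    panels = (("8K array (nominal 8,000 SNPs)", 8_000),
              ("100,000-SNP scan", 100_000))
    for label, n in panels:
        print(f"{label}: one SNP per {mean_spacing_kb(n):.1f} kb on average")
    print("SNPs needed for the hypothetical LD extent:",
          snps_needed(HYPOTHETICAL_LD_EXTENT_KB))
```

Under these assumptions a 100,000-SNP scan places one marker roughly every 7.4 kb, against roughly 93 kb for a nominal 8,000-SNP panel. Real SNP panels are not evenly spaced and LD varies along the genome, so the numbers are only an order-of-magnitude guide.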
Genome sequence and gene mapping: casting light on the dark matter in apple linkage maps

As the Golden Delicious genome sequence has been generated and assembled using genetic maps to anchor large metacontigs, researchers now have a resource of skeleton reference maps anchored to the genome, which give access to the genome sequence between markers. An analogy can be made between the genetic maps before the availability of the genome sequence and maps of stars, so that gaps in a genetic map between markers can be compared to the dark matter (a similar analogy is used for a different issue by human geneticists). The genome sequence will be used by apple researchers to develop an understanding of what makes up this dark matter. This includes exploiting the Golden Delicious genome for applications such as: increasing the density of genetic maps, speedier identification of positional candidate genes, use as a reference for whole-genome sequencing in the apple germplasm, detection of sequence polymorphisms including SNPs and copy-number variants (CNVs), as well as development of the large genotyping platforms needed for GWA studies to mine useful traits from the wider apple germplasm and for GWS in apple breeding programs.

Denser genetic maps

The apple pre-genomic era was highly productive in generating linkage maps in a range of genetic backgrounds, with all breeding programmes worldwide developing maps for their own favourite breeding founders. A range of marker systems was used, with SSR being the most used marker system because of its transferability across genetic backgrounds. However, these maps were all of medium to low density, with at most a few hundred markers distributed over the apple genome, and harboured regions of uncharacterised dark matter between markers that are several cM apart.
The SNPs recently developed and validated from Golden Delicious ( apple/) are the first source of markers from the genome sequence that can be used to increase genetic map density. An example of the use of such markers is increasing the density of markers flanking Er4, the major gene for woolly apple aphid resistance (Bus et al. 2010). A genetic map of the open pollinated Mildew Immune Selection OP 93.051 G02-054 (MIS-OP) carrying the resistance locus had been constructed in the pre-genomic era, spanning 59.1 cM of LG 7 and using six markers. The closest markers, a sequence characterized amplified region (SCAR) and a single SSR, are located 1.8 cM and 9 cM away from the resistance locus, respectively. The distance of the distal marker from the locus did not give a fully efficient set of markers for MAS by breeders (Gardiner et al. 2007), making it necessary to develop closer markers flanking the gene. Using 27 SNP markers derived from Golden Delicious, transformed into high resolution melting (HRM) markers (Liew et al. 2004; Chagné et al. 2008), the segregating populations were screened for woolly apple aphid resistance and seven SNP markers that were polymorphic for MIS-OP were located on the map (Fig. 1). This allowed the positioning of a new marker 1.6 cM distal from Er4, which in combination with the proximal SCAR marker previously developed, gave a more efficient set of markers for MAS. Such examples can be taken further, and it will be a relatively simple task to develop further markers in the interval between the SCAR marker and the closest SNP.

Toward the identification of candidate genes and causative mutations

The closest Golden Delicious SNP (GDSNP02236) from Er4 is estimated to be located 1,505 kb from the top of the LG 7 genome assembly. Although the SCAR marker mapped above Er4 cannot be identified in the Golden Delicious assembly, it is likely that Er4 lies within the relatively small physical region between the top of LG 7 and 1,505 kb. A bioinformatics search for non-redundant cDNA giving significant BLAST hits to plant proteins using Bioview (Crowhurst et al. 2006) returned 251 hits within this region, including 10 putative resistance gene analogs. Such genes are potential candidates from which further markers could be designed using partial assemblies constructed from MIS-OP sequence, in order to move closer to Er4. The identity of the scab resistance gene (Vg) mapping to LG 12 of Golden Delicious (Durel et al. 1999; Calenge et al. 2004) is currently being investigated by M. Malnoy (IASMA), C-E. Durel (INRA), and their colleagues, using the whole genome sequence of Golden Delicious. Such examples demonstrate that candidate gene identification is now a much faster process than it was previously.
Similarly, an initial bioinformatics screening within QTL intervals is a useful method for identification of candidate genes. This may be performed by searching lists of functional candidate genes or by examining the list of putative ORFs within the QTL interval. Subsequent fine mapping and LD studies can help to narrow down the intervals, until the alleles (or haplotypes) tightly linked to, or even controlling, the character are identified.

Structural genome variants and CNVs

The analysis of the human genome sequence and the recent large SNP assays carried out for the human HapMap and Wellcome Trust Case Control Consortium initiatives showed that the human genome contains a large number of structural variants such as CNVs (Iafrate et al. 2004; Sebat et al. 2004). It is known that CNVs can account for phenotypic characters in
[Figure: three linkage-map panels, from left to right: MIS OP LG 7; MIS OP LG 7; 'Golden Delicious' LG 7]
Fig. 1 An example of how Golden Delicious SNPs can be used to increase genetic map density. A region of LG 7 of the genetic map of the open pollinated Mildew Immune Selection OP 93.051 G02-054 (MIS-OP) carrying the resistance locus to woolly apple aphid constructed in the pre-genomic era (Bus et al. 2010) is shown on the left, and enriched with SNP markers derived from Golden Delicious in the centre. Golden Delicious LG 7 is on the right
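The gain from the closer distal marker in the Er4 example can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the analysis used by the cited authors: it converts the map distances quoted in the text (SCAR at 1.8 cM, SSR at 9 cM, new SNP at 1.6 cM) into recombination fractions with Haldane's mapping function, which assumes no crossover interference, and treats the two markers as flanking the locus. The function names are hypothetical.

```python
import math


def haldane_r(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def flanking_marker_stats(proximal_cm, distal_cm):
    """Return (misselection rate, discordant fraction) for two flanking markers.

    misselection: among gametes carrying both marker alleles linked to the
                  resistance allele, the share that lost the resistance
                  allele through a double crossover.
    discordant:   share of gametes in which the two markers disagree
                  (a single crossover), so selection is ambiguous.
    """
    r1 = haldane_r(proximal_cm)
    r2 = haldane_r(distal_cm)
    no_crossover = (1.0 - r1) * (1.0 - r2)
    double_crossover = r1 * r2
    discordant = r1 * (1.0 - r2) + r2 * (1.0 - r1)
    misselection = double_crossover / (no_crossover + double_crossover)
    return misselection, discordant


if __name__ == "__main__":
    # Distances from the resistance locus as quoted in the text.
    marker_sets = {
        "SCAR (1.8 cM) + SSR (9 cM)": (1.8, 9.0),
        "SCAR (1.8 cM) + new SNP (1.6 cM)": (1.8, 1.6),
    }
    for label, (proximal, distal) in marker_sets.items():
        mis, disc = flanking_marker_stats(proximal, distal)
        print(f"{label}: misselection {mis:.5f}, discordant {disc:.4f}")
```

Under these assumptions, replacing the 9 cM marker with the 1.6 cM SNP lowers both the double-crossover misselection rate and the share of seedlings whose flanking markers disagree (from about 10 % to about 3 %), which is one way to read the statement that the new marker pair is more efficient for MAS.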

humans (Freeman et al. 2006; Beckmann et al. 2007; Sebat et al. 2007) and it is likely that CNVs are very frequent in plant species; the full genome sequence for a range of accessions will enable investigation of their role in apple.
The challenges of the future mean that apple geneticists will need to be able to deal with large genetic datasets, for applications such as genetic map construction and QTL detection, LD mapping and GWS. The Virtual Institute of Statistical Genetics (VISG; is developing new advanced statistical methods to address analytical challenges such as accurate trait prediction, using genome-wide marker panels in humans, livestock and plants (including apple) as well as experimental design for marker/trait association studies and genome-wide CNV detection.

Database and information resource

Malus EST datasets

As of October 2010, GenBank dbEST has 336,358 Malus entries, mostly obtained by sequencing cDNA libraries (Newcomb et al. 2006; Gasic et al. 2009) derived from a range of genotypes, tissues and developmental stages (reviewed by Allan et al. 2009).

Bioinformatic resources for the apple genome

There are three public points of access to apple genome data: IASMA/FEM, GDR Db and NCBI. In addition, apple genome data are integrated into the PLAZA 2.0 comparative genomics platform.

GBrowse at IASMA/FEM and GDR Db

The apple genome can be explored using the open source Generic Genome Browser (GBrowse) genome viewer (http:// at the IASMA/FEM website (http:// /, and at the Genome Database for Rosaceae ( The following describes the GBrowse viewer at IASMA/FEM at the time of writing.
GBrowse enables the viewing of apple genome data using a wide range of starting points, from chromosome/scaffold coordinates, to the gene or contig name, as well as gene annotation terms (Fig. 2). Once a user has identified a region of interest, data can be downloaded for in-house use; however, currently this feature is limited to contigs and gene sequences because of the current status of the apple genome assembly (Velasco et al. 2010), for which it is not possible to represent a chromosome or a
Fig. 2 Representative view of apple data in GBrowse. The apple scaffold is indicated as Cluster (Meta-Contig) and is depicted in blue; genome contigs belonging to this scaffold are shown as red arrows. Scaffolds and contig coordinates refer to the chromosome (top ruler). Gene predictions are divided into two groups, indicated as Gene Set (green) and Other Predictions (yellow); the distinction comes from the consensus approach applied in the apple gene prediction phase (Velasco et al. 2010)

Fig. 3 Schema of differences between the GBrowse and NCBI apple genome visualization. Contigs are depicted as thick red arrows while reads are represented as thin coloured arrows; paired end reads are linked by a grey line. a GBrowse: all contigs that have common reads belong to the same scaffolds, relative positions are estimated from clone sizes. b NCBI: only contigs with clear relations (e.g., a pair end read) belong to the same scaffolds; the whole assembly is separated into Primary Assembly and Alternative Haplotypes

scaffold as a single consensus sequence. It should be noted that genes are divided into two categories in the viewer: gene set and other predictions. Predictions in the gene set are considered reliable, as they are confirmed by different predictors or experimental data, while other predictions are made by just one software tool or without agreement
524 Tree Genetics & Genomes (2012) 8:509-529

among predictors (details in Supplementary Material of Velasco et al. 2010).
The IASMA/FEM website also provides access to two genetic maps of apple: Golden Delicious × Scarlet, constructed using 270 individuals, and the integrated genetic map derived from six F1 populations totalling 720 individuals ( CMap software (Youens-Clark et al. 2009) shows genetic map information, starting from a genetic position or a genetic marker name. Available genetic maps are linked to the scaffold assembly of the apple genome, so it is possible to move from a genetic position to the corresponding genomic sequence. CMap also allows arbitrary sets of maps to be aligned in order to show the relationships among them.

NCBI Data

At the NCBI web site ( genomeprj/28845), 14,462 apple genome contigs are available as a collection (master WGS) and the genome assembly is represented in the NCBI standard AGP format. Genome data are linked to the project home page and they can also be reached using the search tools available at the NCBI website (contig genome sequences: IDs from ACYM01000001 to ACYM01144621; scaffolds of primary assembly: IDs from GL542920 to GL544586; and scaffolds arranged in chromosomes: IDs from CM001026 to CM001042). The data are represented in a slightly different manner from those at the FEM/IASMA and GDR GBrowse sites. These differences were required to meet the format standards of the NCBI database, whose rules of genomic data representation were developed for homozygous genomes and which, among other mandatory features, do not allow overlaps among sequences that belong to the same scaffold or chromosome. However, currently many apple scaffolds do contain overlapping regions, resulting from the impossibility of distinguishing among different haplotypes with the coverage reached during the sequencing phase of the project (see Supplementary Material of Velasco et al. 2010). In order to represent those regions in the NCBI system, sequences of an original scaffold are separated, giving different versions of the same region. Therefore, the apple assembly at the NCBI is

Table 1 Summary of genomes and genome features available at the PLAZA 2.0 platform (

Species                      No. of genes   Coding       TE       GO  InterPro  Genes in non-singleton gf  Genes in multi-species gf
Lotus japonicus                    69,647   43,146   26,456   19,770    24,738                     25,716                     25,174
Medicago truncatula                57,587   45,197   11,614   17,836    21,999                     38,494                     30,844
Glycine max                        46,509   46,464        0   32,616    38,517                     45,982                     45,652
Malus x domestica                  95,230   63,546   31,684   47,889    44,063                     58,790                     55,288
Manihot esculenta                  30,800   30,748        0   20,434    24,222                     30,132                     29,965
Ricinus communis                   31,221   31,221        0   16,901    20,285                     24,455                     23,315
Populus trichocarpa                41,521   41,476        0   25,633    30,585                     37,777                     36,847
Arabidopsis thaliana               33,602   27,416    3,903   22,874    21,467                     26,118                     25,932
Arabidopsis lyrata                 32,670   32,670        0   21,429    23,557                     30,870                     29,264
Carica papaya                      28,072   28,027        0   13,914    16,265                     22,531                     21,005
Vitis vinifera                     26,644   26,504        0   20,244    19,035                     23,268                     22,682
Oryza sativa ssp. japonica         57,874   42,211   15,571   26,014    24,735                     37,391                     37,020
Oryza sativa ssp. indica           59,430   49,202   10,189   26,518    29,345                     44,310                     43,609
Brachypodium distachyon            26,678   26,632        0   17,805    20,832                     25,687                     25,256
Sorghum bicolor                    34,686   34,609        0   21,502    24,601                     31,921                     31,384
Zea mays                           39,597   39,190   84,000   21,926    25,700                     35,221                     32,528
Selaginella moellendorffii         22,285   22,285        0   12,417    15,995                     17,392                     13,783
Physcomitrella patens              36,137   28,097    7,968   14,283    16,275                     21,287                     17,787
Ostreococcus lucimarinus            7,805    7,805        0    4,737     5,807                      7,408                      7,340
Ostreococcus tauri                  8,116    7,994        0    3,966     5,094                      6,797                      6,663
Micromonas sp. RCC299              10,276   10,204        0    5,953     7,120                      8,144                      7,872
Volvox carteri                     15,544   15,544        0    6,082     8,410                     13,782                     12,041
Chlamydomonas reinhardtii          16,841   16,788        0    8,509     8,973                     13,666                     11,796
Total                             828,772  716,976  191,385  429,252   477,620                    627,139                    593,047

TE transposable elements, GO Gene Ontology terms, gf gene families

represented by the so-called primary assembly and three alternative haplotypes. On average, NCBI scaffolds of the primary assembly and first alternative haplotype account for 70 % of the sequences of the original scaffold, 18 % are more or less in the second alternative haplotype, and the remaining 10 % are in the third alternative haplotype (Fig. 3). Although the assembly was split in four, in order to be made public at the NCBI, it should be stressed that an alternative haplotype does not represent a real haplotype, despite its name. It is an artefact needed to represent the current apple genome assembly at the NCBI database and it does not have the biological meaning commonly associated with this name.

PLAZA 2.0 comparative genomic platform

The PLAZA resource enables the comparison of apple genome data with all other publicly available genomes that exploit the bioinformatic tools at this website, by integrating structural and functional annotation of published plant genomes, and providing a large set of interactive tools to study gene function and gene and genome evolution (Proost et al. 2009). The second release of the PLAZA platform ( integrates structural and functional annotation from 23 plants comprising 11 dicots (including apple), 5 monocots, 2 (club-) mosses and 5 algae (Table 1). Pre-computed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. There are more than 840,000 genes held, and of the protein-encoding genes, 87 % are stored in 32,332 multi-gene families (70 % covering homologs from multiple species).
For apple, 57,386 of the 63,546 gene predictions (protein-coding genes + putative transposable elements) are assigned to a multi-gene family; 54,479 of these genes are part of gene families that include homologs in other species. 54,250 genes have InterPro annotation, while 61,945 genes have a Gene Ontology annotation (57,255 when excluding GO projection; wiki-plaza/construction#annotation). For all protein-encoding genes, large-scale colinearity within apple and between apple and other species has been computed using i-ADHoRe 3.0 (Vandepoele et al. 2002) and can be explored using the Whole Genome Dotplot, while for a set of homologous genes the local gene organization can be browsed using the interactive Synteny Plot.

Fig. 4 Alignment of Fragaria (inner circle in red) and Prunus (inner circle in green) with the two Malus homologous chromosomes (outer circles in orange). The identification of regions of conserved synteny between the three genomes was enabled by in silico mapping of conserved orthologous markers genetically mapped in Prunus and Fragaria (Cabrera et al. 2009; Sargent et al. 2007; Vilanova et al. 2008) on the M. domestica genome sequence

Conclusions

Now that the genomes of three of the most economically important genera within the Rosaceae family, Malus, Prunus and Fragaria, have been fully sequenced and assembled, and sequencing of several other genera of the family is planned, the way is wide open for development of the field of comparative genomics across the Rosaceae. This covers the analysis of characteristics of whole genomes, at both the sequence and gene levels, in contrast to the analysis of single genes in previous comparative mapping studies. Nevertheless, comparative mapping between closely related species of the Rosaceae family has given useful insight into the degree of synteny conservation in the family. The genomes of species belonging to the genus Prunus were shown to be essentially collinear using SSR and gene-derived markers (Dondini et al. 2007; Olmstead et al. 2008). Similarly, in the Pyrinae tribe, the genomes of Malus, Pyrus and Eriobotrya have been aligned and were shown to be highly collinear (Celton et al. 2009a). Using a modest number of transferable markers, Dirlewanger et al. (2004) compared the genomes of Malus and Prunus and found strong evidence that single LGs in the diploid Prunus were homologous to two distinct homeologous LGs of Malus. A comparative mapping study between Prunus and Fragaria showed how these genomes, while still conserving synteny, have undergone genome reshuffling since their divergence from a common ancestor (Vilanova et al. 2008). More recently, a more comprehensive analysis of synteny conservation in the Rosaceae family has been carried out using a large number of conserved orthologous set (COS)

