
www.sciencemag.org/cgi/content/full/1176620/DC1

Supporting Online Material for


Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx)
Qingyou Xia, Yiran Guo, Ze Zhang, Dong Li, Zhaoling Xuan, Zhuo Li, Fangyin Dai, Yingrui Li, Daojun Cheng, Ruiqiang Li, Tingcai Cheng, Tao Jiang, Celine Becquet, Xun Xu, Chun Liu, Xingfu Zha, Wei Fan, Ying Lin, Yihong Shen, Lan Jiang, Jeffrey Jensen, Ines Hellmann, Si Tang, Ping Zhao, Hanfu Xu, Chang Yu, Guojie Zhang, Jun Li, Jianjun Cao, Shiping Liu, Ningjia He, Yan Zhou, Hui Liu, Jing Zhao, Chen Ye, Zhouhe Du, Guoqing Pan, Aichun Zhao, Haojing Shao, Wei Zeng, Ping Wu, Chunfeng Li, Minhui Pan, Jingjing Li, Xuyang Yin, Dawei Li, Juan Wang, Huisong Zheng, Wen Wang, Xiuqing Zhang, Songgang Li, Huanming Yang, Cheng Lu, Rasmus Nielsen, Zeyang Zhou, Jian Wang, Zhonghuai Xiang,* Jun Wang*
*To whom correspondence should be addressed. E-mail: xbxzh@swu.edu.cn (Z.X.); wangj@genomics.org.cn (J.W.)

Published 27 August 2009 on Science Express
DOI: 10.1126/science.1176620

This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 to S7
Tables S1 to S10
References

Materials and Methods

Sample collection

To represent the major silkworm systems kept in laboratories worldwide, we collected strains from diverse geographic regions, including China, Japan, Europe and tropical areas (mostly southeast Asian: India, Cambodia and Laos), as well as silkworms from the mutant system. All 29 domesticated samples listed in Table S1 are from the Institute of Sericulture and Systems Biology at Southwest University, China. Two important developmental characteristics, voltinism (number of generations per year) and moltinism (number of larval molts per generation), as well as sex, were recorded for each of the 29 domesticated silkworms. Of these, 18 are monovoltine, 8 are bivoltine and the remaining 3 are polyvoltine. We also captured 11 wild silkworms from mulberry fields in China, which allowed a comparative analysis of the domesticated and wild groups.

An advantage of the domesticated silkworm over other lepidopteran species is that many mutations (morphological, biochemical, and behavioral) and many inbred geographic strains (e.g., Chinese, Japanese, Korean, European, and Tropical strains) are available and represent important resources for studying artificial selection and silkworm domestication.

It is likely that farmers first moved wild silkworms from the field into houses so that they could be reared to produce silk in a predator-free environment. High silk production and ease of handling may then have evolved under artificial selection. Subsequently, silkworms were carried by humans to different countries through commercial trade. Finally, the domesticated silkworm underwent long-term rearing and breeding by local farmers, forming geographically distinct varieties with specific characteristics (such as voltinism and moltinism) shaped by the local climate. Currently, these geographic varieties are maintained in different stock centers and preserved by close inbreeding within each variety.
Library construction and sequencing

Genomic DNA was extracted from silkworm pupae and moths using a standard protocol for genomic DNA extraction. We sequenced only a single individual for each variety of both domesticated and wild silkworms. Libraries were prepared following the manufacturer's instructions (S1). We used the workflow described previously (S2) to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking, denaturation, and hybridization of the sequencing primers. We then applied a base-calling pipeline (SolexaPipeline0.3) (S1) to detect sequences from the raw fluorescent images.

Public data used

The silkworm reference genome sequence and annotation information were downloaded from the Silkworm Genome Database (S3, S4). We reconstructed silkworm chromosomes by joining genomic scaffolds with 500 bp of Ns, according to their mapping relationship (S5). Unmapped scaffolds were joined by 500 bp of Ns to form a chromosome UN. Because the insert size of the paired-end (PE) libraries is less than 500 bp, we used gaps of 500 bp of Ns to build the chromosomes for convenience of analysis. Additional sequences containing complete CDS were retrieved from NCBI as of Feb. 8th, 2009. We then made a non-redundant annotation file by comparing those two datasets. Microarray-based gene expression data for the domesticated silkworm whose genome is taken as the reference were downloaded from BmMDB (S6).

Reads mapping

We used SOAP v1.09 (S7) to map raw single-end (SE) and PE reads onto the finished silkworm reference genome (S5). Reads were classified into three categories: uniquely aligned (those with unique alignment positions), repeatedly aligned (those that can be mapped to multiple genomic locations with the same least base differences; only one randomly chosen chromosome position was reported) and unaligned. The trimming strategy described previously (S2) was applied when dealing with mismatches. To improve accuracy, PCR duplicates were removed with a PERL script that discards read pairs with identical outer coordinates.

SNP calling

A four-step procedure was used to detect SNPs.
(1) We used SOAPSNP (S8) to calculate the likelihood of each individual's genotypes.
(2) We integrated all the individual likelihood files to produce a pseudo-genome for each site in the total sample of 40 genomes by maximum likelihood estimation (MLE). Sites passing criteria on copy number, sequencing depth, quality score and minor allele count were kept for the subsequent rank sum test adjustment. SNPs that passed the rank sum test (S2) ($P \geq 0.005$) were fixed as members of the high quality (HQ) SNP set.
(3) For the domesticated strains as a whole, another pseudo-genome for the domesticated group was made without filtering. Polymorphic positions that overlapped with HQ SNPs were retained as SNPs for the domesticated silkworms. We followed a similar process for the wild silkworms to obtain their SNP set.
(4) We allocated base types back to each individual based on the genotypes of HQ SNPs and each individual likelihood file. The genotype with the largest likelihood was directly chosen as the consensus genotype in each individual.

Short indel detection

A three-step approach was used to call indels.
(1) For each individual, we conducted a second run of SOAP, allowing for gaps. Individual indel sets were obtained with a previously developed pipeline (S2).
(2) For each genomic position supported by at least one individual indel set, reads from all samples were considered together, and positions passing the filtering criterion on the number of supporting reads were retained. The resulting indels were termed high quality indels.
(3) We assigned indels back to each individual. From the high quality indel set, we picked indel sites in each individual with at least one supporting read.
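The duplicate-removal step described under Reads mapping was done by the authors with a PERL script that is not reproduced here. The following Python fragment is only a minimal sketch of the stated idea (discard read pairs whose outer coordinates are identical). The record layout and the field names `chrom`, `outer_start` and `outer_end` are assumptions made for illustration, as is the choice to keep the first pair seen.

```python
def remove_pcr_duplicates(read_pairs):
    """Keep the first read pair seen for each set of outer coordinates.

    read_pairs: iterable of dicts with the hypothetical keys
    'chrom', 'outer_start' and 'outer_end' (outermost mapped positions).
    """
    seen = set()
    kept = []
    for pair in read_pairs:
        key = (pair["chrom"], pair["outer_start"], pair["outer_end"])
        if key not in seen:  # first pair with these outer coordinates
            seen.add(key)
            kept.append(pair)
    return kept


# Toy input: p1 and p2 share both outer coordinates, p3 differs at one end.
pairs = [
    {"id": "p1", "chrom": "chr1", "outer_start": 100, "outer_end": 350},
    {"id": "p2", "chrom": "chr1", "outer_start": 100, "outer_end": 350},
    {"id": "p3", "chrom": "chr1", "outer_start": 100, "outer_end": 360},
]
unique_pairs = remove_pcr_duplicates(pairs)
```

In this toy input only `p2` is dropped, because `p3` differs from `p1` at its outer end.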
Experimental validation of SNPs and indels

We used the Sequenom Genotyping Platform (S9) to validate HQ SNPs picked randomly according to the characteristics mentioned in the section SNP calling. We genotyped 4,840 sites at 121 SNP positions across all 40 silkworms and confirmed that 117 positions were polymorphic. As a pilot phase for indel validation, we randomly selected 10 high quality indel positions and found these indels 69 times in the 40 samples. We then performed PCR-Sanger dideoxy sequencing at those sites using an AB 3730XL. After manually checking all the intensity trace files, we found that all the polymorphic positions were confirmed by the PCR-sequencing results.

Detection of structural variations (SV)

A three-step strategy was used to detect SVs in the 36 silkworms with PE sequencing reads.
(1) SVs were called individually, as described (S2), and regions with at least 2 supporting abnormal read pairs were retained for the second step.
(2) We treated all the PE reads from the 36 silkworms as if they came from a single individual and retained for the next step the candidates with (a) at least 10 abnormally mapped supporting pairs and (b) at least 2 qualified individuals, each with at least 2 supporting pairs. The resulting SVs were termed high quality SVs.
(3) We assigned high quality SVs back to each individual. Individual SVs with 80% of their length overlapping any high quality SV were reported.

Calculation of Linkage Disequilibrium (LD)

To measure the level of LD in the silkworm population, we calculated the correlation coefficient ($r^2$) of alleles using the software Haploview (S10) with the options -maxdistance 200 -dprime -minGeno 0.6 -minMAF 0.1 -hwcutoff 0.001. Curves were then plotted with R scripts that draw the averaged $r^2$ against pairwise marker distance.

Domestication associated site (da-SNP, da-indel) detection

Genomic polymorphic sites where at least 28 domesticated strains and at least 10 wild ones have unique reads, corresponding to a minimal concordance rate of 95%, were chosen to enter the $\chi^2$ test for domestication association. Bonferroni-corrected P values of $2.96 \times 10^{-8}$ and $1.87 \times 10^{-6}$ were then used to screen out significant da-SNPs and da-indels, respectively.

Construction of silkworm phylogeny

Individual SNPs generated after step (4) of the SNP calling section were used to calculate distances between silkworms. The p-distance between two individuals i and j is defined to be

$$D_{ij} = \frac{1}{L} \sum_{l=1}^{L} d_{ij}^{(l)},$$

where L is the length of the regions where HQ SNPs can be identified, and, given that the alleles at position l are A/C,

$$d_{ij}^{(l)} = \begin{cases} 0, & \text{if the genotypes of the two individuals are AA and AA,} \\ 0.5, & \text{if the genotypes of the two individuals are AA and AC,} \\ 0.5, & \text{if the genotypes of the two individuals are AC and AC,} \\ 1, & \text{if the genotypes of the two individuals are AA and CC.} \end{cases}$$
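As an illustration of the p-distance definition above, the following Python sketch applies the four listed cases to a pair of individuals. It is not the authors' script. The function names are hypothetical, genotypes are assumed to be complete (no missing calls), and the cases not listed in the definition (for example CC versus AC, or CC versus CC) are filled in by symmetry, which is an assumption.

```python
def site_distance(g1, g2):
    """Per-site distance for two unordered biallelic genotypes, e.g. 'AA', 'AC', 'CC'."""
    a, b = "".join(sorted(g1)), "".join(sorted(g2))
    is_het_a = a[0] != a[1]
    is_het_b = b[0] != b[1]
    if is_het_a or is_het_b:
        # AA/AC and AC/AC are both 0.5 in the definition; CC/AC by symmetry (assumed)
        return 0.5
    # both homozygous: identical -> 0, opposite -> 1
    return 0.0 if a == b else 1.0


def p_distance(genos_i, genos_j, region_length):
    """D_ij = (1/L) * sum of per-site distances.

    genos_i, genos_j: genotypes of the two individuals at the SNP sites only.
    region_length: L, the length of the regions where HQ SNPs can be identified;
    non-SNP positions contribute 0 to the sum.
    """
    if len(genos_i) != len(genos_j):
        raise ValueError("genotype lists must have the same length")
    total = sum(site_distance(a, b) for a, b in zip(genos_i, genos_j))
    return total / region_length


# Toy example: 4 SNP sites in a 10-bp callable region.
d_example = p_distance(["AA", "AC", "AA", "CC"], ["AA", "AA", "CC", "CC"], 10)
```

The per-site distances in the toy example are 0, 0.5, 1 and 0, so the p-distance is 1.5/10 = 0.15.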

Then a neighbor-joining method was used to construct the phylogenetic tree from the distance matrix, using the software PHYLIP 3.68 (S11). Bootstrap values were calculated from 1,000 replicates.

PCA analysis

Following the procedure of (S12), we considered only autosomal data with n = 40 individuals, and ignored sites with more than two alleles or missing data (S = 14,056,247 SNPs). The genotype of individual i at SNP k was coded as $d_{ik}$ = 0, 1 or 2 if individual i is homozygous for the reference allele, heterozygous, or homozygous for the non-reference allele, respectively. M is an $n \times S$ matrix containing the normalized genotypes

$$M_{ik} = \frac{d_{ik} - E(d_k)}{\sqrt{E(d_k)\,\bigl(1 - E(d_k)/2\bigr)/2}},$$

where $E(d_k)$ is the mean of $d_k$. An $n \times n$ matrix of the sample covariance of the individuals was calculated as $X = MM^{T}/S$. The eigenvector decomposition of X was performed using the R function eigen, and the significance of the eigenvectors was determined with a Tracy-Widom test implemented in the program twstats provided with the EIGENSOFT software (S12). We obtained the latitude and longitude of the capital of the province or country of origin with the Google Earth program (for Europe we took the center defined by Google Earth). Correlations between phenotypes and eigenvectors were tested with Kendall's $\tau$ statistic (S13).

Population structure inference

First, ped files were created as input for PLINK (S14, S15) with parameters --ped ped_file --recode12 --geno 0.5 --map output_map. Then the program frappe (S16, S17) was used to infer the population structure and ancestry information of the silkworms. The analysis was based on 13,066,429 SNP sites and we did not assume any prior information about ancestry. We ran 10,000 iterations and pre-defined the number of clusters, K, from 2 to 9.
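The genotype normalization and sample covariance described under PCA analysis can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and skipping sites whose mean is 0 or 2 (where the denominator would be zero) is an assumption added here. The eigen decomposition itself, done in the source with the R function eigen, is not reproduced.

```python
import math


def normalize_genotypes(genotypes):
    """genotypes: n rows (individuals) x S columns (SNPs), entries 0, 1 or 2.

    Returns the matrix M with entries
    (d_ik - E(d_k)) / sqrt(E(d_k) * (1 - E(d_k)/2) / 2).
    Monomorphic sites (mean 0 or 2) are skipped, which is an assumption.
    """
    n = len(genotypes)
    n_sites = len(genotypes[0])
    means = [sum(row[k] for row in genotypes) / n for k in range(n_sites)]
    keep = [k for k in range(n_sites) if 0.0 < means[k] < 2.0]
    scale = {k: math.sqrt(means[k] * (1.0 - means[k] / 2.0) / 2.0) for k in keep}
    return [[(row[k] - means[k]) / scale[k] for k in keep] for row in genotypes]


def sample_covariance(m):
    """X = M M^T / S, an n x n matrix over individuals."""
    n = len(m)
    s = len(m[0])
    return [[sum(m[i][k] * m[j][k] for k in range(s)) / s for j in range(n)]
            for i in range(n)]


# Toy data: 3 individuals, 2 SNPs.
toy = [[0, 2], [2, 0], [1, 1]]
M = normalize_genotypes(toy)
X = sample_covariance(M)
```

For the toy data both site means are 1 and the scaling factor is 0.5, so the first two individuals receive the normalized rows (-2, 2) and (2, -2), and the heterozygous individual sits at the origin.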

Population history model

To understand the impact of the initial domestication event on observed levels of variation, we fit a simple bottleneck model to the data. The following parameters are assumed: domestication occurred 5,000 years ago, there is one generation per year, and there was a stepwise reduction in variation at the time of domestication. We estimate both the severity of the population reduction and the rate of population growth subsequent to that event. Two criteria are used to fit the bottleneck model. First, we use the empirically observed level of reduction, determined by the observation that the domesticated strains harbor ~83% of the variation observed in the wild varieties (a ratio of 0.015/0.018). Second, we fit the estimated demographic model to the observed site frequency spectrum. To fit a model to both the observed level of reduction and the frequency spectrum, we take a simulation approach. Using the program ms (S18), a grid of parameter values was simulated, with the population size reduction at the time of domestication varying from 1% to 99%, and the exponential rate of growth ranging from no increase in population size to a 1000-fold increase from the time of domestication to the present.

Identification of Genomic Regions of Selective Signals (GROSS)

A sliding window approach was applied to quantify polymorphism levels ($\pi$, pairwise nucleotide variation as a measure of variability) (S19), selection statistics (Tajima's D, a measure of selection in the genome) (S20) and genetic differentiation between the domesticated and wild populations ($F_{ST}$) (S21). Our analysis was performed for 5 kb windows sliding in 500 bp steps, and the SNPs for each population were those from step (3) of the SNP calling section. We developed a series of PERL scripts that consider genotype frequencies in the two groups and calculate $\pi$ and Tajima's D for both groups, and $F_{ST}$ between the two populations, following the formulas for those statistics (S19-21).
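The window scan described above can be illustrated with a short Python sketch. It is not the authors' PERL implementation: it uses the standard textbook formulas for $\pi$ and Tajima's D, and it assumes complete allele counts from a fixed number n of sampled chromosomes, which is a simplification relative to the low-depth data described here. All function names are hypothetical.

```python
import math


def sliding_windows(chrom_length, size=5000, step=500):
    """Yield (start, end) for 5 kb windows sliding in 500 bp steps (0-based, end exclusive)."""
    start = 0
    while start + size <= chrom_length:
        yield start, start + size
        start += step


def pi_and_tajimas_d(allele_counts, n, window_size):
    """allele_counts: count (1..n-1) of one allele at each segregating site in the window.
    n: number of sampled chromosomes. Returns (pi per site, Tajima's D or None)."""
    seg = len(allele_counts)
    # mean pairwise differences summed over sites
    pi_total = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    if seg == 0:
        return 0.0, None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)
    if variance <= 0:  # D is undefined, e.g. for very small n
        return pi_total / window_size, None
    d = (pi_total - seg / a1) / math.sqrt(variance)
    return pi_total / window_size, d


def scan(snps, n, chrom_length, size=5000, step=500):
    """snps: list of (position, allele_count). Returns (start, end, pi, D) per window."""
    results = []
    for start, end in sliding_windows(chrom_length, size, step):
        counts = [k for pos, k in snps if start <= pos < end]
        pi, d = pi_and_tajimas_d(counts, n, size)
        results.append((start, end, pi, d))
    return results


windows = list(sliding_windows(6000))
pi_rare, d_rare = pi_and_tajimas_d([1, 1, 1], 4, 5000)      # only singletons
pi_common, d_common = pi_and_tajimas_d([2, 2, 2], 4, 5000)  # intermediate frequencies
```

An excess of rare alleles gives a negative D and an excess of intermediate-frequency alleles a positive D, which is why low Tajima's D in the domesticated group is used as one of the selection signals.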
Then we considered the distribution of PiR (defined as the ratio of $\pi_{domesticated}$ to $\pi_{wild}$) and the distribution of TDD (Tajima's D for domesticated silkworms). We used an empirical procedure and selected windows with significantly low PiR and significantly low TDD values (Z test, P < 0.005 for both; Fig. 2A) as candidate selection signals along the genome. Neighboring windows were joined where possible, forming larger regions (GROSS).

Microarray analysis for genes in GROSS

The microarray data for the genes in GROSS came from the Bombyx mori microarray database (S6). Hierarchical clustering of the data was performed with the program Cluster (S22), and the clustered data were visualized using the program TreeView (S22).
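As an illustration of the test described under Domestication associated site detection, the following Python sketch computes a $\chi^2$ statistic for a 2 x 2 table and compares its P value with the Bonferroni-corrected SNP threshold quoted there. The source does not state how the contingency table was built, so the use of allele counts (two alleles per individual) in a 2 x 2 table with one degree of freedom is an assumption, and the function name is hypothetical.

```python
import math

# Thresholds quoted in Materials and Methods
DA_SNP_THRESHOLD = 2.96e-8
DA_INDEL_THRESHOLD = 1.87e-6


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                     allele 1   allele 2
        domesticated     a          b
        wild             c          d

    Returns (chi2, p) with p from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # survival function of chi-square with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Toy site: 28 domesticated individuals fixed for one allele (56 copies),
# 10 wild individuals fixed for the other (20 copies).
chi2_fixed, p_fixed = chi_square_2x2(56, 0, 0, 20)
is_da_snp = p_fixed < DA_SNP_THRESHOLD

# Toy site with identical allele frequencies in both groups.
chi2_same, p_same = chi_square_2x2(10, 10, 10, 10)
```

A site fixed for different alleles in the two groups, as in the first toy table, passes the da-SNP threshold, whereas a site with identical frequencies gives a statistic of zero.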

Supporting Text

Data production

We performed whole-genome resequencing of each silkworm variety using the Illumina Genome Analyzer II (GA II) and produced 1.50 billion short reads (averaging 42 bp in length), corresponding to 63.25 Gb of raw data. In total, we obtained a 118.1X effective depth for all 40 varieties, with an average depth of 3X for each variety (Table S2A). The mean genome coverage for domesticated and wild silkworms was 82.0% and 83.0%, respectively, and the mean gene region coverage was 91.8% and 94.2%, respectively. Mapping results for domesticated and wild strains are summarized in Table S2B. We observed ~5% more bases mapped for domesticated silkworms than for wild ones and a ~0.6% lower mismatch rate for the domesticated strains, both of which may be due to the high genetic divergence between the reference genome and the wild strains. However, a higher average sequencing depth for the wild silkworms compensates for this difference, and the resulting genome and gene region coverage are comparable between the two groups.

Variation detection

Making full use of the massive number of short reads provided by next-generation sequencing technology, the approach we took in this report can effectively cover around 80% of each individual's genome at a depth of 3X for the 432 Mb sequence. Guided by a pool-to-individual strategy (see Materials and Methods for details), we can detect high quality SNPs. Of the identified SNPs, 3,504,749 (21.9%) were within genes (introns and exons) and 422,815 (2.64%) were in the coding sequences (CDS) (Table S3A). We estimated that the ratio of synonymous to non-synonymous changes in the CDS was 2.91:1. We can also identify short indels (1-3 bp) as well as structural variations in a similar way. It would be difficult to confidently detect individual genomic variants at such a depth per individual unless the population-level information of 118.1X coverage was taken into account.
We found that only 1,433 (0.46%) of the indels are in the CDS, and 1,014 of these would cause a frameshift, affecting 866 genes (Table S4A). For structural variations, we found a mean length of 560 bp, and genomic deletions comprise 98.8% of them, which can be explained by the limitation of the short insert size.

Mutation

We calculated mutation rates for SNPs in different functional categories. We found that, for every functional class, the value for the wild varieties is higher than for the domesticated ones (Table S3B). This observation comes from calculating the estimate of the population mutation rate $\theta_S$ (S23), which corrects for sample size in the two groups (29 domesticated vs. 11 wild). Accordingly, we also noticed a higher $\theta_S$ value (Mann-Whitney U, $P = 7.69 \times 10^{-6}$) for indels in wild silkworms compared to domesticated ones (Table S4B). In comparison to Gallus gallus, for which this information is available (S24), silkworms have a two-fold higher level of $\theta_S$ at the CDS, intron and genome-wide levels (Table S3B). We also estimated $\pi$ values for SNPs in B. mori and found that they are 0.0061, 0.0136 and 0.0136 for CDS, intronic regions and the whole genome, respectively. The $\pi$ values in B. mandarina are 0.0070, 0.0157 and 0.0153 for these three categories, respectively. All of these values indicate a lower level of polymorphism than in Drosophila simulans (S25).

Linkage disequilibrium (LD) pattern

We assessed the linkage disequilibrium (LD) levels in the domesticated and wild silkworm varieties by calculating the pairwise LD measure $r^2$ (S26, see Materials and Methods) and present curves showing LD decay with increasing genomic distance between SNP pairs (Fig. S1). We find that LD decays rapidly in silkworms, with $r^2$ decreasing to half of its maximum at a distance of around 46 bp and 7 bp for the domesticated and wild varieties, respectively. The faster decay of LD in B. mori compared with D. melanogaster [in which LD also decreases rapidly, to half of its maximum value within several hundred bp (S27)] is likely due to a higher recombination rate of 2.97 cM/Mb (S28) in the silkworm genome, compared with 1.59 cM/Mb (S29) for the fruit fly, as well as to high effective population sizes. The relatively slower decay of LD in the domesticated strains is most likely caused by inbreeding within each strain, although population structure, reduced effective population size, and a possible increased rate of positive selection may also have contributed. These results show that association mapping combining multiple domesticated strains is possible but can be confounded by the extensive population structure and inbreeding. By contrast, association mapping based on wild individuals will be difficult due to low levels of LD. As sample size is an important parameter influencing LD patterns, we randomly selected 11 domesticated silkworms for this analysis to match the sample size of the wild group. For chromosome 2, we repeated the analyses for three independent sets of 11 randomly selected domesticated silkworms and found similar results.

Demography of silkworms

In the PCA analysis, there is a significant correlation with voltinism for the first four principal components in the domesticated varieties. Moltinism (number of larval molts per generation) also correlates with eigenvectors 1 and 3. We observed a significant correlation between the latitude of the sample origins and eigenvectors 2 and 4 (Kendall's $\tau$, P = 0.03 and 0.04, respectively) (Table S7), and no association between longitude and any of the principal components. These key traits relating to silkworm biology and yield define genetically distinct subgroups, suggesting that genetic mapping of these traits may be complicated by the general genetic differentiation between strains with different moltinism and voltinism values.
Mapping studies may benefit from using varieties with large differences in the relevant moltinism and voltinism traits, but with otherwise little genetic differentiation.

After fitting the demographic model (see Materials and Methods), we observed that a 90% reduction in population size in the domesticated variety could account for the observed levels of variability (Fig. S2). The surprisingly high levels of variability in the domesticated variety suggest that a large number of individuals were used in the initial domestication event. An alternative hypothesis is substantial gene flow between the wild and domesticated varieties after domestication, but the very clear differentiation between domesticated and wild varieties suggests that gene flow from the wild to the domesticated varieties may not have been strong. The distinct separation of strains does show that the genetic variation in the domestic strains has been maintained despite local inbreeding.

It is commonly assumed that domestication leads to a significant reduction in variability (S30), because a domesticated species might have arisen from a geographically limited group of individuals and thus been subjected to a bottleneck in population size during domestication, and because it has been subjected to strong artificial selection subsequent to the domestication event. In many cases [e.g., rice (S31) or wheat (S32)] the domesticated species contains much less variability at the nucleotide level than the corresponding wild species. We did not, however, find that these factors have been sufficiently strong in the silkworm to lead to an extensive loss of genetic variability.

We also inferred population ancestry with frappe (S16), and no ancestral information was assumed before the calculation. For K = 2, the results show a clear domesticated/wild split (Fig. S3). This is consistent with the phylogeny and PCA results derived from our data.
When K = 3, a new component including D5, D7, D15, D16 and D24 was separated from the rest of the domesticated group, consistent with the same subgroup in the phylogenetic tree. From K = 3 to 4, another subgroup emerged including D17-D23, D27 and D28, which clustered together in the phylogeny. When K = 5, the two Japanese high silk production strains stand out as a new group. At K = 6 or above, additional clusters appeared as outlier populations that disturb the previous organization of the population structure and make little biological sense.

Details of GROSS

To determine whether certain SNPs were more common in the domesticated strains, we adopted a complex trait association study methodology (S33). We treated domesticated and wild individuals as phenotypically distinct and conducted a series of association tests for each qualified SNP (Materials and Methods). In total, we found that 1,347 of the polymorphic sites were significantly different ($\chi^2$ test; $P < 2.96 \times 10^{-8}$) in their association with domesticated versus wild varieties (termed domestication-associated SNPs, or da-SNPs), and that 410 (30.4%) of these lie within 298 genes (Table S8). Looking at the association of indels with the domesticated versus wild varieties, we found that 34 indel sites were significantly different ($\chi^2$ test; $P < 1.87 \times 10^{-6}$) (termed domestication-associated indels, or da-indels). We found that more than 45% of all the da-SNPs are located in GROSS; this indicates that da-SNPs, which may be in the initial stages of becoming fixed in the domesticated group, are enriched in GROSS compared to the genomic background. We found that 212 GROSS contain only one gene, which means that approximately 60% (212/354) of all the genes (Table S9) found to be potentially important to domestication are unique to a GROSS (Table S10). This indicates that most GROSS genes were probably under selection themselves, and had little chance of having experienced hitchhiking.

Genes likely important for domestication are found in GROSS

In addition to GROSS genes enriched in the silk gland, we also found midgut- and testis-enriched genes. While the former are related to the metabolism of carbohydrates, amino acids and lipids, which plays an important role in food digestion and nutrient absorption, the latter are annotated as having binding, catalytic, and motor activity related to reproduction.
Among 32 midgut-enriched genes, nine participate in dietary protein digestion (serine protease), carbohydrate metabolism (malate dehydrogenase and pyruvate dehydrogenase), substance transport (organic cation transporter, sodium- and chloride-dependent glycine transporter 2, ATP-binding cassette transporter, and zinc transporter 5), and lipid metabolism [fatty acid binding protein (FABP) and scavenger receptor]. The malate dehydrogenase gene in B. mori shares 57% amino acid sequence similarity with its homolog in Escherichia coli, in which mutation results in decreased activity of the encoded enzyme (S34). FABP is mainly involved in the binding and transport of unsaturated fatty acids, such as linolenic and linoleic acids, both of which are essential to the silkworm and, as in other animals such as humans (S35), can only be obtained through food uptake (mulberry leaves for silkworms). Artificial diet-based nutrition research has confirmed a 60% weight loss in silkworms fed on food without those two unsaturated fatty acids, compared to the control group (S36). The identification of these genes involved in energy metabolism indicates that energy metabolism has been under artificial selection during silkworm domestication.

Among 54 testis-enriched genes, five are involved in spermatogenesis: spermidine synthase, sperm protein SSP411, t-complex-associated testis expressed 1, intersex, and shaggy. In addition, three genes are related to sperm motility: myosin class II heavy chain, outer dense fiber of sperm tails protein 2, and axonemal dynein intermediate chain inner arm i1. These results provide evidence for possible selective pressure on B. mori reproduction during the domestication process.
Additional notes

Genome-wide single base-pair level genetic variation maps have only been generated for species with small genomes, including yeast (S37), Salmonella (S38), Plasmodium falciparum (S39), and human rhinovirus (S40). For larger genomes, no comprehensive single-base resolution maps are currently available, although high-density SNP maps have been built for human (S41) and mouse (S42), and moderate-density ones for chicken (S24), dog (S43), sheep (S44), and cattle (S45). Our strategy here provides a nearly complete genome-level variation map, which gives more reliable information on genetic polymorphisms in a population.

There are two sub-populations of B. mandarina, Chinese wild silkworms (from China, each with 28 chromosomes, the same as B. mori) and Japanese wild silkworms [from Japan, each with 27 chromosomes (S46)], and a common view of silkworm domestication (S47) holds that the domesticated silkworms were tamed from the Chinese wild ones. Although this view is taken as the basis of the work presented in this paper, mitochondrial results that take advantage of these 40 samples and public data for the Japanese wild silkworm (NCBI Accession Number: NC_003395) do provide compelling evidence for it (Li et al., personal communication).

B. mori is not only well adapted to human handling, but is wholly dependent on humans for survival; in addition, it is well differentiated trait-wise from its wild cousin. Of equal importance, this event took place in a different geographical region (Asia vs. the Fertile Crescent) (S48) and in a distinctly different culture from the earliest known domestication events. These aspects make silkworm domestication a unique event in agricultural history, deserving the same kind of attention as the domestication of livestock and crop plants. We directly tested for selection related specifically to domestication by comparing variability in domesticated versus wild silkworms, and identifying genomic regions with a significant difference in polymorphism density between those two groups (e.g., Fig. S7).
Although others are in the pipeline, it is unprecedented to have such a source of near-relatives in this clade for comparative genome analysis, which can be aimed not only at identifying genes associated with domestication in the candidate GROSS we detected, but also at annotating and defining regulatory regions, complementing our knowledge of functional elements in the silkworm genome.

Supporting Figures

Fig. S1. Linkage disequilibrium (LD) patterns. LD measured by $r^2$ decays with pairwise marker distance, suggesting a bottleneck at the time of domestication. The inset shows details of this trend for the first 100 bp. The maxima of $r^2$ for the domesticated and wild varieties, at a pairwise distance of 1 bp, are 0.829 and 0.733, respectively. When LD drops to half of the maximal levels, on average, SNP positions are 46 bp ($r^2_{domesticated}$ = 0.412) and 7 bp ($r^2_{wild}$ = 0.348) apart for the domesticated and wild varieties, respectively.
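For readers unfamiliar with the $r^2$ statistic plotted in Fig. S1, the following Python sketch computes it for one pair of biallelic SNPs. The values in the figure were obtained with Haploview from genotype data; this sketch instead assumes phased haplotypes with alleles coded 0/1, which is a simplification, and the function name is hypothetical.

```python
def r_squared(haplotypes):
    """haplotypes: list of (allele_at_snp_a, allele_at_snp_b), alleles coded 0 or 1.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Complete association: the two SNPs always carry matching alleles.
r2_linked = r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
# No association: all four allele combinations are equally frequent.
r2_unlinked = r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

The two toy inputs mark the extremes of the scale: perfectly associated alleles give $r^2$ = 1 and independent alleles give $r^2$ = 0.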

Fig. S2. A bottleneck model estimation to illustrate silkworm domestication. Simulations showed that a 90% reduction in domesticated population size could account for the maintenance of ~83% of the variation of the wild varieties.

Fig. S3. Population structure for the 40 silkworms. The number of ancestral populations, K, is set from 2 to 5 (top to bottom).

Fig. S4. WEGO result: functional annotation for genes in GROSS.

Fig. S5. A two-way hierarchical cluster analysis of the expression patterns of 159 GROSS genes in different Dazao tissues. Microarray signals for different tissue types (columns) and genes (rows) are shown, with continuous expression levels from dark green (lowest) to bright red (highest). A/MSG: anterior/middle silk gland; PSG: posterior silk gland.

Fig. S6. Comparison of the relative expression of bHLH genes in the silk gland of fifth larval instar of the reference B. mori strain and a high silk production strain. The relative expression of bHLH genes was assessed by quantitative real-time polymerase chain reaction (qRT-PCR) analysis. The BmActin gene was used as an internal control and the highest relative quantities were set to 1. We found that bHLH is up-regulated four-fold in the high silk production strain compared to the reference strain on day 3 of the fifth larval instar.

Fig. S7. An example GROSS containing only one gene, Sgf-1, which is important to silk production. Density of polymorphism ($\pi$), test statistics for selection (Tajima's D), diversity between the two populations ($F_{ST}$), and genome annotation are shown (from top to bottom). Both $\pi$ and Tajima's D for the domesticated and wild varieties are shown in red and green, respectively.

Supporting Tables

Table S1. Silkworm samples and detailed traits. Voltinism denotes the number of generations per year and moltinism denotes the number of larval molts per life cycle. (*: V1 represents monovoltine, V2 bivoltine and V3 polyvoltine. #: M2 represents bimoulting, M3 trimoulting, M4 tetramoulting and M5 pentamoulting.)
Sample ID | Strain name | Sex | Voltinism* and moltinism# | System or location | Other traits and comments | Latitude | Longitude
D01 | J7532 | Male | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D02 | J04-010 | Female | V1M4 | Japan | | 35.69 | 139.69
D03 | J872 | Unknown | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D04 | J106 | Male | V2M4 | Japan | | 35.69 | 139.69
D05 | N4 | Female | V2M4 | Japan | | 35.69 | 139.69
D06 | Cambodia | Male | V3M4 | Cambodia | | 11.54 | 104.90
D07 | Lao | Female | V3M4 | Laos | | 17.97 | 102.61
D08 | India M3 | Male | V2M3 | India | | 28.64 | 77.23
D09 | Europe18 | Female | V1M4 | Europe | | 54.53 | 15.26
D10 | Italy16 | Female | V1M4 | Italy, Europe | | 41.87 | 12.57
D11 | Soviet Union No.1 | Female | V1M4 | Former SU, Europe | | 55.76 | 37.62
D12 | 15-010 | Unknown | V1M5 | Mutation | | NA | NA
D13 | 02-210 | Female | V1M4 | Mutation | | NA | NA
D14 | 15-001 | Male | V3M3 | Mutation | | NA | NA
D15 | Mutation M2 | Unknown | V2M2 | Mutation | | NA | NA
D16 | A06E | Unknown | V2M4 | Guangdong province, China | | 23.12 | 113.26
D17 | Damao | Unknown | V1M3 | Sichuan province, China | | 30.66 | 104.08
D18 | Ankang No.4 | Male | V1M3 | Shanxi province, China | | 34.26 | 108.95
D19 | ZT500 | Female | V1M3 | Gansu province, China | | 36.07 | 103.75
D20 | Zhugui | Female | V1M4 | Zhejiang province, China | | 30.27 | 120.15
D21 | Bilian | Female | V1M4 | Jiangsu province, China | | 32.05 | 118.77
D22 | ZT900 | Female | V1M3 | Sichuan province, China | | 30.66 | 104.08
D23 | ZT100 | Female | V1M3 | Hunan province, China | | 28.20 | 112.98
D24 | Sihong15 | Male | V1M4 | Jiangsu province, China | | 32.05 | 118.77
D25 | Xiaoshiwan | Female | V1M4 | Zhejiang province, China | | 30.27 | 120.15
D26 | C108 | Female | V2M4 | Chongqing, China | | 29.55 | 106.55
D27 | Sichuang M3 | Female | V1M3 | Sichuan province, China | | 30.66 | 104.08
D28 | Qiansanmian | Male | V1M3 | Guizhou province, China | | 26.59 | 106.73
D29 | Handan | Male | V1M4 | Hebei province, China | | 38.03 | 114.48
W01 | B. mandarina Ziyang | Unknown | Unknown | Sichuan province, China | | 30.66 | 104.08
W02 | B. mandarina Nanchong | Unknown | Unknown | Sichuan province, China | | 30.66 | 104.08
W03 | B. mandarina Hongya | Unknown | Unknown | Sichuan province, China | | 30.66 | 104.08
W04 | B. mandarina Pengshan | Unknown | Unknown | Sichuan province, China | | 30.66 | 104.08
W05 | B. mandarina Ankang | Unknown | Unknown | Shanxi province, China | | 37.87 | 112.57
W06 | B. mandarina Yichang | Unknown | Unknown | Hubei province, China | | 30.57 | 114.29
W07 | B. mandarina Yancheng | Unknown | Unknown | Jiangsu province, China | | 32.05 | 118.77
W08 | B. mandarina Luzhou | Unknown | Unknown | Sichuan province, China | | 30.66 | 104.08
W09 | B. mandarina Hunan | Unknown | Unknown | Hunan province, China | | 28.20 | 112.98
W10 | B. mandarina Suzhou | Unknown | Unknown | Jiangsu province, China | | 32.05 | 118.77
W11 | B. mandarina Rongchang | Unknown | Unknown | Chongqing, China | | 29.55 | 106.55
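The combined voltinism and moltinism codes in Table S1 (for example V2M4) can be expanded mechanically using the legend in the caption. The following is a minimal Python sketch of that lookup; the function name `decode_trait_code` and the dictionary names are illustrative and do not come from the original material.

```python
import re

# Code-to-label maps copied from the Table S1 caption.
VOLTINISM = {"V1": "monovoltine", "V2": "bivoltine", "V3": "polyvoltine"}
MOLTINISM = {
    "M2": "bimoulting",
    "M3": "trimoulting",
    "M4": "tetramoulting",
    "M5": "pentamoulting",
}


def decode_trait_code(code):
    """Split a combined code such as 'V2M4' into (voltinism, moltinism) labels.

    Entries that do not match the V<n>M<n> pattern (for example 'Unknown',
    as recorded for the wild samples) are returned as (None, None).
    """
    match = re.fullmatch(r"(V[1-3])(M[2-5])", code.strip())
    if match is None:
        return (None, None)
    return (VOLTINISM[match.group(1)], MOLTINISM[match.group(2)])


if __name__ == "__main__":
    for code in ("V2M4", "V1M5", "Unknown"):
        print(code, "->", decode_trait_code(code))
```

Returning `(None, None)` for unrecognised entries keeps the wild samples, whose traits are listed as Unknown, distinguishable from decoded strains.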

Table S2. Data production. (A) Sequencing summary.


Samples | Yield (Gigabase) | % Genome coverage | Average effective depth (X) | Effective depth (X)
29 Domesticated strains | 44.2 | 99.85 | 2.9 | 83.8
11 Wild strains | 19.1 | 99.52 | 3.1 | 34.3
Total | 63.3 | 99.88 | 3.0 | 118.1

(B) Mapping summary based on SOAP 1.09 results.


Statistics | Domesticated (mean ± sd) | Wild (mean ± sd)
% Bases mapped | 82.12 ± 4.18 | 77.59 ± 2.36
% Bases mapped uniquely | 75.72 ± 2.43 | 70.80 ± 3.54
% With difference | 1.88 ± 0.33 | 2.51 ± 0.35
% Genome coverage | 82.02 ± 3.65 | 83.03 ± 4.43
% Gene region coverage | 91.83 ± 3.71 | 94.16 ± 3.55

Table S3. SNP summary. (A) SNP numbers in different functional elements.
Total SNPs Synonymous CDS Gene region Intron Stop codon Nonsynonymous Premature stop codon Other Splice Sites Other miRNA ncRNA rRNA tRNA Transposable elements Whole genome 314,639 594 1,658 105,924 1,432 3,084,186 42 76 233 3,801,067 15,986,559 Domesticated group Wild group 263,930 535 1,449 90,856 1,290 2,689,566 37 67 206 3,374,986 276,530 490 1,375 85,397 1,131 2,557,517 34 59 184 3,120,087

14,023,573 13,237,865

(B) Mutation rate (θ) for SNPs (×10^-2).


Element | Whole genome | Domesticated group | Wild group
Gene region, CDS | 0.53 | 0.48 | 0.62
Gene region, intron | 1.20 | 1.12 | 1.36
ncRNA, miRNA | 0.32 | 0.31 | 0.36
ncRNA, rRNA | 1.69 | 1.60 | 1.78
ncRNA, tRNA | 0.45 | 0.43 | 0.49
Transposable elements | 1.39 | 1.32 | 1.55
Total SNPs | 1.15 | 1.08 | 1.30

Table S4. Indel summary. (A) Indel numbers in different functional elements.
Element | Whole genome | Domesticated group | Wild group
Gene region, CDS, frameshift | 1,014 | 953 | 872
Gene region, CDS, non-frameshift | 419 | 370 | 334
Gene region, intron | 65,936 | 59,051 | 53,112
ncRNA, miRNA | 4 | 3 | 4
ncRNA, rRNA | 1 | 1 | 0
ncRNA, tRNA | 6 | 6 | 5
Transposable elements | 85,259 | 77,871 | 63,107
Total indels | 311,608 | 281,185 | 251,453

(B) Mutation rate (θ) for indels (×10^-4).


Element | Whole genome | Domesticated group | Wild group
Gene region, CDS | 0.17 | 0.17 | 0.19
Gene region, intron | 1.72 | 1.65 | 1.88
ncRNA, miRNA | 1.67 | 1.34 | 2.27
ncRNA, rRNA | 0.15 | 0.16 | 0.00
ncRNA, tRNA | 0.35 | 0.37 | 0.40
Transposable elements | 1.01 | 0.98 | 1.01
Total indels | 1.54 | 1.48 | 1.68

Table S5. Structural variations (SV) summary.


SV type | Total sites | Overlapping with TEs (No.) | Overlapping with TEs (% in total sites)
Duplication | 327 | 28 | 9
Deletion | 34677 | 26663 | 77
Insertion | 80 | 21 | 26
Other complex SVs | 9 | 0 | 0
Total | 35093 | 26712 | 76

Table S6. Tracy-Widom (TW) statistics and p-values for the first six eigenvalues. The significant p-values are in bold.
Number | Eigenvalue | TW | p-value
1 | 8.009 | 21.05 | 7.33E-30
2 | 3.93 | 1.421 | 0.02627
3 | 3.885 | 2.819 | 0.002409
4 | 3.697 | 2.271 | 0.006531
5 | 3.246 | -3.375 | 0.9661
6 | 3.168 | -3.76 | 0.9859

Table S7. Kendall's τ statistics (p-values) of the correlations between phenotypes and eigenvectors. The significant p-values are in bold.
Phenotype | Eigenvector 1 | Eigenvector 2 | Eigenvector 3 | Eigenvector 4
Wild vs domesticated | -0.640 (1.36E-06) | 0.259 (0.051) | 0.162 (0.220) | 0.347 (0.009)
Voltinism | 0.446 (0.003) | 0.533 (4.45E-04) | 0.393 (0.010) | 0.546 (3.19E-04)
# Molts | 0.327 (0.032) | 0.051 (0.741) | 0.401 (0.009) | 0.152 (0.321)
Latitudes | 0.084 (0.483) | -0.259 (0.031) | 0.120 (0.316) | -0.246 (0.041)
Longitudes | 0.213 (0.076) | 0.094 (0.433) | 0.018 (0.880) | 0.028 (0.815)
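To make the statistic in Table S7 concrete, the sketch below computes a Kendall rank correlation between a phenotype vector and an eigenvector. It is an illustration under stated assumptions, not the authors' procedure: the source does not say which variant was used, so the tie-corrected tau-b is assumed here (a binary phenotype such as wild versus domesticated produces many ties), the input values are hypothetical, and the p-values reported in the table are not reproduced.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences.

    Pairs are classified as concordant, discordant, tied only in x,
    or tied only in y; pairs tied in both variables are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical data: 0 = domesticated, 1 = wild, and made-up
    # eigenvector coordinates for six samples.
    phenotype = [0, 0, 0, 1, 1, 1]
    eigenvector = [0.21, 0.15, 0.30, -0.42, -0.05, -0.38]
    print(round(kendall_tau_b(phenotype, eigenvector), 3))
```

The quadratic pair loop is adequate for 40 samples; a library routine would be the usual choice for larger data sets or when p-values are needed.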

Table S8. Domestication-associated (da) SNPs and indels.


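Category | Whole genome | # in GROSS | % in GROSS
Gene number | 14,470 | 354 | 2.45%
da-SNP, gene region, CDS region | 120 | 51 | 42.5%
da-SNP, gene region, sub-total | 410 | 198 | 48.3%
da-SNP, TE region | 231 | 103 | 44.6%
da-SNP, total | 1,347 | 617 | 45.8%
da-indel, gene region, CDS region | 1 | 1 | 100.0%
da-indel, gene region, sub-total | 5 | 2 | 40.0%
da-indel, TE region | 12 | 3 | 25.0%
da-indel, total | 34 | 12 | 35.3%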
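The "% in GROSS" values in Table S8 are the share of each whole-genome count that falls inside GROSS. The short Python sketch below recomputes them from the two count columns as a worked check. The row labels follow one reading of the flattened table (CDS region and sub-total under gene region, then TE region and total), and the names `ROWS` and `percent_in_gross` are illustrative only.

```python
# (category, whole-genome count, count in GROSS, reported % in GROSS)
ROWS = [
    ("Gene number", 14470, 354, 2.45),
    ("da-SNP, CDS region", 120, 51, 42.5),
    ("da-SNP, gene region sub-total", 410, 198, 48.3),
    ("da-SNP, TE region", 231, 103, 44.6),
    ("da-SNP, total", 1347, 617, 45.8),
    ("da-indel, CDS region", 1, 1, 100.0),
    ("da-indel, gene region sub-total", 5, 2, 40.0),
    ("da-indel, TE region", 12, 3, 25.0),
    ("da-indel, total", 34, 12, 35.3),
]


def percent_in_gross(whole_genome, in_gross):
    """Percentage of the whole-genome count that lies inside GROSS."""
    return 100.0 * in_gross / whole_genome


# Largest gap between a recomputed percentage and the reported one.
max_abs_diff = max(
    abs(percent_in_gross(total, inside) - reported)
    for _, total, inside, reported in ROWS
)

if __name__ == "__main__":
    for name, total, inside, reported in ROWS:
        computed = percent_in_gross(total, inside)
        print(f"{name}: computed {computed:.2f}%, reported {reported}%")
```

Every recomputed percentage agrees with the reported value to within rounding, which also supports the row assignment used above.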

Table S9. Genes found in GROSS.


Gene ID BGIBMGA002068 BGIBMGA002041 BGIBMGA002089 BGIBMGA002015 BGIBMGA000616 BGIBMGA000590 BGIBMGA000588 BGIBMGA000582 BGIBMGA000630 BGIBMGA000634 BGIBMGA000644 BGIBMGA000649 BGIBMGA000655 BGIBMGA000678 BGIBMGA000543 BGIBMGA000681 BGIBMGA000688 BGIBMGA000699 BGIBMGA000700 BGIBMGA000521 BGIBMGA013330 BGIBMGA012283 BGIBMGA012277 BGIBMGA012262 BGIBMGA012328 BGIBMGA012253 BGIBMGA012354 BGIBMGA006633 BGIBMGA006714 BGIBMGA006611 BGIBMGA006733 BGIBMGA006609 BGIBMGA006608 BGIBMGA006861 BGIBMGA006862 BGIBMGA006870 BGIBMGA006883 BGIBMGA006917 BGIBMGA006905 BGIBMGA006956 Scaffold nscaf2210 nscaf2210 nscaf2210 nscaf2210 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf1690 nscaf3068 nscaf3040 nscaf3040 nscaf3040 nscaf3040 nscaf3040 nscaf3040 nscaf2855 nscaf2855 nscaf2855 nscaf2855 nscaf2855 nscaf2855 nscaf2859 nscaf2859 nscaf2859 nscaf2859 nscaf2860 nscaf2860 nscaf2860 Start 92099 1264029 1269704 3050656 519948 1025643 1088029 1325859 1451333 1723950 2802592 2947882 3171591 4126607 4284455 4308301 4635598 5115125 5142220 5467593 216549 927204 1297856 1875559 1989681 2339503 3712802 2906740 3287598 4118078 4122115 4129479 4175032 938908 946507 1236319 1644217 582388 1912760 2242429 End 92326 1265086 1270372 3054860 520533 1027490 1091916 1329432 1453695 1729697 2804303 2948292 3176144 4131294 4284811 4309950 4643996 5131329 5143295 5474024 217556 930161 1300111 1878041 1994671 2339977 3716591 2908938 3298729 4118311 4123658 4160859 4176630 944863 946662 1240989 1645271 582579 1918814 2248402 Strand + + + + + + + + + + + + + + + + + + + + + + GROSS ID SWGROSS0002 SWGROSS0005 SWGROSS0005 SWGROSS0008 SWGROSS0025 SWGROSS0030 SWGROSS0035 SWGROSS0040 SWGROSS0044 SWGROSS0046 SWGROSS0056 SWGROSS0061 SWGROSS0064 SWGROSS0067 SWGROSS0075 SWGROSS0076 SWGROSS0078 SWGROSS0080 SWGROSS0081 SWGROSS0086 SWGROSS0099 SWGROSS0101 SWGROSS0107 SWGROSS0110 SWGROSS0111 SWGROSS0113 SWGROSS0121 
SWGROSS0130 SWGROSS0132 SWGROSS0135 SWGROSS0135 SWGROSS0135 SWGROSS0136 SWGROSS0142 SWGROSS0142 SWGROSS0143 SWGROSS0147 SWGROSS0152 SWGROSS0156 SWGROSS0160

BGIBMGA002904 BGIBMGA001791 BGIBMGA001792 BGIBMGA001613 BGIBMGA001826 BGIBMGA011963 BGIBMGA011789 BGIBMGA012068 BGIBMGA014358 BGIBMGA010447 BGIBMGA010530 BGIBMGA010531 BGIBMGA010366 BGIBMGA010585 BGIBMGA010659 BGIBMGA010622 BGIBMGA010602 BGIBMGA005897 BGIBMGA005849 BGIBMGA005846 BGIBMGA005845 BGIBMGA005860 BGIBMGA001080 BGIBMGA001078 BGIBMGA001085 BGIBMGA001086 BGIBMGA000972 BGIBMGA001188 BGIBMGA000953 BGIBMGA001213 BGIBMGA001286 BGIBMGA000839 BGIBMGA009486 BGIBMGA009463 BGIBMGA007789 BGIBMGA007741 BGIBMGA007737 BGIBMGA007791 BGIBMGA007792 BGIBMGA007736 BGIBMGA007793

nscaf2575 nscaf2176 nscaf2176 nscaf2176 nscaf2176 nscaf3032 nscaf3031 nscaf3034 scaffold316 nscaf2993 nscaf2993 nscaf2993 nscaf2993 nscaf2993 nscaf2998 nscaf2998 nscaf2998 nscaf2842 nscaf2839 nscaf2839 nscaf2839 nscaf2839 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf1898 nscaf2953 nscaf2953 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888

2096721 2370370 2399280 3893564 3905459 412007 1482091 828516 381740 1921700 3073511 3085807 7885518 7899699 523733 887531 1592883 1372074 94626 120437 123579 138076 98247 134459 181315 210046 7685637 8498417 8655094 9580865 13822734 15736619 512395 1352157 841705 862144 958825 970587 984948 998735 999476

2102318 2372851 2399705 3899643 3906227 413842 1483992 837320 383507 1922150 3079145 3092577 7894362 7911006 525679 890438 1606361 1374771 95364 123051 128616 139012 98654 135294 197148 210724 7687901 8509123 8655273 9583162 13830264 15750274 515181 1354673 848249 862530 959052 982244 994967 999185 1002597

+ + + + + + + + + + + + + + + + + + + + + +

SWGROSS0165 SWGROSS0167 SWGROSS0169 SWGROSS0170 SWGROSS0170 SWGROSS0175 SWGROSS0183 SWGROSS0197 SWGROSS0215 SWGROSS0218 SWGROSS0222 SWGROSS0222 SWGROSS0234 SWGROSS0235 SWGROSS0249 SWGROSS0253 SWGROSS0258 SWGROSS0263 SWGROSS0264 SWGROSS0265 SWGROSS0266 SWGROSS0267 SWGROSS0270 SWGROSS0272 SWGROSS0275 SWGROSS0276 SWGROSS0290 SWGROSS0294 SWGROSS0295 SWGROSS0297 SWGROSS0302 SWGROSS0308 SWGROSS0311 SWGROSS0313 SWGROSS0324 SWGROSS0325 SWGROSS0328 SWGROSS0329 SWGROSS0329 SWGROSS0329 SWGROSS0329

BGIBMGA007849 BGIBMGA007704 BGIBMGA007863 BGIBMGA007699 BGIBMGA007876 BGIBMGA007698 BGIBMGA007877 BGIBMGA007697 BGIBMGA007696 BGIBMGA007882 BGIBMGA007883 BGIBMGA007498 BGIBMGA007497 BGIBMGA007587 BGIBMGA003320 BGIBMGA003306 BGIBMGA002165 BGIBMGA002164 BGIBMGA002163 BGIBMGA002189 BGIBMGA002190 BGIBMGA002191 BGIBMGA002159 BGIBMGA002192 BGIBMGA002158 BGIBMGA002157 BGIBMGA002193 BGIBMGA002195 BGIBMGA002196 BGIBMGA002197 BGIBMGA012984 BGIBMGA012985 BGIBMGA012850 BGIBMGA013039 BGIBMGA012797 BGIBMGA013063 BGIBMGA013150 BGIBMGA013156 BGIBMGA005591 BGIBMGA005662 BGIBMGA005548

nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2888 nscaf2887 nscaf2887 nscaf2887 nscaf2655 nscaf2655 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf2216 nscaf3058 nscaf3058 nscaf3058 nscaf3058 nscaf3058 nscaf3058 nscaf3062 nscaf3063 nscaf2829 nscaf2829 nscaf2829

3516420 3518629 3917256 4326103 4328954 4348687 4350120 4376566 4416895 4525756 4607888 1527342 1536409 1553956 2615446 3224043 1847585 1880672 1892393 1987667 1999801 2005344 2009367 2023151 2093850 2094861 2118245 2172998 2175146 2178396 3531502 3551699 3553591 5904882 7163761 7168258 641988 3472875 792981 2808783 2821640

3517553 3518988 3921794 4326579 4339266 4348911 4353948 4377372 4417656 4527943 4608715 1529161 1551387 1561252 2623537 3224327 1848325 1881072 1895808 1995877 2001268 2005520 2020300 2023396 2094494 2096909 2118403 2174578 2175379 2188564 3534730 3552496 3554842 5906264 7166927 7175427 643304 3475227 795590 2811645 2828107

+ + + + + + + + + + + + + + + + + + + + + -

SWGROSS0341 SWGROSS0341 SWGROSS0342 SWGROSS0346 SWGROSS0346 SWGROSS0346 SWGROSS0346 SWGROSS0348 SWGROSS0349 SWGROSS0351 SWGROSS0354 SWGROSS0362 SWGROSS0362 SWGROSS0362 SWGROSS0369 SWGROSS0371 SWGROSS0372 SWGROSS0373 SWGROSS0374 SWGROSS0375 SWGROSS0375 SWGROSS0375 SWGROSS0375 SWGROSS0375 SWGROSS0377 SWGROSS0377 SWGROSS0378 SWGROSS0380 SWGROSS0380 SWGROSS0380 SWGROSS0386 SWGROSS0386 SWGROSS0386 SWGROSS0389 SWGROSS0391 SWGROSS0391 SWGROSS0394 SWGROSS0398 SWGROSS0402 SWGROSS0412 SWGROSS0414

BGIBMGA005670 BGIBMGA000181 BGIBMGA007030 BGIBMGA007075 BGIBMGA007026 BGIBMGA007025 BGIBMGA007076 BGIBMGA007022 BGIBMGA007021 BGIBMGA003946 BGIBMGA008477 BGIBMGA008478 BGIBMGA008480 BGIBMGA008345 BGIBMGA008481 BGIBMGA008482 BGIBMGA008336 BGIBMGA008299 BGIBMGA008229 BGIBMGA008599 BGIBMGA012578 BGIBMGA001909 BGIBMGA001943 BGIBMGA001877 BGIBMGA001946 BGIBMGA004058 BGIBMGA004072 BGIBMGA004073 BGIBMGA004089 BGIBMGA004095 BGIBMGA004011 BGIBMGA004133 BGIBMGA000192 BGIBMGA000191 BGIBMGA000197 BGIBMGA000190 BGIBMGA004423 BGIBMGA004422 BGIBMGA004437 BGIBMGA004441 BGIBMGA004491

nscaf2829 nscaf125 nscaf2865 nscaf2865 nscaf2865 nscaf2865 nscaf2865 nscaf2865 nscaf2865 nscaf2766 nscaf2902 nscaf2902 nscaf2902 nscaf2902 nscaf2902 nscaf2902 nscaf2902 nscaf2902 nscaf2899 nscaf2903 nscaf3052 nscaf2204 nscaf2204 nscaf2204 nscaf2204 nscaf2767 nscaf2767 nscaf2767 nscaf2767 nscaf2767 nscaf2767 nscaf2767 nscaf1299 nscaf1299 nscaf1299 nscaf1299 nscaf2795 nscaf2795 nscaf2795 nscaf2795 nscaf2795

2954764 283447 1786906 1934487 1947302 1955523 1966386 2193403 2203719 229368 4348713 4361725 4409834 4421504 4426870 4448851 6319391 10742813 320031 494356 1330315 294768 2412039 2416638 2543067 24541 437273 439723 1084199 1703802 2501868 3511558 68753 193256 195757 197533 549940 568650 583598 730395 3101891

2958794 290781 1792175 1942700 1949128 1960187 1976576 2196647 2207035 229922 4349923 4362048 4410312 4421851 4427472 4453538 6319705 10744010 320315 495249 1336544 295203 2415069 2423617 2543568 28208 437623 439992 1086233 1707929 2502656 3513051 72629 194224 196503 199716 561185 574653 586819 732192 3102913

+ + + + + + + + + + + + + + + + + + + + + + +

SWGROSS0415 SWGROSS0421 SWGROSS0429 SWGROSS0435 SWGROSS0435 SWGROSS0436 SWGROSS0436 SWGROSS0438 SWGROSS0438 SWGROSS0442 SWGROSS0449 SWGROSS0450 SWGROSS0452 SWGROSS0452 SWGROSS0452 SWGROSS0452 SWGROSS0457 SWGROSS0463 SWGROSS0467 SWGROSS0470 SWGROSS0475 SWGROSS0481 SWGROSS0487 SWGROSS0488 SWGROSS0491 SWGROSS0494 SWGROSS0497 SWGROSS0497 SWGROSS0500 SWGROSS0502 SWGROSS0506 SWGROSS0507 SWGROSS0509 SWGROSS0514 SWGROSS0514 SWGROSS0514 SWGROSS0522 SWGROSS0523 SWGROSS0524 SWGROSS0525 SWGROSS0527

BGIBMGA004294 BGIBMGA004332 BGIBMGA004281 BGIBMGA014154 BGIBMGA009164 BGIBMGA012390 BGIBMGA012396 BGIBMGA012441 BGIBMGA012442 BGIBMGA012443 BGIBMGA001474 BGIBMGA001424 BGIBMGA001413 BGIBMGA007183 BGIBMGA007147 BGIBMGA007214 BGIBMGA012706 BGIBMGA012707 BGIBMGA012672 BGIBMGA000455 BGIBMGA000458 BGIBMGA000459 BGIBMGA000235 BGIBMGA009976 BGIBMGA009977 BGIBMGA010860 BGIBMGA011520 BGIBMGA011522 BGIBMGA011491 BGIBMGA011575 BGIBMGA011447 BGIBMGA011302 BGIBMGA011263 BGIBMGA011150 BGIBMGA011111 BGIBMGA011108 BGIBMGA011106 BGIBMGA011105 BGIBMGA013304 BGIBMGA000074 BGIBMGA000158

nscaf2789 nscaf2789 nscaf2789 nscaf481 nscaf2937 nscaf3041 nscaf3041 nscaf3044 nscaf3044 nscaf3044 nscaf2136 nscaf2136 nscaf2136 nscaf2868 nscaf2868 nscaf2868 nscaf3055 nscaf3055 nscaf3055 nscaf1681 nscaf1681 nscaf1681 nscaf1681 nscaf2980 nscaf2980 nscaf3005 nscaf3027 nscaf3027 nscaf3027 nscaf3027 nscaf3027 nscaf3026 nscaf3026 nscaf3022 nscaf3022 nscaf3022 nscaf3022 nscaf3022 nscaf3066 nscaf1108 nscaf1108

712177 927496 942561 451441 1396803 938874 1618943 249001 272261 288683 334507 4964679 5615362 62080 1185017 1698453 1319699 1338655 1388503 4443505 4706693 4708125 4724495 108111 117140 1466030 604573 634072 639807 3628029 3629492 267721 2522455 980539 1103245 1153083 1173109 1188435 536514 2370811 2693518

714786 930196 948331 452100 1401488 939662 1626831 255814 272561 291823 335517 4965017 5615526 68901 1185952 1699621 1327487 1341022 1390148 4450565 4706896 4709333 4729130 113882 127672 1466929 605113 637945 641952 3629330 3630736 267891 2529501 983376 1106694 1158676 1182099 1190338 537681 2374956 2696851

+ + + + + + + + + + + + + + + + + + + + + +

SWGROSS0531 SWGROSS0532 SWGROSS0533 SWGROSS0535 SWGROSS0537 SWGROSS0539 SWGROSS0544 SWGROSS0546 SWGROSS0546 SWGROSS0547 SWGROSS0564 SWGROSS0572 SWGROSS0573 SWGROSS0579 SWGROSS0586 SWGROSS0588 SWGROSS0603 SWGROSS0603 SWGROSS0605 SWGROSS0614 SWGROSS0615 SWGROSS0615 SWGROSS0616 SWGROSS0618 SWGROSS0618 SWGROSS0624 SWGROSS0632 SWGROSS0634 SWGROSS0634 SWGROSS0642 SWGROSS0642 SWGROSS0652 SWGROSS0654 SWGROSS0664 SWGROSS0666 SWGROSS0668 SWGROSS0669 SWGROSS0670 SWGROSS0674 SWGROSS0675 SWGROSS0677

BGIBMGA000068 BGIBMGA009573 BGIBMGA009621 BGIBMGA003814 BGIBMGA003813 BGIBMGA012143 BGIBMGA012209 BGIBMGA000803 BGIBMGA000811 BGIBMGA000762 BGIBMGA004930 BGIBMGA005126 BGIBMGA005127 BGIBMGA005073 BGIBMGA005054 BGIBMGA005037 BGIBMGA005036 BGIBMGA005035 BGIBMGA004889 BGIBMGA014016 BGIBMGA013766 BGIBMGA013774 BGIBMGA007420 BGIBMGA007348 BGIBMGA008883 BGIBMGA009099 BGIBMGA009063 BGIBMGA009100 BGIBMGA006126 BGIBMGA006024 BGIBMGA006196 BGIBMGA005947 BGIBMGA003117 BGIBMGA003120 BGIBMGA003017 BGIBMGA003001 BGIBMGA002986 BGIBMGA003182 BGIBMGA003183 BGIBMGA002984 BGIBMGA003184

nscaf1108 nscaf2962 nscaf2962 nscaf2686 nscaf2686 nscaf3035 nscaf3035 nscaf1705 nscaf1705 nscaf1705 nscaf2822 nscaf2823 nscaf2823 nscaf2823 nscaf2823 nscaf2823 nscaf2823 nscaf2823 nscaf2819 nscaf3099 nscaf3097 nscaf3097 nscaf2883 nscaf2883 nscaf2930 nscaf2931 nscaf2931 nscaf2931 nscaf2847 nscaf2847 nscaf2847 nscaf2847 nscaf2589 nscaf2589 nscaf2589 nscaf2589 nscaf2589 nscaf2589 nscaf2589 nscaf2589 nscaf2589

2724200 998093 1000811 727389 745692 1333943 1620720 723956 929222 931265 1120808 1711000 1716842 1719576 2614741 3014996 3032818 3039980 70070 4534081 122693 143615 1643269 1650835 3679024 1109065 1181113 1207086 2235509 2249481 7317610 7324067 2788848 2982040 3970067 4430467 5011367 5020574 5035050 5038429 5045637

2728350 999370 1002755 735981 748956 1337447 1621383 726243 929401 942452 1125596 1715397 1719277 1735077 2614938 3026850 3038922 3042193 74975 4535149 125957 152483 1647812 1652450 3682236 1114933 1182838 1208209 2236204 2250042 7318551 7326058 2792405 2984233 3974199 4434674 5014764 5025999 5035904 5043860 5048623

+ + + + + + + + + + + + + + + + + +

SWGROSS0678 SWGROSS0686 SWGROSS0686 SWGROSS0688 SWGROSS0688 SWGROSS0690 SWGROSS0691 SWGROSS0692 SWGROSS0693 SWGROSS0693 SWGROSS0698 SWGROSS0708 SWGROSS0708 SWGROSS0708 SWGROSS0712 SWGROSS0713 SWGROSS0714 SWGROSS0714 SWGROSS0717 SWGROSS0731 SWGROSS0732 SWGROSS0733 SWGROSS0742 SWGROSS0742 SWGROSS0750 SWGROSS0762 SWGROSS0763 SWGROSS0763 SWGROSS0770 SWGROSS0770 SWGROSS0774 SWGROSS0774 SWGROSS0784 SWGROSS0787 SWGROSS0790 SWGROSS0792 SWGROSS0793 SWGROSS0793 SWGROSS0794 SWGROSS0794 SWGROSS0794

BGIBMGA003210 BGIBMGA003775 BGIBMGA003774 BGIBMGA003796 BGIBMGA013534 BGIBMGA013484 BGIBMGA013479 BGIBMGA002650 BGIBMGA002659 BGIBMGA002638 BGIBMGA002637 BGIBMGA002665 BGIBMGA003662 BGIBMGA003527 BGIBMGA003522 BGIBMGA003520 BGIBMGA003692 BGIBMGA003722 BGIBMGA003745 BGIBMGA003751 BGIBMGA013449 BGIBMGA013438 BGIBMGA006506 BGIBMGA006279 BGIBMGA002750 BGIBMGA010243 BGIBMGA005258 BGIBMGA005257 BGIBMGA005389 BGIBMGA005388 BGIBMGA005387 BGIBMGA005361 BGIBMGA005291 BGIBMGA005290 BGIBMGA009855 BGIBMGA008026 BGIBMGA008065 BGIBMGA008005 BGIBMGA008077 BGIBMGA008088 BGIBMGA012547

nscaf2589 nscaf2681 nscaf2681 nscaf2681 nscaf3075 nscaf3075 nscaf3075 nscaf2529 nscaf2529 nscaf2529 nscaf2529 nscaf2529 nscaf2674 nscaf2674 nscaf2674 nscaf2674 nscaf2674 nscaf2674 nscaf2674 nscaf2674 nscaf3074 nscaf3074 nscaf2853 nscaf2852 nscaf2556 nscaf2986 nscaf2827 nscaf2827 nscaf2828 nscaf2828 nscaf2828 nscaf2828 nscaf2828 nscaf2828 nscaf2970 nscaf2889 nscaf2889 nscaf2889 nscaf2889 nscaf2890 nscaf3048

6340171 23335 40830 1375483 927568 947924 1074746 20030 394849 401169 410415 1368318 3201995 3336473 3651420 3700656 4914435 6172780 7348048 8028229 177476 536518 4782060 1273614 706207 3210189 703485 743864 532676 547118 612009 2200028 5832182 5839332 937175 475035 1123114 1770015 1947243 1358624 195141

6342603 26088 43373 1380402 931683 949209 1083885 23473 397565 409347 411703 1369074 3205732 3338854 3652209 3701981 4915076 6173906 7348551 8031659 178865 540692 4790575 1282516 711285 3211716 704826 745688 541292 547979 614540 2200666 5832448 5842013 938230 479870 1125760 1770661 1948562 1359112 202355

+ + + + + + + + + + + + + + + + -

SWGROSS0803 SWGROSS0804 SWGROSS0804 SWGROSS0808 SWGROSS0812 SWGROSS0813 SWGROSS0815 SWGROSS0817 SWGROSS0818 SWGROSS0818 SWGROSS0818 SWGROSS0820 SWGROSS0838 SWGROSS0839 SWGROSS0841 SWGROSS0843 SWGROSS0845 SWGROSS0850 SWGROSS0855 SWGROSS0856 SWGROSS0859 SWGROSS0863 SWGROSS0880 SWGROSS0887 SWGROSS0895 SWGROSS0901 SWGROSS0916 SWGROSS0917 SWGROSS0920 SWGROSS0920 SWGROSS0923 SWGROSS0926 SWGROSS0934 SWGROSS0935 SWGROSS0937 SWGROSS0948 SWGROSS0956 SWGROSS0958 SWGROSS0960 SWGROSS0961 SWGROSS0962

BGIBMGA012546 BGIBMGA012536 BGIBMGA012568 BGIBMGA002437 BGIBMGA002436 BGIBMGA002473 BGIBMGA002503 BGIBMGA007227 BGIBMGA007316 BGIBMGA009198 BGIBMGA009199 BGIBMGA009200 BGIBMGA009195 BGIBMGA012222 BGIBMGA012642 BGIBMGA014460 BGIBMGA014530 BK006600 BMOBMSQD2 DQ443151 NM_001043430 NM_001043469 NM_001043506 NM_001043536 NM_001043640 NM_001043670 NM_001043790 NM_001043818 NM_001043858 NM_001043864 NM_001043925 NM_001043949 NM_001044079 NM_001044193 NM_001046707 NM_001046773 NM_001046846 NM_001046888 NM_001046906 NM_001046908 NM_001046914

nscaf3048 nscaf3048 nscaf3048 nscaf2511 nscaf2511 nscaf2511 nscaf2511 nscaf2874 nscaf2879 nscaf2940 nscaf2940 nscaf2940 nscaf2940 nscaf3038 nscaf3053 scaffold697 scaffold773 nscaf2983 nscaf3074 nscaf2855 nscaf2819 nscaf3031 nscaf2993 nscaf1681 nscaf2986 nscaf2795 nscaf3026 nscaf2998 nscaf3074 nscaf2823 nscaf3058 nscaf3027 nscaf2589 nscaf3058 nscaf2674 nscaf3055 nscaf3055 nscaf2855 nscaf2888 nscaf2398 nscaf3055

207623 934812 942176 1155942 1160826 1419729 2703098 277365 27970 21340 35696 59303 144091 207968 88478 46872 15075 2855561 225809 4124966 77516 4519785 7930909 710443 3400971 3112010 6536157 524831 225809 112743 3535759 614772 4436428 3544760 3651631 1358891 1379740 4124416 4533439 18613 1334191

210390 939078 945749 1158533 1165937 1425032 2705442 279865 29131 21832 38513 62354 146890 208852 90119 47867 20974 2858301 230471 4125468 87377 4521943 7934254 712032 3408606 3114954 6541507 525679 228153 113792 3539189 616427 4440264 3551068 3652224 1361101 1382298 4125468 4534773 21555 1337290

+ + + + + + + + + + + + + + + + + + -

SWGROSS0962 SWGROSS0967 SWGROSS0968 SWGROSS0977 SWGROSS0978 SWGROSS0981 SWGROSS0986 SWGROSS0997 SWGROSS1003 SWGROSS1005 SWGROSS1006 SWGROSS1007 SWGROSS1009 SWGROSS1016 SWGROSS1018 SWGROSS1036 SWGROSS1039 SWGROSS0898 SWGROSS0861 SWGROSS0135 SWGROSS0718 SWGROSS0193 SWGROSS0236 SWGROSS0608 SWGROSS0902 SWGROSS0527 SWGROSS0661 SWGROSS0249 SWGROSS0861 SWGROSS0702 SWGROSS0386 SWGROSS0633 SWGROSS0792 SWGROSS0386 SWGROSS0841 SWGROSS0604 SWGROSS0605 SWGROSS0135 SWGROSS0352 SWGROSS0554 SWGROSS0603

NM_001046956 NM_001047050 NM_001047081 NM_001048240 NM_001098281 NM_001098283 NM_001098292 NM_001098355 NM_001098362 NM_001099614 NM_001099617 NM_001099621 NM_001099812 NM_001099812 NM_001102461 NM_001105232 NM_001109933 NM_001110008 NM_001113276 NM_001114935 NM_001123339 NM_001130876 NM_001130897 NM_001130902 NM_001134916 NM_001142487 S74376

nscaf3074 nscaf2589 nscaf2556 nscaf3048 nscaf1898 nscaf2589 nscaf1898 nscaf2970 nscaf3055 nscaf2795 nscaf2828 nscaf2993 nscaf3097 nscaf2983 nscaf2674 nscaf2828 nscaf2993 nscaf2681 nscaf3026 nscaf2912 nscaf2993 nscaf2902 nscaf2962 nscaf1690 nscaf2829 nscaf2888 nscaf2852

181646 5028991 707639 204154 8495777 5090413 15756226 937175 1328064 737598 5846111 7962631 865247 1786366 3705843 595874 7891332 27230 2396448 405528 8045520 4364773 336752 386396 796907 4426682 1255833

188437 5030021 711751 205695 8497304 5090727 15759415 938024 1331664 739792 5846618 7964876 866299 1787409 3715524 600459 7894362 31947 2401216 407195 8054363 4366233 339454 400958 801316 4428780 1256608

+ + + + + + + + + + + -

SWGROSS0859 SWGROSS0793 SWGROSS0895 SWGROSS0962 SWGROSS0294 SWGROSS0795 SWGROSS0308 SWGROSS0937 SWGROSS0603 SWGROSS0525 SWGROSS0935 SWGROSS0238 SWGROSS0735 SWGROSS0896 SWGROSS0843 SWGROSS0922 SWGROSS0234 SWGROSS0804 SWGROSS0653 SWGROSS0908 SWGROSS0244 SWGROSS0450 SWGROSS0684 SWGROSS0020 SWGROSS0402 SWGROSS0350 SWGROSS0886

Table S10. Number of genes per GROSS.


Gene # per GROSS | GROSS #
1 | 212
2 | 42
3 | 9
4 | 4
5 | 3
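The distribution in Table S10 can be cross-checked against the gene count given in Table S8 by weighting each class by its number of genes. The following Python sketch does that arithmetic; the variable names are illustrative.

```python
# Table S10: number of genes per GROSS -> number of GROSS with that many genes.
genes_per_gross = {1: 212, 2: 42, 3: 9, 4: 4, 5: 3}

# Number of GROSS that contain at least one gene.
total_gross_with_genes = sum(genes_per_gross.values())

# Total genes implied by the distribution (class size times gene count).
total_genes = sum(n_genes * n_gross for n_genes, n_gross in genes_per_gross.items())

if __name__ == "__main__":
    print("GROSS containing genes:", total_gross_with_genes)
    print("Genes in GROSS:", total_genes)
```

The weighted sum gives 354 genes across 270 gene-containing GROSS, matching the 354 genes listed as "# in GROSS" in Table S8.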

Supporting References and Notes
S1. http://www.illumina.com/.
S2. J. Wang et al., Nature 456, 60 (2008).
S3. J. Wang et al., Nucleic Acids Res. 33, D399 (2005).
S4. Silkworm Genome Database (http://silkworm.swu.edu.cn/silkdb/ or http://silkworm.genomics.org.cn/).
S5. The International Silkworm Genome Consortium, Insect Biochem. Mol. Biol. 38, 1036 (2008).
S6. BmMDB (http://silkworm.swu.edu.cn/microarray/).
S7. R. Li, Y. Li, K. Kristiansen, J. Wang, Bioinformatics 24, 713 (2008).
S8. R. Li et al., Genome Res. 19, 1124 (2009).
S9. http://www.sequenom.com/.
S10. J. C. Barrett, B. Fry, J. Maller, M. J. Daly, Bioinformatics 21, 263 (2005).
S11. J. Felsenstein, (2005).
S12. N. Patterson, A. L. Price, D. Reich, PLoS Genet. 2, e190 (2006).
S13. M. Kendall, Biometrika 30, 81-89 (1938).
S14. S. Purcell et al., Am. J. Hum. Genet. 81, 559 (2007).
S15. http://pngu.mgh.harvard.edu/~purcell/plink/.
S16. H. Tang, J. Peng, P. Wang, N. J. Risch, Genet. Epidemiol. 28, 289 (2005).
S17. http://med.stanford.edu/tanglab/software/frappe.html.
S18. R. R. Hudson, Bioinformatics 18, 337 (2002).
S19. F. Tajima, Genetics 105, 437 (1983).
S20. F. Tajima, Genetics 123, 585 (1989).
S21. M. Nei, Molecular evolutionary genetics. (Columbia University Press, New York, 1987).
S22. http://rana.stanford.edu/software/.
S23. G. A. Watterson, Theor. Popul. Biol. 7, 256 (1975).
S24. G. K. Wong et al., Nature 432, 717 (2004).
S25. D. J. Begun et al., PLoS Biol. 5, e310 (2007).
S26. W. G. Hill, A. Robertson, Theor. Appl. Genet. 31, 881 (1968).
S27. S. J. Macdonald, T. Pastinen, A. D. Long, Genetics 171, 1741 (2005).
S28. K. Yamamoto et al., Genome Biol. 9, R21 (2008).
S29. M. Beye et al., Genome Res. 16, 1339 (2006).
S30. P. Gepts, R. Papa, Evolution during Domestication. In: Encyclopedia of Life Sciences. John Wiley & Sons Ltd, Chichester (2002).
S31. Q. Zhu, Mol. Biol. Evol. 24, 875 (2007).
S32. A. Haudry, Mol. Biol. Evol. 24, 1506 (2007).
S33. M. I. McCarthy et al., Nat. Rev. Genet. 9, 356 (2008).
S34. S. K. Wright, R. E. Viola, J. Biol. Chem. 276, 31151 (2001).
S35. G. K. Balendiran et al., J. Biol. Chem. 275, 27045 (2000).
S36. T. Ito, Nutrition and artificial diets of the silkworm, Bombyx mori. (Nihon-Sanshi-Shinbun Press, Tokyo, 1983).
S37. G. Liti et al., Nature 458, 337 (2009).
S38. K. E. Holt et al., Nat. Genet. 40, 987 (2008).
S39. J. Mu et al., Nat. Genet. 39, 126 (2007).
S40. A. C. Palmenberg et al., Science 324, 55 (2009).
S41. K. A. Frazer et al., Nature 449, 851 (2007).
S42. K. A. Frazer et al., Nature 448, 1050 (2007).
S43. K. Lindblad-Toh et al., Nature 438, 803 (2005).
S44. J. W. Kijas et al., PLoS ONE 4, e4668 (2009).
S45. R. A. Gibbs et al., Science 324, 528 (2009).
S46. M. R. Goldsmith, T. Shimada, H. Abe, Annu. Rev. Entomol. 50, 71 (2005).
S47. K. P. Arunkumar, M. Metta, J. Nagaraju, Mol. Phylogenet. Evol. 40, 419 (2006).
S48. C. A. Driscoll, D. W. Macdonald, S. J. O'Brien, Proc. Natl. Acad. Sci. U.S.A. 106, 9971 (2009).
