
news and views

Genome sequencing on nanoballs


Gregory J Porreca
Advances in technology deliver cheaper human genome sequencing.
The rapid pace of innovation in the field of genome sequencing continues with a recent publication in Science by Drmanac et al.¹. The authors resequenced three full human genomes using a next-generation technology that combines highly efficient imaging on ordered arrays with an inexpensive ligation-based chemistry. These technological improvements further reduce the cost of human genome sequencing.

Next-generation sequencing technologies generate up to billions of short reads in a run. All of these approaches use either polymerase or ligase to identify each base with a fluorescent signal that is read by a microscope and a digital camera. Development of these systems has focused on manipulating and arraying DNA such that it can be seen by the camera and sequenced, and on devising a sequencing chemistry with sufficient accuracy and read-length. For effective human genome sequencing, the individual DNA spots should be small (1 µm) and present at high density (approaching 1 million spots per mm²). Furthermore, the reads must be long enough (>30 bp) to allow unambiguous alignment to the reference sequence, which is the first step in identifying variants.

Drmanac et al.¹ have met these goals with a platform that integrates several technologies: (i) a library-generation protocol that transforms fragments of genomic DNA into highly engineered molecules; (ii) a method for generating spots, called DNA nanoballs, and arraying them in highly dense grids for efficient imaging; and (iii) a nonprogressive chemistry (that is, errors do not accumulate because each base is read from a fresh sequencing primer) that uses ligation with partially degenerate sequencing primers¹⁻³ to yield accurate ~70-bp reads split across eight priming sites (Fig. 1). What is most intriguing is how the platform approaches several technical optima that in concert drive down cost.

Gregory J. Porreca is at Good Start Genetics, Boston, MA. e-mail: gporreca@gsgenetics.com

First, the amount of reagent used is dictated by the area and height of the instrument's flow-cell chamber. Drmanac et al.¹ recognized that the chamber's height does not affect sequencing performance because the submicron-sized DNA features are attached to its surface. So they devised a process to manufacture thin chambers and perfected a way to flow liquid through them, potentially enabling significant cost savings over current systems with thicker flow-cells.

Second, throughput is driven both by the speed of the camera and by how many spots can be packed into a single image. The authors used the electron-multiplied charge-coupled device (CCD) present in several other sequencing systems² (http://www.polonator.org/), which is faster and more sensitive than what is found in the most popular next-generation platforms on the market. They combined this camera with one of their key innovations, a patterned array of DNA nanoballs. These compact chains of amplified DNA assemble into a densely packed grid of spots on the flowcell surface, maximizing the yield of useful sequenced bases from camera pixels. Nanoballs offer a higher array packing density than bridge amplification⁴, because they physically exclude other DNA molecules from their spot on the grid, and a much easier and cheaper workflow than emulsion PCR⁵, because they are prepared in a simple reaction that does not waste most of the amplification reagents on empty emulsion bubbles. The result is an instrument capable of maximal throughput, given today's camera technology, and therefore minimal capital cost per base pair.

It is difficult to directly compare sequencing cost between different platforms. This is because, for genomic resequencing, cost is driven by the coverage required to achieve the desired accuracy. Different platforms may require different levels of coverage to achieve the same accuracy, so comparisons have to be made by fixing either coverage, to measure differences in cost and accuracy, or accuracy, to measure differences in coverage and cost⁶. There are other considerations as well.
Different mutations (e.g., homozygous versus heterozygous substitutions, insertions or deletions) are generally sequenced with different accuracy. Moreover, the reference standard used to verify mutations must be more accurate than the sequence in question, and this is difficult to achieve with the genome-wide single-nucleotide polymorphism chips that are often used. All of these factors combine to confound a simple bases-per-dollar comparison.

So what can be said about the relative cost of this approach? Drmanac et al.¹ sequenced three genomes at a coverage of 45- to 87-fold and at an average reagent cost of $4,400 per genome. By their estimates, one false-positive sequencing error occurred every 100,000 bases, an accuracy on par with, or better than, that of other popular sequencing platforms. Complete Genomics, the company associated with the study by Drmanac et al.¹, has positioned itself as a service provider of full human genome sequences rather than as a vendor of sequencing instruments and reagents. Of course, the cost of reagents does not include equipment, labor and data-handling, and is not the same as the price charged to customers for a genome sequence. One appropriate comparison is therefore with Illumina's Personal Genome Sequencing Service, which delivers a full human genome sequence (on an iMac computer) for $48,000. Complete Genomics currently charges $20,000 for a sequence of similar accuracy¹,⁴,⁷.

With time, it is certain that prices across all vendors will fall. Complete Genomics has a target price of $5,000 per genome for bulk orders⁷, a substantial drop that is certainly possible in the near term. Reducing prices significantly beyond that will likely require further innovation. For instance, substantial increases in instrument throughput could be achieved by switching from CCD cameras to the much faster and cheaper complementary metal oxide semiconductor (CMOS) technology. But CMOS is less sensitive, so the DNA nanoballs would have to be made brighter, which may require considerable research and development.

As Complete Genomics makes progress in process automation and robustness, they may be able to address applications beyond human genome sequencing, including gene expression analysis, chromatin immunoprecipitation and metagenomics.
For these, part of the difficulty will be in the process scaling and multiplexing required to accommodate the ultra-high throughput of their machines. In addition, for quantitative applications, it will be important to ensure they can calibrate for biases introduced by the library- and nanoball-generation protocols.
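
The reagent-cost figures quoted above can be turned into a rough cost per sequenced base. The sketch below is only a back-of-the-envelope illustration: the $4,400 reagent cost and the 45- to 87-fold coverage range come from the text, whereas the round genome size of 3 billion bases and the function name are assumptions made here. Because $4,400 is an average over three genomes, applying it to each end of the coverage range gives bounds, not per-genome values.

```python
# Back-of-the-envelope reagent cost per gigabase of raw sequence.
# From the text: $4,400 average reagent cost per genome, 45- to 87-fold coverage.
# ASSUMPTION: a round haploid genome size of 3 billion bases (not stated in the article).

GENOME_SIZE_BP = 3_000_000_000
REAGENT_COST_USD = 4_400.0


def reagent_cost_per_gigabase(coverage: float,
                              cost_usd: float = REAGENT_COST_USD,
                              genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Return reagent dollars per billion raw bases at a given fold coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    raw_bases = genome_size_bp * coverage      # total bases read, before alignment
    return cost_usd / (raw_bases / 1e9)        # dollars per gigabase


if __name__ == "__main__":
    for coverage in (45, 87):
        usd = reagent_cost_per_gigabase(coverage)
        print(f"{coverage}-fold coverage: ${usd:.2f} per gigabase of raw sequence")
```

As the text cautions, a number like this covers reagents only and says nothing about equipment, labor, data handling or the accuracy obtained at that coverage.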

© 2010 Nature America, Inc. All rights reserved.

nature biotechnology volume 28 number 1 january 2010

Figure 1 Comparison of the sequencing process in Drmanac et al.¹ (left) and in a previous sequencing-by-ligation method² (right). (a) Genomic DNA is converted into library molecules. Each molecule contains four segments of genomic DNA (i–iv), flanked by priming sites (shown in purple). Each library molecule is converted into a linear concatemer of itself to become a DNA nanoball. (b) Billions of DNA nanoballs are added to a silicon slide that contains a grid-like pattern of binding sites, which causes the nanoballs to self-assemble into a dense grid of spots for sequencing, maximizing the number of useful sequenced bases in each image (see d). (c) Ligation-based sequencing chemistry is used to interrogate bases of genomic DNA in the library molecules. Each cycle of sequencing tags the DNA nanoballs with a fluorophore whose color identifies the base (A, C, T or G) present at a specific position. The chemistry allows 5–10 contiguous bases to be read from each of the eight priming sites in the library molecule. (d) Digital images of the patterned arrays are taken after each sequencing reaction. The images are computationally analyzed to generate billions of raw sequence reads. These reads are then processed with assembly and analysis software to accurately identify mutations.

Figure 1 panel labels (Drmanac et al.¹ versus previous approach²). Library generation: library molecule with genomic segments i–iv. Amplification: rolling circle amplification into DNA nanoballs versus emulsion PCR with DNA captured on beads. Arraying: patterned array in low-volume flowcell versus random array in larger flow cell. Ligation-based sequencing (probes, ligase, anchor, genomic DNA): degenerate anchors allow 62–70 bases per spot versus standard anchors allowing 26 bases per spot. Imaging.
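
The read structure and array density described in the text and in Figure 1 lend themselves to a quick consistency check. The sketch below is illustrative only: the eight priming sites, 5–10 bases per site, ~70-bp reads and a density approaching 1 million spots per mm² are taken from the article, while the 3-billion-base genome size, the choice of 45-fold coverage, the function names and the simplification that every spot yields one usable read are assumptions made here.

```python
import math

# From the article: eight priming sites, 5-10 contiguous bases read from each,
# and array density approaching 1 million spots per mm^2.
PRIMING_SITES = 8
BASES_PER_SITE = (5, 10)
SPOTS_PER_MM2 = 1_000_000

# ASSUMPTION: round genome size, not a figure from the article.
GENOME_SIZE_BP = 3_000_000_000


def read_length_bounds(sites: int = PRIMING_SITES,
                       per_site: tuple = BASES_PER_SITE) -> tuple:
    """Lower and upper bound on bases read per spot across all priming sites."""
    low, high = per_site
    return sites * low, sites * high


def spots_needed(coverage: float, read_length_bp: int,
                 genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Spots required if every spot yields one usable read (a simplification)."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def array_area_mm2(n_spots: int, density: int = SPOTS_PER_MM2) -> float:
    """Array area in mm^2 occupied by n_spots at the given spot density."""
    return n_spots / density


if __name__ == "__main__":
    low, high = read_length_bounds()
    print(f"bases per spot: between {low} and {high}")
    n = spots_needed(coverage=45, read_length_bp=70)
    print(f"spots for 45-fold coverage with 70-bp reads: {n:,}")
    print(f"array area at 1 million spots per mm^2: {array_area_mm2(n):,.0f} mm^2")
```

The 62–70 bases per spot shown in the figure fall inside the 40–80 base bounds implied by the caption, and the spot count shows why packing density translates so directly into instrument throughput.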

In the span of a few short years, the mature technology of capillary sequencing has been supplanted by new sequencing approaches that offer tremendous increases in how much we can afford to sequence and how quickly we can do it. As the technology advances, focus will shift from the initial feats of sequencing single genomes to the ongoing challenge of producing lots of sequence accurately and efficiently. In this endeavour, the platform of Drmanac et al.¹ is sure to remain in the mix.
COMPETING INTERESTS STATEMENT
The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
1. Drmanac, R. et al. Science, published online, doi:10.1126/science.1181498 (5 November 2009).
2. Shendure, J.A. et al. Science 309, 1728–1732 (2005).
3. Church, G.M. et al. US appl. no. 2007/0207482 (2007).
4. Bentley, D.R. et al. Nature 456, 53–59 (2008).
5. McKernan, K.J. et al. Genome Res. 19, 1527–1541 (2009).
6. Fuller, C.W. et al. Nat. Biotechnol. 27, 1013–1023 (2009).
7. Karow, J. Complete genomics details low-cost sequencing tech in paper; collaborators encouraged by results. InSequence <http://www.genomeweb.com/sequencing/complete-genomics-details-low-cost-sequencingtech-paper-collaborators-encourage> (10 November 2009).

Figure 1, imaging panel labels: more spots per image (patterned array); fewer spots per image (random array).

As costs continue to drop, the use of sequencing in diagnostics is expected to increase dramatically. This large market imposes significant requirements on any technology it adopts. Costs must be very low to displace existing technologies, and accuracy must be extremely high. False-positive mutation calls drive up assay cost by requiring expensive and time-intensive verification, and false-negative calls (which generally cannot be verified) are a source of diagnostic error. Thus, accuracy must be high, quantifiable and thoroughly measured in advance of releasing an assay into the clinic. What's more, continual monitoring of assay performance and compliance with clinical laboratory best practices are imperative if a sequence is to be considered actionable medical advice.
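
To see why false positives matter at clinical scale, the error rate quoted earlier in the article (one false-positive error every 100,000 bases) can be scaled to a whole genome. This is a rough sketch, not a model of any real assay: the 3-billion-base genome size is an assumption, and the per-call verification cost is a hypothetical parameter that the caller must supply, since the article gives no such figure.

```python
# From the article: roughly one false-positive sequencing error per 100,000 bases.
BASES_PER_FALSE_POSITIVE = 100_000

# ASSUMPTION: round genome size, not a figure from the article.
GENOME_SIZE_BP = 3_000_000_000


def expected_false_positives(bases_called: int,
                             bases_per_error: int = BASES_PER_FALSE_POSITIVE) -> float:
    """Expected number of false-positive calls among bases_called bases."""
    return bases_called / bases_per_error


def verification_cost(n_calls: float, cost_per_call_usd: float) -> float:
    """Total cost of verifying n_calls; cost_per_call_usd is hypothetical input."""
    return n_calls * cost_per_call_usd


if __name__ == "__main__":
    n_fp = expected_false_positives(GENOME_SIZE_BP)
    print(f"expected false positives per genome: {n_fp:,.0f}")
```

Even at an accuracy on par with other platforms, the expected count per genome is in the tens of thousands, which is why verification effort scales so quickly with the number of bases interrogated.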


Kim Caesar
