Prolegomena
Department Editors: Kevin Rudd and Kevin Skadron

(Re)Designing Data-Centric Data Centers

PARTHASARATHY RANGANATHAN
JICHUAN CHANG
HP Labs

IEEE Micro, January/February 2012, pp. 66-70. Published by the IEEE Computer Society. 0272-1732/12/$31.00 © 2012 IEEE

We are entering an exciting era for systems design, one driven by data-centric computing. A recent report from the University of California, San Diego estimated that, conservatively, enterprise server systems processed and delivered more than 9 zettabytes of information in 2008 (where 1 zettabyte = 10^21 bytes) [1]; this number is projected to double every two years. Walmart servers, for example, handle more than 1 million customer transactions every hour, feeding databases estimated at several petabytes. High-performance computing systems working with the Large Hadron Collider filter through roughly one petabyte of data per second and still produce 15 petabytes a year after multiple levels of data selection. Each day, Facebook operates on nearly 100 terabytes of user log data and several hundred terabytes of user pictures; similarly, 48 hours of video content is uploaded every minute on YouTube (a sixfold increase from four years ago) [2].

This vast and growing amount of information represents both an opportunity and a challenge. On one hand, the ability to collect and process large volumes of new data can drive scientific breakthroughs, new business process optimizations, and day-to-day improvements in our personal lives. Recent data-centric applications for personalized genome sequencing, real-time trends from business analytics, social-network-based recommendations, and so on illustrate this potential. On the other hand, this data is also creating a host of new problems. In particular, the growth in data produced is outpacing the improvements in the cost and density of storage technologies. Also, perhaps more importantly, our ability to process the data to extract meaningful, actionable insights is significantly lagging our ability to collect and store data.

Given these challenges and opportunities, it is important to rethink how we design future data-centric systems. At the same time, technology inflections such as the increased adoption of nonvolatile memories, optical communications, multicores, and heterogeneous computing all provide a unique opportunity for an end-to-end redesign of data-centric solutions across both hardware and software. Here, we discuss recent computer architecture and systems research matched with such redesigns, drawing out cross-cutting directions across these projects that suggest research opportunities for the broader community.

Rethinking data-centric system architectures

Historically, system architecture designs have been driven by advances in processor designs, with the performance of the I/O subsystem usually only a secondary design consideration. Indeed, most popular analyses of improvements in system architecture typically track only base computing performance (for example, historical flops at http://top500.org). However, as future systems are increasingly used to capture, classify, analyze, manage, and archive large volumes of data, we will need a corresponding rethinking of system architecture focused on data storage and management.

Figure 1a presents an overview of the continuum of different architectural organizations to address data management. At the left is a traditional system design using mechanical disks as the persistent data store and DRAM as a caching layer. Further in the continuum, several products (such as those from EMC, Fusion-IO, HP, Oracle, Seagate, and Texas Memory Systems) expose Flash-based nonvolatile memories as block devices through either Serial Attached SCSI (SAS) and Serial Advanced Technology Attachment (SATA) or PCI Express interfaces; the flash memories are used as disk replacements or disk caches, with appropriate software support (such as Fusion-IO drivers, Oracle ASM, and Facebook Flashcache). Flash can also be combined with disaggregated memory to provide large memory space at low cost [3]. Further out, several research studies have also discussed using nonvolatile memory such as phase-change memory (PCRAM) or memristors as byte-addressable memory devices off the memory bus or 3D-stacked on the chip [4-6]. Some proposals have used the nonvolatile memory as additional levels of memory caches [7, 8], and others have used the nonvolatile memory as a replacement for the persistent data store [9], including with collapsed hierarchies in distributed systems [6].

Figure 1b presents an alternate view of these designs, classified along two dimensions: the type of interface to the nonvolatile memory that is exposed (y-axis), and the proximity of the computing to the persistent data store (x-axis); the arrow highlights the direction of recent shifts. Classifying current approaches in such a view illustrates several interesting trends.

[Figure 1, panel (a): a continuum of system organizations, from disks; NVM block storage (direct-attached, storage arrays); NVM caches (Flashcache, hybrid SSDs); NVM PCI Express storage and active storage; NVM behind a PCI Express memory interface (disaggregated NVM); Flash with wimpy cores (Gordon, FAWN, microblades); to storage-class memories (DRAM/NVM hybrid memory, NVM for checkpointing, byte-addressable persistent NVM, nanostores). Panel (b): interface type (SAS/SATA; PCI Express, block; PCI Express, memory; Flash controller; memory bus/on-chip integrated) on the y-axis versus proximity of compute to the persistent data store (far to near) on the x-axis.]

Figure 1. A visual taxonomy of recent system designs for data-centric applications. Overview of system architectures of different designs for data management (a). Classification of different system designs to illustrate trends (b). (PCIe: PCI Express.)

Rethinking memory and storage hierarchy in future system architectures

A key trend, evident from Figure 1, is that persistent data storage is steadily migrating from slow (disk-like) interfaces to faster (memory-like) interfaces with increasing flexibility and performance. Extrapolating these trends, future designs will have computing and the persistent data store in a single die. However, there are several open research questions. What are the implications of different nonvolatile memory technologies on traditional memory and storage hierarchies? What are the tradeoffs between the different memory organizations in Figure 1? Are organizations possible that collapse the memory hierarchy and reduce energy overheads of data movement? Can we design heterogeneous or morphable organizations that match specific application characteristics to specific memory technology features (both strengths and weaknesses)? How do we address new resiliency challenges introduced by endurance limits in nonvolatile memory?

Rethinking the balance between the data store, compute, and communication

Several recent proposals (such as microblades [10], FAWN [11], and Gordon [12]) combine Flash-based storage with wimpy lower-power processors for better system balance to improve energy efficiency for cloud workloads. More generally, rethinking the storage and memory hierarchy will create new system bottlenecks, altering the traditional balance between the storage, compute, and communication performance. For example, how do we design new computing structures to better exploit the huge bandwidth enabled by through-silicon vias in 3D-stacked persistent data stores? What are the appropriate applications for different kinds of compute cores, wimpy and brawny? Do we need to redesign communication provisioning, particularly for large-scale distributed data centers? How should we use energy-efficient high-radix optical communication in future designs?

Moving compute to the action

Another important trend is moving the compute closer to the data. Recent distributed data management frameworks such as MapReduce/Hadoop already operate at large scale by partitioning the data set across individual nodes and scheduling tasks matched to the data they operate on. With increasing energy costs from excessive and inefficient data movement, colocating compute closer to data within the memory hierarchy might have significant benefits. For example, Micron recently announced its hybrid memory cube technology, which couples a logic layer with 3D-stacked DRAM on the same chip [13], and the nanostore proposal [6] seeks to colocate compute with the persistent data stores. Such ideas are thematically similar to previous ideas such as Active Storage (more capable disk controllers for offloading and streaming) [14], Intelligent RAM (colocated vector processors with DRAM) [15], or Processor-in-Memory [16], but with different instantiations in the context of emerging technologies and future distributed architectures. However, several open questions remain. What is the appropriate system organization? Should we be considering a hierarchy of computing elements surrounding the data store, inverting the traditional model of data hierarchies surrounding computation? How do we develop appropriate software models to offload and coordinate computation across the various computational units?

Matching compute to the action

Recent studies have argued that limited power budgets in future processors could lead to dark silicon [17, 18] (designs where only some parts of a chip are used at any given point in time), potentially leading to more specialization in future processors (for example, 10×10 [19-21]). Such specialization can provide significant energy efficiency advantages. Prior work has examined special-purpose architectures optimized for specific workloads, including use of GPUs, field-programmable gate arrays (FPGAs), and even application-specific integrated circuits (ASICs). More work is needed, however, to understand how these designs apply to broader data-centric workloads. The appropriate system architecture and software model for such heterogeneous architectures is also an open question.

Rethinking software interfaces and algorithms

Rethinking system architecture will require rethinking systems software as well. Specifically, with improvements in hardware performance and balance, software efficiency will become the next bottleneck. For example, researchers have already identified the software overheads of traditional communication stacks as key bottlenecks in optimized distributed systems, such as in the Stanford RAMCloud project [22] or in Google distributed clusters [23]. The byte-addressability of emerging nonvolatile memories also opens up possibilities for new persistent data stores with random access semantics, massive aggregated throughput, and energy-efficient access. New optimizations are possible at various levels of the software stack: interfaces, low-level device drivers, data storage systems, and higher-level algorithms.

New interfaces

With nonvolatile memories, system architects can design systems where memory writes are instantly durable, but at the same time, this removes a degree of isolation and security provided by indirection. Two recent proposals, Mnemosyne [24] and NV-heaps [25], have examined user-level interfaces for safely and efficiently using nonvolatile memory, via durable memory transactions. Additional new interfaces could be beneficial, for example, to explicitly reason about volatility of data, for abstractions to distinguish persistent data such as files from volatile data such as virtual memory, or for user-selectable consistency and resilience semantics. Similarly, new software-hardware interfaces can better support other architectural trends such as multicores [26] or GPUs [27].

New data stores and systems software

Several research studies have examined the redesign of data stores and data structures, such as B-trees [28], file systems [9, 29, 30], key-value stores [31], and databases [32]. These examples have demonstrated that with careful consideration of the tradeoffs, nonvolatile memory can provide significant performance advantages without compromising persistence guarantees. But more research opportunities remain. For example, can we design new database-join algorithms to better leverage high-radix optical connections? How can we codesign across hardware and software for large in-memory data stores? How can we design future file systems to avoid copying and to leverage persistent data stores? How can such software approaches further take advantage of optimizations such as compute hierarchies or accelerators? Similarly, traditional operating systems architecture and abstractions were developed in the era of slow disks and limited memories. Improvements to the data path such as with nonvolatile memory could require corresponding systems software redesign, including possibly greater embedded management in the hardware to avoid kernel overheads.

Information will be the most valuable resource in the 21st century. Operating on large volumes of diverse data sources to get the right actionable insights at the right time presents new challenges and opportunities for system design. Addressing these opportunities requires a rethinking of future server and data center design, with a data-centric focus across both hardware and software. Here, we've presented a brief introduction to some recent research activities in this exciting emerging area, with a specific focus on system architecture and systems software.

There are also other important research challenges that we didn't discuss. Notably, more work is needed in new benchmarks and modeling methodologies for future data-centric data centers. Similarly, significant opportunities exist for applications enabled by new data-centric data center designs: for example, sophisticated, yet cost-effective, insight generation from huge existing volumes
of archival data (data-at-rest), or nontraditional brain-inspired systems that mimic neural algorithms for efficient information processing [33].

While this area is relatively nascent, these opportunities herald a future data-centric data center that will differ significantly from current designs. In particular, we believe that the distinction between traditional memory and storage hierarchies will be blurred and traditional wisdom on the size and depth of data hierarchies will be revisited. We also believe that computing will be pervasively embedded within the system design, colocated with data storage and data communication, and traditional general-purpose server-class processing will be supplemented with additional, more specialized forms of computation. We also anticipate a software stack significantly redesigned to eliminate the inefficiencies in current solutions, with new byte-addressable persistent stores, and new algorithms matched with the advances in the hardware and software architecture. In combination, the advances in technology, hardware architecture, software systems, and higher-level algorithms will enable better, faster, cheaper data-centric computing, which in turn can enable new applications to operate on larger volumes of data to extract better insights and enable greater automation.

This future looks exciting, but our discussion only scratches the surface of what is possible. Overall, we believe that the broad area of data-centric data centers offers a rich opportunity for more innovation from the broader community, and we hope that this column helps fuel additional thinking in this important area.

References

1. J.E. Short, R.E. Bohn, and C. Baru, How Much Information 2010: Report on Enterprise Server Information, 2011; http://hmi.ucsd.edu/pdf/HMI_2010_EnterpriseReport_Jan_2011.pdf.
2. "Data, Data Everywhere," The Economist, 25 Feb. 2010.
3. K.T. Lim et al., "Disaggregated Memory for Expansion and Sharing in Blade Servers," Proc. 36th Ann. Int'l Symp. Computer Architecture, ACM Press, 2009, pp. 267-278.
4. M.K. Qureshi, V. Srinivasan, and J.A. Rivers, "Scalable High Performance Main Memory System using Phase-Change Memory Technology," Proc. 36th Ann. Int'l Symp. Computer Architecture, ACM Press, 2009, pp. 24-33.
5. B.C. Lee et al., "Phase Change Technology and the Future of Main Memory," IEEE Micro, vol. 30, no. 1, 2010, pp. 131-141.
6. P. Ranganathan, "From Microprocessors to Nanostores: Rethinking Data-Centric Systems," Computer, vol. 44, no. 1, 2011, pp. 39-48.
7. X. Wu et al., "Hybrid Cache Architecture with Disparate Memory Technologies," Proc. 36th Ann. Int'l Symp. Computer Architecture, ACM Press, 2009, pp. 34-45.
8. C.W. Smullen et al., "Relaxing Non-Volatility for Fast and Energy-Efficient STT-RAM Caches," Proc. 2011 IEEE 17th Int'l Symp. High Performance Computer Architecture, IEEE Press, 2011, pp. 50-61.
9. J. Condit et al., "Better I/O Through Byte-Addressable, Persistent Memory," Proc. ACM SIGOPS 22nd Symp. Operating Systems Principles, ACM Press, 2009, pp. 133-146.
10. K.T. Lim et al., "Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments," Proc. 35th Ann. Int'l Symp. Computer Architecture, IEEE CS Press, 2008, pp. 315-326.
11. D.G. Andersen et al., "FAWN: A Fast Array of Wimpy Nodes," Proc. ACM SIGOPS 22nd Symp. Operating Systems Principles, ACM Press, 2009, pp. 1-14.
12. A.M. Caulfield, L.M. Grupp, and S. Swanson, "Gordon: Using Flash Memory to Build Fast, Power-Efficient Clusters for Data-Intensive Applications," Proc. 14th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM Press, 2009, pp. 217-228.
13. J.T. Pawlowski, "Micron Hybrid Memory Cube (HMC)," HotChips 23, 2011.
14. E. Riedel, G.A. Gibson, and C. Faloutsos, "Active Storage for Large-Scale Data Mining and Multimedia," Proc. 24th Int'l Conf. Very Large Data Bases, Morgan Kaufmann, 1998, pp. 62-73.
15. D. Patterson et al., "A Case for Intelligent DRAM: IRAM," IEEE Micro, vol. 17, no. 2, 1997, pp. 33-44.
16. T. Sunaga et al., "A Processor in Memory Chip for Massively Parallel Embedded Applications," IEEE J. Solid State Circuits, Oct. 1996, pp. 1556-1559.
17. H. Esmaeilzadeh et al., "Dark Silicon and the End of Multicore Scaling," Proc. 38th Ann. Int'l Symp. Computer Architecture, ACM, 2011, pp. 365-376.
18. N. Hardavellas et al., "Toward Dark Silicon in Servers," IEEE Micro, vol. 31, no. 4, 2011, pp. 6-15.
19. A.A. Chien, "10×10: Taming Heterogeneity for General-Purpose Architecture," Proc. 2nd Workshop New Directions in Computer Architecture, 2011, http://ndca2.saclay.inria.fr/papers/chien.pdf.
20. G. Venkatesh et al., "Conservation Cores: Reducing the Energy of Mature Computations," Proc. 15th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM Press, 2010, pp. 205-218.
21. V. Govindaraju, C.-H. Ho, and K. Sankaralingam, "Dynamically Specialized Datapaths for Energy Efficient Computing," Proc. IEEE 17th Int'l Conf. High Performance Computer Architecture, IEEE Press, 2011, pp. 503-514.
22. J. Ousterhout, P. Agrawal, et al., "The Case for RAMCloud," Comm. ACM, vol. 54, no. 7, 2011, pp. 121-130.
23. L.A. Barroso, "Warehouse-Scale Computing: Entering the Teenage Decade," Federated Computing Research Conf., 2011.
24. H. Volos, A.J. Tack, and M.M. Swift, "Mnemosyne: Lightweight Persistent Memory," Proc. 16th Int'l Conf. Architectural Support for Programming
Languages and Operating Systems, ACM Press, 2011, pp. 91-104.
25. J. Coburn et al., "NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, NonVolatile Memories," Proc. 16th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM Press, 2011, pp. 105-118.
26. D.A. Holland and M.I. Seltzer, "Multicore OSes: Looking Forward from 1991, er, 2011," Proc. 13th USENIX Conf. Hot Topics in Operating Systems, USENIX Assoc., 2011, p. 33.
27. C.J. Rossbach, J. Currey, and E. Witchel, "Operating Systems Must Support GPU Abstractions," Proc. 13th USENIX Conf. Hot Topics in Operating Systems, USENIX Assoc., 2011, p. 32.
28. S. Chen, P.B. Gibbons, and S. Nath, "Rethinking Database Algorithms for Phase Change Memory," Proc. 5th Biennial Conf. Innovative Data Systems Research, 2011; http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper3.pdf.
29. M. Wu and W. Zwaenepoel, "eNVy: A NonVolatile, Main Memory Storage System," Proc. 6th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM Press, 1994, pp. 86-97.
30. P.M. Chen et al., "The Rio File Cache: Surviving Operating System Crashes," Proc. 7th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, ACM Press, 1996, pp. 74-83.
31. S. Venkataraman et al., "Consistent and Durable Data Structures for NonVolatile Byte-Addressable Memory," Proc. 9th USENIX Conf. File and Storage Technologies, USENIX Assoc., 2011, p. 5.
32. M. Athanassoulis et al., "Flash in a DBMS: Where and How?" IEEE Data Eng. Bull., vol. 33, no. 4, 2010, pp. 28-34.
33. G. Snider et al., "From Synapses to Circuitry: Using Memristive Memory to Explore the Electronic Brain," IEEE Computer, vol. 44, no. 2, 2011, pp. 21-28.

Parthasarathy Ranganathan is a Fellow at HP Labs. His research interests include system architecture and energy-efficient design. Ranganathan received his PhD in electrical and computer engineering from Rice University. He is also an IEEE Fellow.

Jichuan Chang is a senior research scientist at HP Labs. His research interests include computer system architecture and memory systems. Chang received his PhD in computer sciences from the University of Wisconsin-Madison. He is a senior member of IEEE.

Direct questions or comments about this article to Parthasarathy Ranganathan at partha.ranganathan@hp.com or to Jichuan Chang at jichuan.chang@hp.com.