27/10/2010 HPC in the Cloud: SC10 Disruptive Tec…

October 26, 2010

SC10 Disruptive Technology Preview: The First Cloud Portal to “R” and Beyond
Nicole Hemsoth

At each annual Supercomputing Conference a handful of innovations are selected as the year’s “disruptive
technologies” that are most likely to revolutionize high-performance computing. These are described as “drastic
innovations in current practices … that have the potential to completely transform” the landscape.

At this year's event in New Orleans, the focus will be on “new computing architectures and interfaces that will
significantly impact the high-performance computing field throughout the next five to 15 years,” a focus that
is reflected in the list of disruptive exhibitors who were selected by an SC committee.

Another “qualification” of those selected innovations is that they cannot have already emerged into the landscape
in any meaningful way—that they sit on the bleeding edge waiting for impetus to burst forth and cause a paradigm
shift.

At the edge of this potential sea change in HPC, and included on the SC10 list of innovations this year, is a
one-man show run by Karim Chine of his newly minted company, Cloud Era, Ltd.

Chine’s opportunity to showcase his “Google Docs-like portal for scientific computing in the cloud” could mean
that his three-year effort, which he bootstrapped after he was unable to secure funding for research and
development, garners significant interest and makes what this self-described “social entrepreneur” calls a real,
universal impact in the broad field of large-scale data analysis.

Chine’s goal when he began the project after leaving academia was to bring the R language to the cloud and
deliver it seamlessly to users who can share infrastructure and collaborate in real-time with a wide range of
documents and computational tools. Or at least that's the Reader's Digest version; the actual technology and
processes that create the experience for technical users go far beyond these elements in complexity and in
what is possible.

From the outset, Chine saw the inherent value of R as a ubiquitous tool but also recognized that there are a
number of embedded challenges to using the language in terms of memory and compute capabilities being
stretched to the limit. On the other end of the spectrum, he also saw how he could carry over lessons from social
networks. Chine notes that part of what makes his Elastic-R project innovative, even disruptive, is that users
can move beyond sharing static information as they would on a social networking platform and instead have a
scientific network where real-time information sharing is at the core of the communities.

The R Language Coming to a Browser Near You

It's far too simple to suggest that what makes the platform unique or disruptive is the capacity for real-time
resource and information-sharing. At the core of this innovation is the enhanced ability for researchers to use
R, Scilab, and other tools in a new way--on the "infinite" resources provided by the cloud.

Many will agree that the R language is the lingua franca of data analysis—it’s the standard for nearly all statistics
students in every major university and has a user base that some estimate is well over one million. In Chine’s
view, the beauty of the R language, which is an open source implementation of S, lies “not just in statistics, not
just in open source, it’s become the environment where people share scientific artifacts—where people
contribute and access powerful tools for working with data.”

Although Chine discussed at length some of the benefits of the R language for scientists and researchers, he noted
that there are some significant limitations to the language, particularly in the arena of software architecture and
R’s distinct lack of ability to optimize memory usage. However, the memory and architecture problems can be
addressed by delivering R via cloud-based resources like EC2—in an environment where a user is no longer
constrained by compute or memory and where inexpensive machine instances with 70 GB of RAM can be called
into action in a few moments.

The idea of a “few moments” to get an instance up and running might strike some newer EC2 users as a little
far-fetched, which leads to another issue that Elastic-R might be able to solve. One of Chine's goals was not only
to make R available via a web browser on a machine with limited compute capacity, such as an iPad, but also to
deliver the resource in a way that is intuitive and removes much of the complexity of accessing remote
infrastructure.

“Elastic-R enables scientists, educators and students to use cloud resources seamlessly, work with R engines and
use their full capabilities from within any standard web browser. For example, they can collaborate in real time,
create, share and reuse machines, sessions, data functions, spreadsheets, dashboards, etc.”

“Elastic-R is also an applications platform that allows anyone to assemble statistical methods and data with
interactive user interfaces for the end user. These interfaces and dashboards are created visually and are
automatically published and delivered as simple web applications.”

For Chine, the revolutionary or disruptive nature of Elastic-R lies in its user-friendliness, something that few
people might say about the static R language. He states that offering a platform on top of R that is easy to work
with in any browser allows people to access infrastructure without being computer savvy or having any specific
training. In essence, in three minutes you can have simple access to machines on EC2 that will allow you to do
anything you want with large-scale data.

Even more disruptive, however, is the fact that users can hook in other scientific computing tools like Scilab or
MATLAB, making it a universal platform that is open to change and to additional tools that enhance research.
Users can then eliminate the problems involved with having their data in disparate formats that can complicate
sharing by porting their results directly into standard Microsoft Office tools that can be shared and edited in
real time via the web interface.

Taking R Beyond the Public Cloud

At the moment the resource can only be deployed using Amazon EC2, but this simply reflects how far Chine has
traveled with his experiences; in theory, it can run on any resource. For instance, when he first began rolling
out the prototype version of Elastic-R, he did so on a standard cluster at the National Grid Service in the U.K.,
and the same would have been possible on any other resource he might have selected.

The point is that what Chine has created is agnostic to the hardware and operating system, so users can connect
to computational engines via their browsers and work with large-scale data that they do not have to move, but
can share with others for collaboration in real time.

As Chine stated, “What’s wonderful about Amazon is that they already deliver the most significant public cloud
of the moment, but also that they’ve blurred the frontier between normal computing and HPC … For the end
user or interaction design perspective there’s no borderline between general computing and high-performance
computing now.”

Elastic-R has a range of capabilities that are almost too numerous to mention in a relatively short article. In
fact, this seems to be one of the reasons why this is such a disruptive technology; it’s multi-layered in its
potential usefulness. Scientists and researchers can open mainstream computing environments beyond R (Scilab,
SciPy, Sage, etc.), issue commands to the remote R engine, install and deploy new packages, and easily run
computationally intensive algorithms that are managed through the simple interface, then share all of it,
including the computational resources themselves.

The following slide is taken from the deck (the presentation, available as a pptx file, provides a more in-depth
overview of the layers of the Elastic-R portal and what it provides) and shows the onion-like way users can
visualize their access to resources and tools.

During an interview with Karim Chine, I was granted access to the interface to watch how collaboration happens
and how resources are secured. Without much experience at all, I could intuitively understand exactly what was
needed to get my job running, identify where the results were and who I could share them with, and see how, at
the exact moment I updated a spreadsheet, my partner on the other side of the ocean could see my changes in real
time. There was no delay. The moment he replaced a “5” with a “6” on his end I saw it on my own browser screen.

This is big news for the future of scientific collaboration and computation using remote resources.

A Business Model Still in the Making

Chine’s goals are multi-layered and go far beyond making R more accessible to greater numbers of researchers
via the cloud—he hopes to create a “Facebook” for scientists and statisticians where they can share and
collaborate with big data in real time using a simple interface that they can build applications on top of and add or
shed layers of computational tools and resources seamlessly.

As a social entrepreneur, Chine notes that this interface, as it develops, means that researchers in developing
countries without access to high-performance computing resources can now easily create machine instances for
small sums, and if even those prices are too high, they can share infrastructure with collaborating participants.

In essence, what this means is that there is not only an economy of information sharing involved with this
disruptive innovation—there is an economic angle that allows researchers to extend their infrastructure to those
across the world easily and in only a few moments.

As a business model, however, there are some issues that Chine admits he is still working to resolve. He sees
the possibility of partnering with those who make scientific tools available, including The MathWorks, on a
revenue-sharing basis once those tools are integrated. He also sees value for supercomputing centers that might
want to provide a simpler and more streamlined way to access and use high-performance computing infrastructure.

For now, however, he admits that he is just waiting to see how useful this will be as he extends his user base,
which currently stands at only 140 members, all of whom he knows personally. He will announce the technology as
publicly available just before SC10.

While the cloud can open the doors to enhanced collaboration and resource sharing as well as providing the tools
researchers need, there is a remaining need for software that creates a sturdy bridge between the tools for
scientific computation and the cloud, which is where Elastic-R fits into the picture.

Coupled with the open, collaborative nature of the project, which is driven by its social entrepreneur founder and
creator, it will be thrilling indeed to watch how the community receives, uses, then builds on this disruptive
innovation.

Links referenced within this article

disruptive exhibitors
http://sc10.supercomputing.org/?pg=disrupttech.html
deck
http://www.elasticr.net/doc/
pptx file
http://www.elasticr.net/doc/

Find this article at:

http://www.hpcinthecloud.com/features/SC10-Disruptive-Technology-Preview--The-First-Cloud-Portal-to-R-and-Beyond-105776458.html?viewAll=y
