
Where Cloud Meets Big Data

Managing big data in the cloud can be an overwhelming responsibility for IT departments.
It's important they put the right framework in place, or pawn off that work on the right provider.

EDITOR'S NOTE
KEEPING UP WITH BIG DATA AS A SERVICE
FIND YOUR CLOUD BIG DATA PLATFORM MATCH
AND HADOOP FOR ALL, OR NOT
EDITOR'S NOTE

Big Data Decision Time

Today, an application without data analytics is like a car without a steering wheel. It
will go, but there's no controlling its direction.
This handbook explores the many emerging and evolving Web- and cloud-based technologies
for controlling and using data inside applications. The articles look at full-featured
cloud suites, the popular Hadoop programming framework and other business intelligence
tools that can be embedded into applications.
Our lead story, by news writer Joel Shore, shares advice on how software pros can use big
data as a service (BDaaS) to accommodate the expectations of executives who see the
capabilities of cloud-based analytics but can't understand the challenge of integrating
with enterprise systems. BDaaS delivers a platform and suite of tools that can speed up
builds of analytics applications.
Is BDaaS the right data analytics development platform for your organization? Find
expert guidance in our second story, in which consultant Tom Nolle lays out approaches
ranging from BDaaS to do-it-yourself options and covers the role databases play in these
decisions.
Nolle questions a key big data tool assumption, that Hadoop fits all situations, in the
final story. To Hadoop or not to Hadoop, he advises, depends on such variables as whether
data access is centralized, the level of data distribution, performance needs, database
practices and more.
Are you evaluating solutions for embedding data analytics into applications that we
didn't cover in this handbook? Tell us about your search and projects, and our resident
experts can help.
Jan Stafford
Executive Editor
SearchCloudApplications

ANALYTICS

Keeping Up With Big Data as a Service

If there's any agreement about big data, it's that so much is coming in so quickly from
many sources in many formats. The speed at which it all needs to be processed, stored and
analyzed is simply more than most corporate IT budgets, staffs and infrastructures are able
or willing to handle. We are drowning in data, yet often find ourselves starved for
information. For an increasing number of companies, getting a grip on the situation means
making it someone else's problem. That someone is a big data as a service (BDaaS) or a
data as a service (DaaS) provider.
Regardless of how a service is configured and
delivered, discussion of DaaS focuses as much
on analytics as it does on data collection, presenting opportunities and challenges to the
development side.
"For architects and developers, cloud-based big data offerings are a way to accelerate the
time to build analytics applications," said Nik Rouda, an analyst focused on big data and
analytics at market research company Enterprise Strategy Group. "Without having to wait for
IT infrastructure and operations teams to provision resources, developers can start
immediately on prototyping and then easily roll the new tools into production when ready."
With the rise in cloud and mobility, business
priorities have become crystal-clear: Grow revenue and transform the customer experience
while reducing costs. Each requires architects
and developers to balance traditional values,
such as security and cost effectiveness, with
the need for speed and agility.
"Architects must figure out how to accommodate the sky-high expectations of digital
executives who have seen the capabilities of analytics in the cloud and yet do not
understand why it is so difficult to integrate with enterprise systems," said Brian
Hopkins, an analyst for Forrester Research. "Excuses and finger pointing won't work; those
that fail will become irrelevant. This makes emerging Agile, DevOps and data science
practices a critical part of the emerging digital architecture."
Every company is looking to do more with its data to stay ahead of competitors. Moving data
to the cloud makes access and analysis easier for everyone. "Your customers, employees and
apps all live in the cloud, so that's where your data needs to be," Rouda said. "It's
natural to bring the analytics to the data; you don't bring the data to the analytics.
BDaaS, or DaaS, [is] particularly good at doing this."
Jim Comfort, general manager of cloud services at IBM, agreed. Analytics is a key driver
for turning to DaaS. "It's one thing to simply store data in the cloud, but it's the
analytics in the cloud that make data useful. If you need one, two or 20 different analytic
approaches, you can easily do all of that with the flexibility and agility that a cloud
services environment offers," he said.
The numbers back up Rouda's and Comfort's assertions. Data management, typified by the
migration of databases from on-premises storage into the cloud, is a top IT priority for
this year among 26% of organizations polled by the Enterprise Strategy Group for its 2015
IT Spending Intentions Survey. That ranks second only to security initiatives, which were
cited by 34%. In that same group, 66% plan to boost spending on cloud services in 2015
compared with last year.
Forrester's Hopkins takes an alternative view. "The truth is that the data on which you do
your analytics is usually reasonably sized, only a small subset. Move just that to the
cloud and use DaaS to do the analytics there," he said. "Big data in the cloud is not yet
affordable; it's better to keep years and years of historical data on-premises in a hybrid
configuration."
Regardless of where data resides, there is little doubt that IT is finding it increasingly
difficult to keep up with demand. That's not surprising, given the pace at which data is
created. In 2013, Norway's Foundation for Scientific and Industrial Research (SINTEF)
published a widely quoted study that found 90% of the world's data had been created in the
last two years. In 2015, it's not unreasonable to surmise the percentage has edged higher.
IBM itself says that 2.5 quintillion bytes of data are created every day.

"Data warehouse providers can't live up to their former promises anymore. Those
infrastructures lacked agility, and you needed to declare everything upfront, including the
[database] schema and the amount of data to be stored," Comfort said. "That's no longer
good enough." An instantly scalable, quickly implemented, cost-effective DaaS is the
answer, he added.
Joel Shore

DATABASES

Find Your Cloud Big Data Platform Match

Users and cloud providers alike are focusing on the intersection of big data and the
cloud, planning applications and service offerings to exploit the technologies. To address
this intersection and choose the best cloud big data platform, developers need to decide on
a database model, select cloud database services or cloud database platforms and review the
features of each platform against their company's needs.
There are three popular models for big data: distributed MapReduce, popularized by
Hadoop; NoSQL, used for nonrelational, nontabular storage; and SQL relational systems for
relational tabular storage of structured data. You can use all three in the cloud, so in
most cases database design and usage concerns will determine the model choice. After
identifying a database model, you can explore cloud options for the model selected.
Most business transactions are best stored
and accessed in a relational database management system, where SQL queries and tabular
summarization can be easily supported. Enterprise users and database architects are most
likely to be familiar with this model, and a good
rule of thumb is to go for SQL and relational
until you can prove another option is better.
The most common deterrent to using SQL is that the data is object-structured rather than
tabular. Object data collects information as a set of properties that may be freeform in
the object. If you can't visualize data as a set of tables with fixed fields and valuable
field-to-field relationships, then a SQL and relational system may be difficult to adopt,
and other options may be better.

HADOOP AND NOSQL


Both the Hadoop and NoSQL options are easier to adapt to nonstructured data. Hadoop
and NoSQL can be used for applications where unstructured data is stored in clusters
distributed on a network, so the choice between the two comes down to object structure. If
the database stores information about specific, identified things, then NoSQL is likely
best. Data that has no natural structure, like freeform text, is better stored using Hadoop.
Note that you generally can query SQL,
NoSQL and Hadoop databases using SQL.
The latter two may require an overlay product, and the lack of tabular organization may
make query processing more time-consuming.
If you expect most database activity to be in
SQL form, you probably have tabular data and
should be considering a relational model.
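
The rules of thumb above can be written down as a small decision helper. The sketch below is only an illustration of that reasoning: the `DataProfile` fields and the `suggest_model` function are hypothetical names invented for this example, and the result is a starting point for evaluation, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical description of a data set; field names are illustrative."""
    tabular: bool             # fits tables with fixed fields and useful field-to-field relationships
    identified_objects: bool  # records describe specific, identified things with freeform properties
    mostly_sql_queries: bool  # most expected database activity is in SQL form


def suggest_model(profile: DataProfile) -> str:
    """Return a starting-point model based on the article's rules of thumb."""
    # Default to relational: tabular data or SQL-heavy access points to an RDBMS.
    if profile.tabular or profile.mostly_sql_queries:
        return "SQL relational"
    # Object-structured data about identified things suits NoSQL.
    if profile.identified_objects:
        return "NoSQL"
    # Data with no natural structure, such as freeform text, suits Hadoop.
    return "Hadoop (distributed MapReduce)"


if __name__ == "__main__":
    print(suggest_model(DataProfile(tabular=True, identified_objects=False, mostly_sql_queries=True)))
    print(suggest_model(DataProfile(tabular=False, identified_objects=True, mostly_sql_queries=False)))
    print(suggest_model(DataProfile(tabular=False, identified_objects=False, mostly_sql_queries=False)))
```

Checking the relational case first mirrors the advice to go for SQL until another option proves better.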
The second point to consider is whether to use a database package from a cloud provider or
host your own database in the cloud. Most people are familiar with Amazon, Rackspace,
Microsoft and Google, but lesser-known providers Joyent and Qubole also have strong big
data credentials. Additionally, Hadoop is usually available from major cloud providers.

DO IT YOURSELF

Another option is to host your own big data application in the cloud using big data
software and infrastructure as a service or platform as a service.
A do-it-yourself approach can offer advantages. It widens your options for cloud hosting
because not all cloud providers will support big
data as a service. You can use multiple public
clouds or switch between cloud providers with
greater ease. And often you can create hybrid
big data applications more easily if you adopt
the same big data software in the cloud and on-premises. The disadvantage, according to cloud
buyers, is that creating in-cloud big data with
your own platform tools is more complicated
and sometimes more costly.

Obviously, the best platforms for cloud big data depend on your database model. Top-rated
Hadoop options include Apache Hadoop, SAP's HANA and Hadoop combination, Hortonworks,
Hadapt and VMware's Cloud Foundry, as well as services provided by IBM, Microsoft and
Oracle. For NoSQL, consider Apache Cassandra, Apache HBase or MongoDB. IBM also offers
NoSQL for the cloud. Make sure that your final choice supports the level of big data cloud
scaling you expect.
SQL big data in the cloud is most often supported by extending your on-premises SQL
vendor offering. IBM, Oracle and Microsoft all offer SQL that's suitable, with some tuning,
for big data cloud deployment. HP's Haven is a general big data architecture for the cloud
that embraces both structured and unstructured data and supports SQL queries.

CRITICAL NEEDS

It's important to understand your needs and evaluate how each platform supports those
needs. You may need to run tests to determine whether a given big data option is efficient
for your specific mix of update and access. Be particularly careful about SQL queries
against non-SQL databases. Analytics that require extensive use of SQL can create major
performance issues even with relational systems, and more so with other database models.
Creative database design and careful use of JOINed databases may make things more efficient.
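
A test of the update and access mix can start very small. The sketch below is a minimal harness that times point updates against aggregate reads. It uses Python's built-in SQLite purely as a stand-in, and the `orders` table, the `time_workload` function and the workload sizes are all assumptions made for illustration. A real evaluation would point the same kind of loop at each candidate platform with representative data.

```python
import sqlite3
import time


def time_workload(reads: int, updates: int, rows: int = 10_000) -> dict:
    """Time a hypothetical mix of point updates and aggregate reads on SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
        [(i, f"region-{i % 10}", float(i % 100)) for i in range(rows)],
    )
    conn.commit()

    # Update phase: single-row changes, as a transactional application would issue.
    start = time.perf_counter()
    for i in range(updates):
        conn.execute("UPDATE orders SET amount = amount + 1 WHERE id = ?", (i % rows,))
    conn.commit()
    update_seconds = time.perf_counter() - start

    # Access phase: an analytics-style aggregate over the whole table.
    start = time.perf_counter()
    for _ in range(reads):
        conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
    read_seconds = time.perf_counter() - start

    conn.close()
    return {"update_seconds": update_seconds, "read_seconds": read_seconds}


if __name__ == "__main__":
    print(time_workload(reads=20, updates=1_000))
```

Varying the ratio of `reads` to `updates` shows how sensitive a platform is to the mix, which is the question the test is meant to answer.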
You also should ensure that distributed big data clusters can be accessed efficiently for
combined queries. This can be complicated with cloud-hosted data because users have only
limited control over how the data is distributed. Testing to determine optimum data
distribution strategies and a contract to ensure, generally, that data stays within those
guidelines are critical.
Cloud big data hosting has considerable variables, so be prepared to gather a lot of
operating data on quality of experience to ensure that workers are getting what they need
and that costs are managed. Otherwise you'll end up with something too costly to fix and
too slow to accept.
Tom Nolle

TOOLS

And Hadoop for All, or Not

Apache Hadoop has long been the focus of cloud big data thinking, but there are plenty
of refinements to consider in Hadoop planning, and many big data cloud applications aren't
suitable for Hadoop. Developers should ask what the big data storage paradigm will be and
whether it matches Hadoop's capabilities, optimize their database planning for cloud
access, and track changes in data storage or access policies that could indicate a change
is needed.
Hadoop is an open source implementation of a Google concept called MapReduce. It is
designed to support the storage and querying of databases distributed across multiple
network-connected compute clusters. The basic notion is to allow a single query to find and
collect results from all the cluster members. This approach suits Google's model of search
support.
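
The single-query, collective-result idea can be illustrated without Hadoop at all. The sketch below is a simplified, single-process imitation of the pattern in plain Python: each "cluster member" is just a list of strings, and the names `CLUSTER_MEMBERS`, `map_phase`, `reduce_phase` and `word_count` are assumptions made for this example. It is not the Hadoop API, and it omits the distribution, shuffling and fault tolerance a real framework provides.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands in for documents held by one cluster member.
CLUSTER_MEMBERS = [
    ["big data in the cloud", "cloud analytics"],
    ["data as a service"],
    ["big data platform", "cloud data"],
]


def map_phase(documents):
    """Run on each member: count words in the locally stored documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())
    return counts


def reduce_phase(partials):
    """Run once: merge the per-member counts into one collective result."""
    return reduce(lambda left, right: left + right, partials, Counter())


def word_count(members):
    """Apply the same query to every member, then combine the partial results."""
    return reduce_phase(map_phase(docs) for docs in members)


if __name__ == "__main__":
    print(word_count(CLUSTER_MEMBERS).most_common(3))
```

Only the small per-member counts travel to the reducing step; the documents themselves stay where they are stored.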
The value of Hadoop is that distributed data is subject to collective inquiry. Most
enterprises collect information in centralized databases and also create separate
abstractions or aggregations of this data for better access. Many vendors, including IBM,
recognize this trend and don't lead their cloud big data initiatives with the assumption
that Hadoop is the choice technology. CIOs also agree that it's rarely wise to use Hadoop
on centralized data or to distribute data in the cloud simply to be Hadoop compatible.
Hadoop is ideal where data is naturally separated, not just within a data center but across
multiple data centers. If that's not the case for your data, then Hadoop isn't likely the
best option, even if you're moving applications to the cloud.
Other data storage considerations include:

Do you routinely query distributed data as though it were centralized? If your data
access tends to be directed toward specific data clusters, providing for overall query
capability may have limited value.

Are any or all of your query applications performance sensitive? Hadoop querying is not
as fast as other options for big data. This is particularly true if you're using Hadoop's
optional SQL capability.

Do you create aggregate databases with summary data to support high-level analytics?
If so, these databases will likely combine data from multiple data clusters and reduce
your need to look at the cluster data directly. However, Hadoop might be helpful here to
support the aggregation of information.

The ideal Hadoop environment is one where large data volumes are collected and used
locally but must also be accessed by analytics applications that deal with raw data rather
than summary-level information. If this isn't your situation, other options may be better.

Hadoop is good at confining mass data access to the clusters, but you can accomplish
something similar by sending queries to local relational systems at each location and
then joining the results. Another strategy for avoiding data access issues is to create
summary databases for analytics that don't require real-time information and are
specialized and small enough to be hosted in the cloud at modest cost or moved into the
cloud ad hoc as needed.
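
Sending one query to each location's relational system and joining the results might look like the sketch below. It is a minimal illustration under stated assumptions: in-memory SQLite databases stand in for the local systems, and the `sales` table and the `make_site`, `query_site` and `join_results` helpers are hypothetical names created for this example.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory database standing in for one location's relational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_site(conn):
    """Run the aggregate locally so only a small partial result leaves the site."""
    return conn.execute(
        "SELECT product, SUM(amount), COUNT(*) FROM sales GROUP BY product"
    ).fetchall()


def join_results(partials):
    """Combine per-site partial sums and counts into overall totals and averages."""
    totals = {}
    for rows in partials:
        for product, amount_sum, row_count in rows:
            total, count = totals.get(product, (0.0, 0))
            totals[product] = (total + amount_sum, count + row_count)
    return {p: {"total": t, "average": t / c} for p, (t, c) in totals.items()}


# Two hypothetical locations, each holding its own raw sales rows.
sites = [
    make_site([("widget", 10.0), ("gadget", 5.0)]),
    make_site([("widget", 30.0)]),
]
summary = join_results(query_site(conn) for conn in sites)
print(summary)
```

Each site returns sums and counts rather than averages because partial averages cannot be combined correctly without the counts behind them. The joined `summary` is also the kind of small, specialized data set that could be hosted in the cloud as a summary database.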
The second point in planning for cloud and
big data is to remember that true cloud applications are very different from legacy applications. This must be the primary design
consideration. Your cloud usage, present and
planned, will have a major effect on your big
data design, enough to create major problems if
you make the wrong choice.
Users vary significantly in how they plan to use the cloud. Some expect to host everything
there, some to share or hybridize, and some to use the cloud for failover or cloud
bursting. Data access is a part of every application. The primary issue is to avoid passing
large volumes of data across the cloud boundary. Store data in the cloud or on-premises and
try to pass query results and summarized databases, not large quantities of raw data.

The final issue with big data in the cloud is the increased risk that the combination
presents. Cloud computing is evolving, and so is big data. Application design is changing
to optimize cloud utility, and the notion of data and databases is transforming with
mass-collection networks like the Internet of Things. These changes will affect both cloud
and big data plans.
Tom Nolle

ABOUT THE AUTHORS

TOM NOLLE is the president of CIMI Corp., a consultancy specializing in
telecommunications and data communications since 1982. He writes for many TechTarget
websites. Read his blog or email him at tnolle@cimicorp.com.

JOEL SHORE is a technology journalist, author and editor with nearly 30 years of
experience. He is the co-founder and longtime director of the Computer Reseller News Test
Center. He is a news writer for SearchCloudApplications and SearchAWS. Email him at
jshore@techtarget.com.

Where Cloud Meets Big Data is a SearchCloudApplications.com e-publication.
Jason Sparapani | Managing Editor
Moriah Sargent | Associate Managing Editor
Jan Stafford | Executive Editor
Brein Matturro | Site Managing Editor
Linda Koury | Director of Online Design
Neva Maniscalco | Graphic Designer
Doug Olender | Publisher | dolender@techtarget.com
Annie Matthews | Director of Sales | amatthews@techtarget.com

TechTarget
275 Grove Street, Newton, MA 02466
www.techtarget.com
© 2015 TechTarget Inc. No part of this publication may be transmitted or reproduced in any
form or by any means without written permission from the publisher. TechTarget reprints are
available through The YGS Group.

STAY CONNECTED!
Follow @SearchCloudApps today.

About TechTarget: TechTarget publishes media for information technology professionals.
More than 100 focused websites enable quick access to a deep store of news, advice and
analysis about the technologies, products and processes crucial to your job. Our live and
virtual events give you direct access to independent expert commentary and advice. At IT
Knowledge Exchange, our social community, you can get advice and share solutions with peers
and experts.
COVER ART: FOTOLIA
