
DATA INTEGRATION
MODERN ENTERPRISE
Best Practices Series

Oracle
PAGE 20
ROLE OF DATA INTEGRATION IN UNLOCKING THE VALUE OF BIG DATA SOLUTIONS

Kore Technologies
PAGE 22
REAL-TIME INTEGRATION—THE FUTURE IS NOW

Denodo
PAGE 23
EMBRACE DIGITAL DISRUPTION WITH DENODO PLATFORM IN THE CLOUD

SnapLogic
PAGE 24
10 NEW REQUIREMENTS FOR MODERN DATA INTEGRATION

Cask
PAGE 25
DATA INTEGRATION: A STEPPING STONE TO MODERN DATA APPS
18 AUGUST/SEPTEMBER 2016 | DBTA

Data Integration
for the
Modern Enterprise:

CLOUD
Shifts the Balance
Best Practices Series
The rise of cloud computing has changed the whole
concept of data integration. Previously, it was an
information technology department concern, with
IT and data staff sweating it out with manual scripts,
product connectors, and middleware brokers in efforts
to cobble together relevant applications, or surface data
trapped in legacy system silos to deliver it to modern
front-end interfaces. Top-level efforts also consisted of
endeavors to bring data into a common place through
master data management, or organizing information
through ontologies.
The blood, sweat, and tears that go into enterprise data integrations may not go away anytime soon, but lately, it has become easier to make data from any and all sources more available to people and applications that need it. The cloud—and more specifically, database as a service (DBaaS)—has shifted the challenge of data integration.

Clouds and DBaaS offerings are gaining traction as an online means to manage and process data. The cloud offers a major venue for big data solutions, since they are faster to deploy and easier to operate and scale than typical on-premises systems.

At the same time, the rise of cloud and DBaaS as an information management environment has shifted the balance of responsibility for enterprise data integration away from the confines of data centers to the enterprise as a whole. Suddenly, the entire business has a stake in identifying, managing, and leveraging the way data flows through IT systems. A recent survey of 300 DBAs and IT professionals, conducted by Unisphere Research, a division of Information Today, Inc., finds growing interest in DBaaS as a viable approach to serving their enterprises' needs for greater agility and faster time to market with cloud computing. Many of the early hurdles in delivering enterprise capabilities for security and availability in the cloud become more evident with the reliance on hybrid cloud approaches and the need to move enterprise applications to the cloud and back on-premises based on the business requirements of the organization, their legacy investments, and regulatory requirements ("Database as a Service Enters the Enterprise Mainstream: 2016 IOUG Survey on Database Cloud," April 2016).

DBaaS is taking off, with adoption expected to triple over the next 24 months. There will be a significant amount of enterprise data shifting to the cloud over the next 24 months as well, as enterprises rethink data management in the cloud. Seventy-three percent of managers and professionals expect to be using DBaaS within their enterprises by that time, versus 27% at the present time.

What does it take to construct and sustain a viable DBaaS strategy? Here are some considerations:

SHOW THE BUSINESS POTENTIAL NEW WAYS DATA CAN BE LEVERAGED.
Moving to DBaaS is more than simply making data more accessible; it also opens new paths to innovation. Data is a tool, a means, to better engage customers and better understand markets. Plus, as new ideas and requirements arise, DBaaS—especially if delivered by a cloud provider—serves as a testbed to help accelerate innovation and experimentation, since cloud providers will likely have all required features and services in place. Shifting more activities to cloud providers or shared service environments frees up enterprises and their IT staffs to provide higher-level support to the business.

DEVELOP A DATA GOVERNANCE STRATEGY.
Data governance has long been a challenge for enterprises, and cloud or DBaaS doesn't make things any easier. Essential concerns such as data security, quality, and relevance will need to be dealt with at the enterprise level. In addition, there is a need to wrap governance around disparate data sources, which often were external to enterprises and therefore not under their purview. There are a range of business requirements that must be addressed, from real-time data streaming to analytics to customer relationship management. And, organizations must be able to move large and varied datasets at a high velocity through their systems—which raises issues for the handlers of this data, such as who owns it, who has access to it, and how and where it should be stored.

GO HYBRID.
From a planning/spending perspective, the future belongs to more hybrid approaches. Many organizations continue to maintain an abundance of legacy or on-premises assets, and this is likely to be the case for some time to come. As long-standing legacy assets, these systems have proved their worth and resiliency, and continue to function well for their organizations. Mainframe systems, in particular, continue to be refreshed by IBM and are capable of supporting the largest cloud and DBaaS workloads and the latest protocols. As a result of this sizable legacy base, the largest percentage of organizations in the Unisphere/IOUG survey, 44%, see the establishment of hybrid cloud as their most important priority as they enter the cloud space.

TACKLE THE DATA SECURITY ISSUE HEAD-ON, AND AS EARLY AS POSSIBLE.
Data security isn't just about securing data from hackers; it also entails access control, as well as avoidance of any potential for third parties to mishandle the data. Half of the managers and professionals in the Unisphere/IOUG survey indicate that security and privacy concerns are the greatest inhibitors to their cloud initiatives. Data ownership and retention follow closely behind as the second-ranked concern. Often, trusting outside cloud providers with sensitive or mission-critical corporate data is seen as risky, not only in terms of potential breaches, but also in terms of the potential need for a relationship between a cloud provider and consumer to be modified or terminated. The fate of data held by a cloud provider may not be clear-cut.

STANDARDIZE.
The beauty of cloud and DBaaS is that multiple standards, devices, and interfaces are supported. However, it's still important that all parts of the enterprise be on the same page. At a high level, standards are emerging to help simplify cloud-based integration. For example, the Open Data Protocol (OData) promises to replace the web services standards REST and SOAP to enable greater interoperability between enterprises and across the cloud. In many ways, enterprises are becoming API-driven, meaning applications, functions, and services can be interconnected, on the fly, to share data as required by the business demands of the day.

ADOPT A SUPPORTIVE IT INFRASTRUCTURE.
For the private cloud, a robust enterprise infrastructure is essential. The internal systems—particularly storage—need to be highly adaptable and elastic for unpredictable workloads. Public cloud services also offer compelling storage options. Services such as Amazon S3, OpenStack Swift, Microsoft Azure Storage, and Google Cloud offer virtually unlimited storage resources that can serve as total storage sites or as bursting services for spikes in enterprise workloads. On-premises solutions—from open source frameworks to commodity hardware—also provide scalable storage solutions.

DEPLOY DBAAS AND CLOUD WHERE IT FINANCIALLY MAKES SENSE.
Not all business cases may be suitable for DBaaS or cloud. To determine the cost/benefit of cloud sites, enterprises need to weigh the traditional on-premises costs of maintaining hardware, systems, networks, and storage, versus that of a shared, multitenant enterprise private cloud, versus subscribing to a public cloud provider. There are also management and labor costs that pertain to project management, development, oversight, monitoring, and quality management—costs that are likely to be incurred regardless of whether the data is managed on-premises or by an outside cloud provider. Even with the public cloud, there are time and expense requirements related to migration or integration between various systems, or between cloud and on-premises systems.

Other expenses that need to be measured include the cost of software licenses versus subscriptions, or the costs of upgrades and maintenance versus pay-by-the-sip monthly fees. In some cases, it may make sense just to leave things as they are; in other cases, the savings may be significant. But, ultimately, the true value comes from the enhanced opportunities for innovation and growth.

—Joe McKendrick
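The on-premises-versus-cloud cost weighing described above is, at its core, simple arithmetic over a few line items. The sketch below illustrates the shape of that comparison; every figure is hypothetical and invented purely for illustration, and a real TCO model would include many more items (licensing tiers, network egress, staff time, migration risk):

```python
# Illustrative sketch (hypothetical figures): comparing three-year costs of
# keeping a database on-premises versus subscribing to a DBaaS offering.
def three_year_cost(upfront, monthly, yearly_maintenance):
    """Total cost of ownership over 36 months."""
    return upfront + 36 * monthly + 3 * yearly_maintenance

# On-premises: hardware/license upfront, modest power/admin monthly,
# plus an annual maintenance contract.
on_prem = three_year_cost(upfront=120_000, monthly=1_500, yearly_maintenance=18_000)

# Public-cloud DBaaS: no upfront hardware, pay-by-the-sip subscription,
# plus a one-time migration project folded into "upfront".
dbaas = three_year_cost(upfront=25_000, monthly=4_000, yearly_maintenance=0)

print(f"on-premises: ${on_prem:,}")   # $228,000
print(f"DBaaS:       ${dbaas:,}")     # $169,000
print("cheaper:", "DBaaS" if dbaas < on_prem else "on-premises")
```

With different inputs the conclusion flips, which is the article's point: run the numbers per workload rather than assuming either venue always wins.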

Role of Data Integration


in Unlocking the Value
of Big Data Solutions
Three years back, you would have had to follow secret directions to the basement to find out where your organization's big data lab was situated. If lucky, you might even have spotted a white-coated data scientist. Fast forward to today, and you would be hard pressed to find a meeting room where senior executives are not huddled around a white screen with "big data" written in bold red.

This is not surprising. The big data ecosystem is forever changing, and it plays such an important role that it figures in many strategic projects, with some of the best brains in the enterprise heading them. Compounding the problem is that the big data technology landscape is no longer clearly partitioned into data management and data visualization. It has expanded to include data acquisition, data wrangling, data movement, data processing, deep data analytics, data visualization and discovery, and finally data governance solutions.

Big data integration and big data governance touch all these subdivisions of data integration.

EMERGENCE OF DATA WRANGLING IN THE BIG DATA ERA
The criteria to invest in big data solutions have seen a marked shift from technology-based decisions to those based on business use cases and benefits. Where earlier, purchasing decisions were made based significantly on addressing technical limitations of current solutions, today organizations have started viewing big data technologies as a way to solve business problems or even transform the way they do business. In other words, big data systems have become another data source or target that serves key business initiatives.

Data preparation, or data wrangling, has emerged as a leading use case, one that was almost non-existent before this shift. Data preparation and data set creation have traditionally been a bottleneck to quickly unlocking value from business data. Data preparation solutions use a big data-based processing engine like Spark for a better user experience. Data wrangling services have brought together traditional ETL and data discovery features by providing easy data set creation, data enrichment, publishing, and operationalizing (i.e., automating the entire data preparation process) capabilities. Data curation is no longer the sole domain of IT experts. This decoupling of business dependency on IT resources and expertise has enabled faster data analysis, enabling customers to unleash their business savvy onto unsuspecting data sets. Data wrangling technologies have also blurred the lines between data quality (through data enrichment), ETL (through easy data acquisition), and data discovery (through intelligent and rich recommendations). Leaders in data preparation solutions use advanced machine learning algorithms and natural language processing to glean and bubble up insights from large and varied data sets that cannot be manually discerned because of their scale and complexity.

This does not mean that the classic data analyst roles, those of the business analytics and ETL experts, are becoming obsolete. In fact, their roles have evolved toward enabling enterprise data management, where they are called upon for more strategic initiatives like moving the infrastructure to the cloud and ensuring data for Software as a Service (SaaS) applications.

What really differentiates the multitude of big data projects within organizations is not just the functional sophistication of the projects, but the speed of delivery of insights. While democratizing data and enabling business users to interact directly with data are irreplaceable in helping make data-based decision making ubiquitous within the organization, it is the actual technological and architectural advancements in big data technologies that enable faster decisioning.

HOW FAST IS FAST ENOUGH FOR ANALYTICS? STREAMING ANALYTICS
The short answer is pretty fast. Real-time data is critical to making any business decision more pertinent. Streaming data right into cloud data warehouses, which are in turn plugged into data visualization solutions downstream, ensures that the data being used for business analytics is the latest data. Successful organizations invest in not just data visualization tools, which are typically the tip of the data management iceberg. They understand and implement a responsive and governed network of data delivery pipelines to capture, filter, aggregate, and correlate data before it reaches the analytics platform.

Streaming first captures data right at the source of its creation. However, most of the events that generate the data happen on business-critical applications and systems. Real-time data integration technologies should balance the need to capture data in real time without slowing the performance of those systems.

Combining big data with streaming data capabilities presents infinite possibilities. Data from transactional systems that is captured in the organization's database, combined with user-generated data in real time, finds many applications in the real world. Businesses with data-driven marketing initiatives can improve customer experience by generating customized promotional offers based on historic factors, such as purchase history, combined with real-time data, such as location and data from the customer's social media footprint. Today, real-time data integration technologies are used mainly to deliver data to the analytics data warehouse or data marts. There is an opportunity to embed analytics in the data streams that deliver the data to accelerate insights and action.

Streaming analytics that includes event stream processing shortens the time from data creation to decision drastically. Stream analytics empowers a business audience in any industry that is looking to create solutions that embrace real-time, instant insight across data delivery infrastructures. A good stream analytics solution is designed to handle large data volumes with subsecond latency, while also providing a business-friendly, easy-to-use interface. These solutions are designed with drag-and-drop interfaces that help model data streams to replicate business models and behaviors.

Streaming analytics finds great use in scenarios that rely on very low latency business decisioning. Some examples include fraud detection in the financial industry, automated stock trading based on market movement, monitoring the vital signs of a patient and setting preventative triggers in health care, and detecting security issues and fraud in transportation industries by finding anomalous patterns as they happen to initiate immediate investigation.

Stream analytics combined with deep data storage provides the best of both worlds. Lambda architectures, as such combined streaming and deep data storage architectures are called, provide a single platform that enables enterprises to perform real-time streaming analytics and then refine those analytics with insights mined from the deep data storage reservoir for richer and more complex data recommendations. Because big data technologies underpin the data storage, the cost-to-benefit ratio is extremely appealing for organizations looking to make a difference with their big data investments.

[Figure: Oracle offers a full set of products for cloud, big data, and on-premise data integration requirements]

THE ORACLE ADVANTAGE
Oracle provides a wide range of products that help with all the moving parts of building a differentiated and forward-looking big data integration, management, and analytics platform. As part of the data integration portfolio, Oracle GoldenGate and Oracle GoldenGate Cloud Service ensure real-time data capture and streaming from heterogeneous business-critical transactional systems with minimal impact on the performance of the source systems. Oracle GoldenGate provides the most secure and reliable big data delivery solution between the cloud and on-premise systems and applications.

Oracle's Big Data Preparation Cloud Service is a next-generation data wrangling service that helps business users unlock data quickly from complex business data. It is built on Apache Spark, combines natural language processing and machine learning, and bridges the line-of-business/IT divide when extracting insights from the data and operationalizing them into enterprise data integration flows. Meanwhile, the Oracle Stream Analytics platform provides a compelling combination of an easy-to-use visual façade to rapidly create and dynamically change real-time event stream processing applications, together with a comprehensive run-time platform to manage and execute these solutions. This tool is business user-friendly and solves business dilemmas by completely hiding and abstracting the underlying technology platform. Oracle Data Integrator brings together big data platform portability, the ability to switch between multiple big data platforms seamlessly, and powerful data transformation capabilities for enterprise big data development teams. Oracle Data Integrator provides a unified and common interface to build data transformations irrespective of the underlying big data technology. This ensures data integration developers can utilize the latest big data platform and language without having to compromise on productivity, and with minimal disruption. Oracle's entire integration platform is governed and audited by Oracle Metadata Management. Oracle big data integration offerings are flexible, robust, and complementary to maximize big data investments and unlock value from these investments now and in the future.

ORACLE
oracle.com/goto/dataintegration
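The Lambda pattern discussed above can be reduced to a very small sketch: a batch view holds totals refined from deep storage, a speed layer counts events that arrived since the last batch run, and a query merges the two. Everything here (names, metrics, figures) is invented for illustration; this is the general pattern, not any vendor's implementation:

```python
# Minimal Lambda-architecture sketch: merge a precomputed batch view with a
# real-time speed layer at query time.
from collections import Counter

batch_view = {"clicks": 10_000, "purchases": 420}   # refined from deep storage

speed_layer = Counter()                             # real-time increments only

def ingest(event_type):
    """Speed layer: count an event as it streams in."""
    speed_layer[event_type] += 1

def query(metric):
    """Serving layer: batch total plus whatever arrived since the batch ran."""
    return batch_view.get(metric, 0) + speed_layer.get(metric, 0)

for event in ["clicks", "clicks", "purchases"]:
    ingest(event)

print(query("clicks"))     # 10002
print(query("purchases"))  # 421
```

Periodically the batch layer would recompute totals from deep storage and the speed layer would be reset, which is the "refinement" step the article describes.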

Real-Time Integration—
The Future is Now
With all the cloud-based, mobile, and niche applications, plus the Internet of Things (IoT), it's clear that the days of one-size-fits-all, monolithic applications are over. It's now much easier to find a third-party solution to fill a specific need or to "go deep" into an area that cannot be fulfilled by your primary application. That's led to an increase in application integration. However, the growing need for instantaneous, up-to-date data is motivating companies to transition from traditional batch-oriented techniques to real-time data integration. Although achieving real-time integration can be challenging, and can be accomplished using a variety of technologies, the goal is the same: to transfer accurate, timely data from point A to point B in real time so users can make better-informed, business-critical decisions.

THE NEW REAL-TIME WORLD ORDER
One technology has emerged as the dominant Web service design model for real-time integration: REST (Representational State Transfer). Why? RESTful Web services are easier to use, and the resource-oriented model is more flexible than previous SOAP, RPC, and WSDL-based interfaces. REST has already been adopted by providers such as Google, Netflix, and Twitter, and many other enterprise organizations. RESTful architectures and implementations provide these characteristics/benefits:
• Easy Web integration – uses standard HTTP methods
• Increased scalability – stateless interaction and caching semantics
• Reliability – separation between client, server, and data
• Security – via the transport layer (SSL) and message-level mechanisms
• Standard language – XML and JavaScript Object Notation (JSON)

SIMPLIFIED REST DEVELOPMENT
Kourier Integrator and Kourier's REST Gateway are Kore's easy-to-use and versatile REST integration solutions for MultiValue (MV) systems. Kourier Integrator streamlines and simplifies the process of building and testing bi-directional integrations using RESTful Web Services; the REST Gateway provides secure, real-time access to MV applications via REST APIs from outside the firewall.

Developers are more productive working within Kourier's REST framework because they can focus on the application interface instead of low-level protocol details such as data validation, resource security, and transaction logging. REST APIs are primarily created via specification pages and typically require minimal programming. Powerful Event Handlers make it easy to leverage existing application business logic or add special instructions within REST resources at specific timing intervals.

Other developer-friendly features of Kourier's REST framework:
• Automatic data validation
• Standard HTTP status code support
• Dynamic query parameters
• Create REST APIs without coding:
  – JSON and XML
  – Query parameter validation
  – Query wildcards
  – Pagination of large result sets
  – Field limiting
  – Result filtering and sorting
• Automatic transaction logging, history, and metrics
• API versioning

SECURE, RATED AND MEASURED ACCESS
The Kourier REST Gateway is a critical piece of the integration architecture because it's responsible for providing secure access to applications from outside the firewall while it monitors, manages, and measures REST API usage. Connection pooling is supported for enhanced performance. Policies can limit (rate) the maximum number of requests per minute/hour/day for each user. Gateway administrators can feel confident about exposing their system to the outside world without worrying about a user consuming all of their resources. Kourier's REST Gateway makes it easy to:
• Create policies to manage resource access
  – Define the maximum number of requests per hour, minute, and day
  – Define the availability for each resource (CRUD)
• Define users and server/database security
  – Associate policies and databases
  – Route requests to the server/database
• Visualize performance with the interactive dashboard
  – Graph transaction history
  – View REST resource statistics
  – Drill down into request headers, parameters, and timers

THE REST OF THE STORY
The quest for tighter integration between enterprise applications, third-party solutions, and the IoT will drive companies to use more real-time integration via REST to extend and modernize their business operations. Developers will be able to retain the core functionality and value of their MV enterprise applications while extending them as needed via integration to other solutions.

Kore is helping its partners and clients meet this challenge by implementing Kourier Integrator, our award-winning enterprise integration and data management solution, and the Kourier REST Gateway. These products facilitate the building, managing, and deployment of secure, scalable, real-time integrations to best-in-class applications via RESTful Web Services.

To learn more about our integration solutions or to schedule a demonstration, please visit our website or call 866-763-KORE (5673).

KORE TECHNOLOGIES
www.koretech.com
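The rate policies described above (a maximum number of requests per user per time window) boil down to a sliding-window limiter. The sketch below illustrates only the mechanism in generic Python; the actual Kourier REST Gateway is configured declaratively, and the class and parameter names here are invented:

```python
# Generic sliding-window rate limiter, illustrating "max N requests per
# window per user" gateway policies. Hypothetical names throughout.
from collections import defaultdict

class RatePolicy:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.counts = defaultdict(list)          # user -> request timestamps

    def allow(self, user, now):
        """Return True if this request fits within the user's quota."""
        recent = [t for t in self.counts[user] if now - t < self.window]
        self.counts[user] = recent               # drop expired timestamps
        if len(recent) >= self.max_requests:
            return False                         # reject: quota exhausted
        recent.append(now)
        return True

policy = RatePolicy(max_requests=3, window_seconds=60)
results = [policy.allow("alice", now=i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

A production gateway would also distinguish per-resource policies and per-method (CRUD) availability, as the bullet list above describes, but the quota check itself has this shape.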

Embrace Digital
Disruption with Denodo
Platform in the Cloud
In order to remain competitive in today's digital economy, data-driven businesses are seeking agile, rapid data integration capabilities that support real-time decision making. Data Virtualization provides the necessary speed and agility by accessing a wide variety of data from multiple internal and external data sources, and transparently combining (without moving) data to provide business users with a unified view of all relevant data for analysis.

Data Virtualization is becoming mainstream for digital enterprises. This is underscored in the May 2016 "Cool Vendors in Pervasive Integration, 2016" report, in which Gartner states that "Internet of Things (IoT), digital business and logical data warehouse use cases require data virtualization approaches for integration to achieve fast time to value for supporting analytics and operations."

Data Virtualization in the Cloud provides additional advantages, delivering unparalleled business agility, modernization, and cost effectiveness.

The Denodo Platform for Data Virtualization in the Cloud offers a unique combination of capabilities:
• Self-service data discovery and search empowers users with knowledge about the data, including data lineage.
• Real-time intelligence provides users with the ability to tap into real-time data for analysis.
• Multi-structured data support covers a wide variety of data sources, such as Hadoop, Spark, NoSQL, relational, and SaaS applications.
• A hybrid implementation model enables a single virtual view of data, combining on-premises and cloud data sources.
• Enterprise-ready data governance allows data-access management from a single point for all data sources.

WHO IS REAPING THE BENEFITS?
Two case studies serve to illustrate the benefits of the Denodo Platform for Data Virtualization in the Cloud.

Logitech
Logitech is a global provider of personal computer and tablet accessories. The company was seeking a cost-effective solution for moving its on-premises data to the cloud and integrating this data with cloud data sources. Logitech also faced hurdles associated with time-to-deliver, redundant data, and siloed data, as well as security issues related to unauthorized access to underlying data sources, which raised governance concerns.

Logitech replaced its on-premises data warehouse with Amazon Redshift, and then implemented a logical data warehouse (LDW) architecture utilizing the Denodo Platform for Data Virtualization to unify data across on-premises and cloud data sources and provide a single virtual data access layer. The Denodo Platform was used as a business layer provisioning data to all other enterprise analytical tools, such as Tableau, Pentaho BA, and other data interfaces and Web services.

Denodo's LDW architecture is the cornerstone for enabling cloud-based reporting and analytics, and played a critical role in the success of Logitech's cloud BI and analytics strategy, helping Logitech achieve faster access to data via data virtualization without traditional ETL; a business access layer for a unified view of on-premises and cloud data; governance enforcement through Denodo's single virtual data access layer; the flexibility to add new data assets; and data virtualization in the cloud supporting Logitech's cloud strategy to bring innovation and agility.

Mobile Device Protection Firm ("The Firm")
The Firm provides device protection and support services for smartphones, tablets, consumer electronics, appliances, and satellite receivers. The challenges faced by The Firm were driven by an initiative to move analytics from on-premises to the cloud on the Amazon Web Services (AWS) platform, and an initiative by the CSO organization to implement enterprise-ready authentication to manage data access with appropriate data security policy.

In order to enact these initiatives, The Firm needed a solution that would enable a business layer as well as a security access layer. The Firm was building a data lake in the cloud, for which a virtual data access layer on top of the data lake was needed to provision the data for analytics. The security access layer was needed to comply with enterprise-wide and legal data access requirements.

The Firm selected the Denodo Platform to satisfy both the business layer and security access layer requirements. The cloud analytic solution uses AWS S3 as the data lake environment and the Denodo Platform as the virtual data access and governance layers on top of the data lake to provision data to analytical tools such as Oracle Business Intelligence.

The Denodo Platform enabled The Firm to ensure that the "right people" have access to the "right data" and that governance requirements are met across cloud and on-premises environments, and was instrumental in the continued success of The Firm's cloud analytic strategy.

DENODO is the leader in Data Virtualization. Please contact us at info@denodo.com, or download Denodo Express at http://www.denodo.com/en/denodo-platform/denodo-express to get started.
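The "transparently combining (without moving) data" idea above can be sketched as a view that federates two sources at query time rather than copying rows into a warehouse. The sources, field names, and data below are all hypothetical; a real virtual data access layer also handles query delegation, optimization, and security, none of which is shown:

```python
# Toy federation sketch: one "on-premises" table and one "cloud" lookup,
# joined lazily when the unified view is queried. All data is invented.
on_prem_orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 90},
]
cloud_customers = {1: "Acme Corp", 2: "Globex"}   # e.g. from a SaaS CRM

def unified_order_view():
    """Join the two sources on customer_id only when the view is read."""
    for order in on_prem_orders:
        yield {
            "customer": cloud_customers[order["customer_id"]],
            "amount": order["amount"],
        }

rows = list(unified_order_view())
print(rows[0])  # {'customer': 'Acme Corp', 'amount': 250}
```

Because the view is computed on read, a change in either source shows up in the next query without any reload step, which is the agility argument the case studies make.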

10 New Requirements for


Modern Data Integration
Don’t let your legacy data and application just as easily as it can consume and deliver 8) INTEGRATION IS STILL ALL
integration technology be your legacy. Here responses to discrete business events. ABOUT CONNECTIVITY
are 10 new integration requirements that 4) INTEGRATION IS EVENT-BASED By definition, integration is about
will accelerate your enterprise cloud, big RATHER THAN CLOUD DRIVEN connecting disparate systems each with
data, and IoT adoption. Responding to a business event as its own API set, and an integration
1) APPLICATION INTEGRATION it happens is expected. For example, platform needs an effective framework to
IS DONE PRIMARILY THROUGH increasing the stock inventory on an adapt these APIs to efficiently process the
REST AND SOAP SERVICES item based on sentiments expressed in data. In addition, a large set of pre-built
To be effective, modern data integration social media or entering a support case connectors speeds up the implementation
platforms must provide easy and robust automatically when a failure is detected and increases agility in responding to new
ways to consume REST and SOAP. They need to provide an easy way to abstract the complexities of these APIs into business actions and objects, so that an application administrator can rapidly integrate these services with the rest of the enterprise.

2) LARGE-VOLUME DATA INTEGRATION IS AVAILABLE TO A HADOOP-BASED DATA LAKE OR TO CLOUD-BASED DATA WAREHOUSES
Enterprise IT organizations are moving away from bespoke data warehouses to data lakes: repositories of all data based on a Hadoop cluster, with Spark used as the compute framework for transforming large amounts of data in this environment. Cloud data warehouse technologies such as Amazon Redshift and Microsoft Azure SQL Data Warehouse are alternatives to expensive, specialized data warehouse appliances. Data integration tooling has to have a native understanding of the newer large-scale distributed storage and compute frameworks such as HDFS and Spark.

3) INTEGRATION HAS TO SUPPORT THE CONTINUUM OF DATA VELOCITIES
Last-generation data integration engines were optimized either for batch processing of large-volume data or for low-latency handling of small messages. Modern integration platforms should provide the necessary velocity regardless of the size of the data. This means that the engine has to be able to stream large data such as sensor data from the Internet of Things at a device. In either case, polling after the fact for these conditions means a frustrated or lost customer and an inefficient process in today's real-time enterprise.

5) INTEGRATION IS PRIMARILY DOCUMENT-CENTRIC
This is a corollary to the fact that integration is based on SOAP/REST APIs that send and receive hierarchical documents rather than the row sets or compressed message payloads of the previous generation of client-server technologies. Transforming hierarchical documents into row sets or compressed payloads at the edges, so that the internal engines run efficiently, is the biggest impediment to streamlined repurposing of previous-generation data integration tooling.

6) INTEGRATION IS HYBRID
In today's hybrid, multi-cloud environment, modern data integration technology has to be able to respect data's gravity and handle both on-premises and cloud-based applications and data sources with the same efficiency and ease.

7) INTEGRATION HAS TO BE ACCESSIBLE THROUGH SOAP/REST APIS
Integration technology has to interoperate with other services in the enterprise, such as monitoring, provisioning, and security. For example, enterprises might want to monitor the success or failure of integration flows through their own monitoring tools, and they might want new users added automatically as they are added to the enterprise integration group.

the integration scenarios.

9) INTEGRATION HAS TO BE ELASTIC
Reserving capacity to handle worst-case computation/storage needs is costly, and not having sufficient capacity when it is needed is even more so. This means that the integration framework has to be able to scale resources up and down on demand.

10) INTEGRATION HAS TO BE SELF-SERVICE AND DELIVERED AS A SERVICE
In a world that is increasingly cloud-based and data-driven, integration technology has to be delivered as a service that is widely accessible. A new class of users has made self-service essential, and only a cloud-based approach with simplified design, management, and monitoring interfaces can meet the broad spectrum of requirements.

These new requirements have given rise to the need for a converged integration platform as a service (iPaaS), which should be built from the ground up to address both new and legacy enterprise data and application integration needs.

SNAPLOGIC
www.snaplogic.com
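The document-to-row-set transformation described under the document-centric requirement can be illustrated with a minimal sketch. This is plain Python, not any vendor's engine, and the payload fields are invented for the example:

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested document into one flat column->value dict.

    Nested keys are joined with dots; list items are addressed by index.
    """
    row = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            row.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            row.update(flatten(value, f"{prefix}{i}."))
    else:
        row[prefix.rstrip(".")] = doc
    return row

# A hierarchical payload such as a REST or SOAP service might return
order = json.loads(
    '{"order_id": 42,'
    ' "customer": {"name": "Acme", "region": "EMEA"},'
    ' "lines": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'
)

row = flatten(order)
# e.g., row["customer.name"] == "Acme" and row["lines.0.sku"] == "A1"
```

Integration engines perform this kind of conversion at the edges so that internal processing can stay row-oriented; reversing it (rows back into documents) is the other half of the impedance mismatch the requirement describes.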
Data Integration: A Stepping Stone to Modern Data Apps
Modern data integration in the enterprise is about deriving meaningful insights from a variety of data sources and data structures that can be quickly turned into actionable business information. The data can be structured, semi-structured, and/or unstructured; it can contain business, technical, and operational metadata; and its origins can be from within or from outside the enterprise, including everything from ERP systems and legacy and modern databases to social media posts (tweets, blogs, etc.), system and application logs, machine sensor data, and more. Data integration tools available on the market today are generally viewed as adequate for ingesting, preparing, transforming, and provisioning data for where it is eventually consumed in the business or fed into business processes. They have come a long way, from supporting manual coding to providing more intuitive, visual interfaces, and from bulk (batch) data movement to increasingly handling real-time data delivery as well. They typically now also provide the scale required to integrate data sources from cloud-based applications and data storage. But in a world where big data and applications are converging rapidly, is the value traditional data integration tools provide to the business going far enough, or are they merely a stepping stone—albeit an important one—to modern data-driven applications?

Generally speaking, a good, modern data integration solution must deal with a wide variety of data types and sources, offer customizable data preparation and cleansing, and support different modes of data delivery (batch, micro-batch, streaming). It should offer easy and timely access to data, help with data governance, and support both existing and emerging use cases, such as IoT.

In the world of big data, one of the most commonly expressed challenges is how to get data from sources into an application or a data lake, where it can generate value. But the challenges don't end there: How do you track where the data goes, and who has access to it once it is in the data lake? How do you accelerate time to value for the business by not just solving the data integration and governance problems but also promoting reuse of components to prevent repeated integration efforts? And how do you get on a path to rich, data-driven applications—such as recommendation engines and anomaly detection systems—which require application and cloud integration and often take a long time to build due to siloed efforts within the enterprise? Merging the data flow with the application flow in one cohesive integration approach can help reduce the amount of custom coding and manual processes while speeding up the development and deployment of modern data applications.

This is where the Cask Data Application Platform (CDAP) comes in. CDAP provides a unified integration platform that allows developers, data scientists, and IT/operations teams to use a consistent set of tools for data integration, application integration, operations management, and governance. As an integrated framework, it has been designed from the ground up for building, deploying, and operating self-service data applications and data lakes on Hadoop and Spark. It is 100% open source and highly extensible, and it supports all major Hadoop distributions, offering complete portability within and between the distros.

Image 1: Cask Data App Platform

Through its open source extensions, Cask Hydrator and Cask Tracker, CDAP offers capabilities and user interfaces specifically designed to help with data integration challenges, such as how to quickly "hydrate" a data lake and how to easily "track" data movement within data lakes and data applications. Cask Hydrator is a self-service, extensible framework designed to develop, run, automate, and operate data pipelines. Its intuitive drag-and-drop interface integrates with Hadoop and non-Hadoop storage, and it can switch between different processing technologies—MapReduce, Spark, and Spark Streaming. Cask Hydrator can prepare, blend, aggregate, and apply data science to create a complete picture of an enterprise's business data in order to drive actionable insights.

Cask Tracker is another open source, self-service framework; it automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a data lake. It enables IT to oversee changes while delivering trusted, secure data and an audit trail for compliance in a complex data lake environment. Cask Tracker provides access to structured information that describes, explains, and locates datasets—including rich lineage and provenance information—making them easier to retrieve, use, and manage.

CDAP, with its extensions Cask Hydrator and Cask Tracker, is the de facto open source big data application and integration platform for building, deploying, and operating data-centric applications and data lakes; it also enables IT organizations to implement well-governed data-as-a-service environments designed to quickly unlock the value of data. For more information about these products, please go to the Cask website, and to stay updated on product and company news, follow us on Twitter @caskdata.

CASK
www.cask.co
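The lineage and provenance capture described above can be sketched in miniature as an append-only audit log of dataset movement. Everything below (class names, fields, dataset names) is a hypothetical illustration of the idea, not the CDAP or Cask Tracker API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    # One audit-trail entry: which operation read and wrote which datasets, and when
    operation: str
    inputs: list
    outputs: list
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class LineageLog:
    """Append-only log of dataset movement, queryable for provenance."""

    def __init__(self):
        self.records = []

    def record(self, operation, inputs, outputs):
        self.records.append(LineageRecord(operation, list(inputs), list(outputs)))

    def provenance(self, dataset):
        # Every operation that produced this dataset, in the order it ran
        return [r for r in self.records if dataset in r.outputs]

# Two pipeline steps: raw ingest into the lake, then cleansing into a trusted zone
log = LineageLog()
log.record("ingest", inputs=["crm_export.csv"], outputs=["raw.customers"])
log.record("cleanse", inputs=["raw.customers"], outputs=["trusted.customers"])
# log.provenance("trusted.customers") traces the cleanse step back to raw.customers
```

Real metadata frameworks capture far more (schemas, access history, field-level lineage), but the core compliance question—where did this dataset come from, and through which operations—reduces to queries over a log like this one.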