
Production Use Cases of Managed Hadoop Data Lakes

October 27, 2015


Greg Pavlik | Senior Solutions Architect
gpavlik@zaloni.com

Hadoop: Unleashing big data's potential across industries


Accelerating time to market
Identifying new products and markets
Enabling advanced analytics for improved decision-making

Healthcare: e.g. Patient 360° view, Fraud
Telco: e.g. Network analysis
Retail: e.g. Improved decision-making, Customer 360° view
Financial Services: e.g. Customer 360° view, Fraud

Zaloni Confidential and Proprietary - Provided under NDA

The promise of a Hadoop Data Lake: All data is welcome.


Don't limit the data from which you can derive value.
Store all types of structured and unstructured data.
Don't worry about having all the answers today.
Store complete raw data: you can go back to it as your understanding crystallizes.
Don't limit how you can query the data.
Use a variety of tools to derive insights from the data.
Don't build walls.
Provide democratized access via a single unified view across the Enterprise.


Tackling data lake complications


Building the Hadoop data lake
Rate of Change: Keeping up with the constantly evolving Hadoop ecosystem
Skills Gap: Lack of expertise in both development and architecture
Complexity: Numerous components to integrate (hardware, software, applications)

Managing the Hadoop data lake
Ingestion: Difficulty getting data into the data lake effectively
Lack of Visibility: Lack of data visibility and transparency
Privacy and Compliance: Addressing data privacy and compliance issues

Deriving value from the data lake
Quality Issues: Need for improved data quality control
Reliance on IT: Business users must rely on IT to prepare data for analysis
Reusability: Lack of automation means constantly reinventing the wheel

Requirements for a Managed Data Lake


Integrated Data Lake Management

Unified Data Management: An integrated solution to manage the entire data pipeline (instead of point products)
Managed Ingestion: Simplified onboarding of new data sets, managed so that IT knows where data comes from and where it lands
Data Reliability: Confidence that your analytics always run on the right data, with the right quality
Data Visibility: Metadata management capabilities let you track what data is in Hadoop, along with its source, its format and its lineage
Data Security: Enforces access control and provides data masking (e.g. of PII)
Self-Service: End users can access and leverage the data in the data lake
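To make the Data Visibility and Data Security requirements concrete, here is a minimal Python sketch, assuming a hypothetical in-memory catalog. A CatalogEntry records a data set's source, format, PII columns and lineage, and mask_record replaces PII values with keyed HMAC-SHA256 tokens. The names CatalogEntry, derive, mask_record, tokenize and TOKEN_KEY are illustrative only, not part of any Zaloni or Hadoop API, and keyed hashing is just one possible masking technique.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Hypothetical tokenization key; a real deployment would fetch this from a
# key-management service rather than hard-coding it.
TOKEN_KEY = b"example-secret-key"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Replace a PII value with a deterministic keyed token (HMAC-SHA256,
    truncated to 16 hex characters). Same input and key -> same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


@dataclass
class CatalogEntry:
    """Hypothetical metadata record: name, source, format, PII columns,
    and the names of the data sets this one was derived from (lineage)."""
    name: str
    source: str
    fmt: str
    pii_columns: list = field(default_factory=list)
    lineage: list = field(default_factory=list)


def derive(parent: CatalogEntry, name: str, fmt: str) -> CatalogEntry:
    """Register a derived data set: it inherits source and PII columns,
    and its lineage is the parent's lineage plus the parent itself."""
    return CatalogEntry(
        name=name,
        source=parent.source,
        fmt=fmt,
        pii_columns=list(parent.pii_columns),
        lineage=parent.lineage + [parent.name],
    )


def mask_record(record: dict, entry: CatalogEntry) -> dict:
    """Tokenize every column the catalog marks as PII; copy the rest as-is."""
    return {
        col: tokenize(str(val)) if col in entry.pii_columns else val
        for col, val in record.items()
    }
```

For example, deriving "crm_customers_refined" from a raw entry named "crm_customers_raw" yields a lineage of ["crm_customers_raw"]. Because the token is deterministic for a given key, masked columns can still serve as join keys across data sets.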

Data Lake Reference Architecture


[Diagram]
Source Systems: File Data, DB Data, ETL Extracts, Logs (or other unstructured data), Streaming, Cloud Services, APIs
Hadoop Data Lake zones: Transient Loading Zone; Raw Data (original, unaltered data attributes); Refined Data; Trusted Data; Discovery Sandbox (data wrangling, data discovery, exploratory analytics); Consumption Zone
Processing: Integrate to common format; Data Validation; Data Cleansing; Aggregations
Supporting data: Reference Data, Master Data, Tokenized Data
Cross-cutting services: Metadata, Data Quality, Data Catalog, Security
Consumers: OLTP or ODS, Enterprise Data Warehouse, Business Analysts, Researchers, Data Scientists
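The zone flow in this architecture can be sketched in Python as follows. This is a minimal illustration, assuming in-memory lists stand in for HDFS directories, that records pass from the transient loading zone into the raw zone unaltered, and that refinement means integrating to a common format (here, lower-cased field names), cleansing, validating and aggregating. The DataLake class and its methods are hypothetical; the diagram does not specify how the zones are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Hypothetical in-memory model of the zones; real zones would be HDFS paths."""
    transient: list = field(default_factory=list)
    raw: list = field(default_factory=list)
    refined: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def land(self, records):
        """Source systems drop records into the transient loading zone."""
        self.transient.extend(records)

    def promote_to_raw(self):
        """Copy records, attributes unaltered, into raw; then empty transient."""
        self.raw.extend(dict(r) for r in self.transient)
        self.transient.clear()

    def refine(self, required):
        """Rebuild the refined zone from raw (safe to call repeatedly)."""
        self.refined, self.rejected = [], []
        for rec in self.raw:
            # Integrate to a common format (assumption: lower-case field names)
            # and cleanse by stripping surrounding whitespace from strings.
            common = {
                k.lower(): v.strip() if isinstance(v, str) else v
                for k, v in rec.items()
            }
            # Validate: keep the original raw record aside if a required
            # field is missing or empty after cleansing.
            if any(common.get(f) in (None, "") for f in required):
                self.rejected.append(rec)
                continue
            self.refined.append(common)

    def aggregate(self, key, value):
        """Sum `value` per distinct `key` across the refined records."""
        totals = {}
        for rec in self.refined:
            totals[rec[key]] = totals.get(rec[key], 0) + rec[value]
        return totals
```

Keeping the raw zone untouched is what allows refine to be rerun later with different rules, which matches the "store complete raw data" principle earlier in the deck.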


Network Data Lake Reference Architecture


[Diagram]
Consumption: Custom Applications, Exploration and Ad-hoc Analytics, BI Tools
Use cases: Subscriber Usage, Customer Churn, Capacity Planning, Customer 360
Network Data Lake (HDFS): Bedrock Network Data Models; Landing Zone + Bedrock Network Data Collectors
Alongside the data lake: Data Warehouse
Sources: IPFIX, SNMP, RADIUS, CDR, DPI, CRM, Billing, Unstructured Data

Healthcare Data Lake Reference Architecture


[Diagram]
Data Sources: Relational, Streaming, File
Source data: Claims, EMR, Lab/Pathology, Pharmacy, Member, Social, Enterprise Data
Edge Node: OGG Adapter, Stream Adapters
Data Lake (Hadoop Cluster): Data Sets, Transformations, BDCA, Export
Bedrock Application Manager: Configure Ingestion; Operations and Metadata Store; Administer Metadata; Data Quality & Rules Engine; Manage, Monitor, Schedule; Query Builder; Workflow Executor
Consumers: Analytical Applications, Enterprise Data Warehouse, Apps/Analytics Tools
Use cases: HEDIS Reporting, Bundled Payments, Readmission Risk, Medical Benefits Management, Scorecards, Enterprise Reports
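The Data Quality & Rules Engine component can be illustrated with a minimal Python sketch, assuming rules are declared as named predicates and evaluated record by record. The claim fields (claim_id, member_id, amount, service_date) and the specific rules below are hypothetical examples for a claims feed, not rules taken from this architecture.

```python
from datetime import date

# Hypothetical declarative rules for a claims feed: (rule name, predicate).
RULES = [
    ("claim_id present", lambda c: bool(c.get("claim_id"))),
    ("member_id present", lambda c: bool(c.get("member_id"))),
    ("amount non-negative",
     lambda c: isinstance(c.get("amount"), (int, float)) and c["amount"] >= 0),
    ("service date not in future",
     lambda c: isinstance(c.get("service_date"), date)
     and c["service_date"] <= date.today()),
]


def evaluate(record, rules=RULES):
    """Return the names of the rules a record fails (empty list = passes)."""
    return [name for name, check in rules if not check(record)]


def run_rules(records, rules=RULES):
    """Split records into passing records and a (claim_id, failed rules) report."""
    passed, failures = [], []
    for rec in records:
        failed = evaluate(rec, rules)
        if failed:
            failures.append((rec.get("claim_id"), failed))
        else:
            passed.append(rec)
    return passed, failures
```

Keeping rules as data rather than as hard-coded branches is what lets checks be added or changed per data set without rewriting the ingestion pipeline.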

Visit zaloni.com or contact us at info@zaloni.com
