Enterprise Data Warehouse Optimization

with Hadoop Big Data


© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
@Pentaho #BigDataWebSeries
Your Hosts Today
Dave Henry
SVP Enterprise Solutions
Davy Nys
VP EMEA & APAC
Source/copyright: The Human Face of Big Data
Pentaho Webinar Series
Sign-up at: pentaho.com
Goals for Today
To understand:
  Challenges with the current EDW architecture
  Trends and shifts in data processing
  How Hadoop can help
  How to leverage Hadoop with Pentaho Visual MapReduce

Complete Analytics and Visual Data Management
[Diagram labels: Data Discovery & Visualization; Enterprise & Ad Hoc Reporting; Predictive Analytics & Machine Learning; Data Ingestion, Manipulation & Integration; Hadoop; NoSQL Databases; Analytic Databases]
Traditional Data Warehouse Architecture
[Diagram labels: Structured Data; Unstructured Data; Extract, Transform, Load; Source data acquisition / Ingestion; Initial consolidation as required; Cleansing; Transformation; Change Data Capture; Data Warehouse Management; Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analyzer]
Trends with Data Processing
Data Load
  The volume of data from existing sources is steadily increasing
  Requirement to make data available for longer periods of time (3 years -> 30 years)
  New sources of data are desired for analysis: machine-generated or external/3rd-party data
ELTL Approach to Data Load
1. Extract data from source systems
2. Load it (in its raw form) into the EDW
3. Transform it via SQL, creating new tables
4. Load the new tables into the official data warehouse
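
The four ELTL steps can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory SQLite database stands in for the EDW. The table names (raw_calls, calls_by_area, dw_calls_by_area) and the sample rows are hypothetical and do not come from the webinar.

```python
import sqlite3

# Step 1 (extract): hypothetical rows pulled from a source system.
source_rows = [
    ("2013-03-01 10:15:00", "4075550100"),
    ("2013-03-02 18:30:00", "4075550101"),
    ("2013-03-02 19:00:00", "2125550102"),
]


def run_eltl(rows):
    """Run extract, load, transform, load against an in-memory stand-in EDW."""
    edw = sqlite3.connect(":memory:")

    # Step 2 (load): land the data in raw form in a staging table in the EDW.
    edw.execute("CREATE TABLE raw_calls (call_ts TEXT, source_phone TEXT)")
    edw.executemany("INSERT INTO raw_calls VALUES (?, ?)", rows)

    # Step 3 (transform): SQL inside the EDW creates a new table.
    # This work consumes EDW capacity during the batch window.
    edw.execute(
        "CREATE TABLE calls_by_area AS "
        "SELECT substr(source_phone, 1, 3) AS area_code, COUNT(*) AS calls "
        "FROM raw_calls GROUP BY substr(source_phone, 1, 3)"
    )

    # Step 4 (load): copy the new table into the official warehouse table.
    edw.execute("CREATE TABLE dw_calls_by_area (area_code TEXT, calls INTEGER)")
    edw.execute(
        "INSERT INTO dw_calls_by_area SELECT area_code, calls FROM calls_by_area"
    )

    result = edw.execute(
        "SELECT area_code, calls FROM dw_calls_by_area ORDER BY area_code"
    ).fetchall()
    edw.close()
    return result


eltl_result = run_eltl(source_rows)
print(eltl_result)
```

Steps 2 to 4 all run inside the warehouse, so staging, transformation, and the final load compete with end-user queries for the same capacity.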
Challenges with Traditional Approaches
The EDW can't handle increasing data and workloads, so companies must:
  Reduce the volume of data
  Restrict end-user access (# of users or access windows) to accommodate longer batch processing windows
  Purchase additional capacity (hardware / licenses), which can cost as much as $100K / TB

Then, companies are faced with the following challenges:
  The compromise itself
  The incremental outlay of capital required to expand the EDW or purchase more proprietary ETL tool capacity
  The inability of the incumbent ETL vendor to work with Hadoop
Solution Architecture with Hadoop
[Diagram labels: Structured Data; Unstructured Data; Data Integration (Source data acquisition / Ingestion; Initial consolidation as required); ETL; Data Integration (Cleansing; Transformation; Change Data Capture; Data Warehouse Management); Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analyzer]
Core Benefits
1. Improve performance
   Meet critical data processing SLAs
2. Retain all data for analysis
3. Lower costs of data management, growth
4. Extend existing EDW capacity
   Increase ROI from current investments
Costs
Time
Flexibility
Challenges with Hadoop:
Scripting and Coding
Costs
Time
Flexibility
Pentaho: Quickest, Most Complete Solution for Big Data
Design, develop and deploy 15x faster:
  Full continuity from data access to decisions: complete data integration & business analytics platform for any big data store
  Faster development, faster runtime: visual development, distributed execution
  Instant and interactive analysis: no coding, no ETL required
Solution Architecture & Demo
Data Warehouse Optimization
[Diagram labels: Data Sources (ERP, CRM, CDR, Logs, Other Data); Big Data Architecture (Raw Data, Parsed Data, Analytic Datasets, Master Data); Data Warehouse (Master & Transactional Data); Analytic Data Mart(s); Tape Archive; Orchestrate; Processing; Pig, Oozie, Flume, Hive, HBase, Sqoop]
Pentaho for Hadoop Data Integration + Analytics
[Diagram labels: Structured Data; Unstructured Data; Ingest / Ingestion; Master Data; Visual MapReduce; Data Integration; Analytics; Analyze / Analysis & Reporting]
Example: Call Record Processing
What are the top 10 states for outbound calls on Fridays, Saturdays and Sundays?
Data available:
  Call records: date/timestamp & source phone #
  Reference data: area code by country, state & time zone (North American Numbering Plan)
Goal:
  Parse, enrich and filter the data
  Load the data into Postgres for analysis
Challenge:
  Prepare the data without impacting the EDW (no ELT)
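
The parse, enrich, and filter logic behind this question can be sketched in plain Python before any Hadoop tooling is involved. This is a sketch under assumptions. The timestamp format, the 10-digit phone numbers without a country prefix, the restriction to US numbers, and the sample rows are all made up for illustration. The names AREA_CODES, CALLS, and top_states are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical excerpt of the reference data: area code -> (country, state).
AREA_CODES = {
    "212": ("US", "NY"),
    "407": ("US", "FL"),
    "415": ("US", "CA"),
    "416": ("CA", "ON"),
}

# Hypothetical call records: (date/timestamp, source phone #).
CALLS = [
    ("2013-03-01 10:15:00", "4075550100"),  # Friday
    ("2013-03-02 18:30:00", "4075550101"),  # Saturday
    ("2013-03-03 09:00:00", "2125550102"),  # Sunday
    ("2013-03-04 12:00:00", "4155550103"),  # Monday, filtered out
    ("2013-03-02 20:45:00", "4165550104"),  # Saturday, not a US state
]

# datetime.weekday() numbers Monday as 0, so Friday to Sunday are 4, 5, 6.
TARGET_DAYS = {4, 5, 6}


def top_states(calls, area_codes, n=10):
    """Count Friday-to-Sunday calls per US state and return the top n."""
    counts = Counter()
    for call_ts, phone in calls:
        # Parse: derive the day of week from the timestamp.
        weekday = datetime.strptime(call_ts, "%Y-%m-%d %H:%M:%S").weekday()
        # Enrich: look up country and state by the 3-digit area code.
        country, state = area_codes.get(phone[:3], (None, None))
        # Filter: keep only the requested days and US numbers.
        if weekday in TARGET_DAYS and country == "US":
            counts[state] += 1
    return counts.most_common(n)


top = top_states(CALLS, AREA_CODES)
print(top)
```

In the scenario described here, only the prepared result would be loaded into Postgres, so the warehouse never sees the raw call records.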
Hadoop Data Processing Scenario
[Diagram labels: Structured Data; Unstructured Data; Ingest / Ingestion; Processing; Raw Data; Parsed Data; Analytic Datasets; Master Data]
Visual MapReduce
1. MapReduce Input: calling data
2. Calculate Month, Day, Day of Week
3. Extract 3-digit area code
4. Look up geo master data in HDFS
5. Filter for weekend and US-only calls
6. Create Value field for Key-Value Pair
7. Create Key field for Key-Value Pair
8. MapReduce Output: Key-Value Pair
[Diagram labels: Master Data; Java Programming]
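
For readers who want to see what the eight steps amount to in code, here is a hand-written sketch of an equivalent map function, with a small reduce step added to total the pairs. It is not Pentaho's generated code. The slides do not say which fields form the key and value, so using the state as the key and 1 as the value is an assumption. The comma-separated input format is also assumed, and GEO is a hypothetical in-memory stand-in for the geo master data in HDFS.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-in for the geo master data: area code -> (country, state).
GEO = {"212": ("US", "NY"), "407": ("US", "FL"), "416": ("CA", "ON")}


def map_call(line):
    """Apply steps 1-8 to one input record; yields zero or one (key, value)."""
    call_ts, phone = line.strip().split(",")        # 1. read one calling record
    dt = datetime.strptime(call_ts, "%Y-%m-%d %H:%M:%S")
    month, day, dow = dt.month, dt.day, dt.weekday()  # 2. Monday is 0
    area_code = phone[:3]                           # 3. 3-digit area code
    country, state = GEO.get(area_code, (None, None))  # 4. geo lookup
    if dow >= 4 and country == "US":                # 5. Fri/Sat/Sun, US only
        value = 1                                   # 6. value (assumed: a count)
        key = state                                 # 7. key (assumed: the state)
        yield key, value                            # 8. emit the pair


def reduce_counts(pairs):
    """Sum the values per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input lines: timestamp, source phone number.
lines = [
    "2013-03-01 10:15:00,4075550100",  # Friday, FL
    "2013-03-02 18:30:00,4075550101",  # Saturday, FL
    "2013-03-03 09:00:00,2125550102",  # Sunday, NY
    "2013-03-04 12:00:00,2125550103",  # Monday, dropped by the filter
    "2013-03-02 20:45:00,4165550104",  # Saturday, dropped as non-US
]
pairs = [pair for line in lines for pair in map_call(line)]
state_totals = reduce_counts(pairs)
print(state_totals)
```

The month and day values are computed to mirror step 2 but are not used in the key here. A real job might include them in the key to support other breakdowns.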
End of Demo
Leveraging Hadoop with Pentaho
OEM
Flexibility, Extensibility, Architected to Embed

Pricing
One of top reasons customers choose us

Community/Open Source Cache
Similar to Hadoop
Data Management Platform
Visual MapReduce, Orchestration, Connectivity
Fusion of all data sources & processing
Control/Manage/Optimize flow of data

Hybrid
Leverages non-Hadoop infrastructure
Overall Benefits
Business Benefits
  You can defer upgrades to expensive EDW hardware
  You can offload batch processing from the EDW and make it more available to end-users (improve performance / comply with SLAs)
  With better performance you may need smaller cluster sizes
  This is a low-risk use case that lets you get familiar with Hadoop while creating business value
  It's easy to evaluate: you don't need to modify your cluster and risk disrupting the configuration
Technical Benefits
  You should keep your EDW, but use Hadoop and Pentaho to optimize data processing
Q & A
Contact Us or Sign-up at:
pentaho.com
