2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

@Pentaho #BigDataWebSeries

Your Hosts Today
- Dave Henry, SVP Enterprise Solutions
- Davy Nys, VP EMEA & APAC

Source/copyright: The Human Face of Big Data

Pentaho Webinar Series
Sign-up at: pentaho.com

Goals for Today
To understand:
- Challenges with the current EDW architecture
- Trends and shifts in data processing
- How Hadoop can help
- How to leverage Hadoop with Pentaho Visual MapReduce
Complete Analytics and Visual Data Management
- Hadoop
- NoSQL Databases
- Data Discovery & Visualization
- Enterprise & Ad Hoc Reporting
- Predictive Analytics & Machine Learning
- Data Ingestion, Manipulation & Integration
- Analytic Databases

Traditional Data Warehouse Architecture
[Diagram: Structured Data, Unstructured Data; Extract Transform Load; Data Mart(s) / Warehouse; Metadata; Dashboard, Report, Analyzer]
Extract Transform Load stages:
- Source data acquisition / Ingestion
- Initial consolidation as required
- Cleansing
- Transformation
- Change Data Capture
- Data Warehouse Management

Trends with Data Processing
Data Load
- The volume of existing data sources is steadily increasing.
- Data must be kept available for longer periods of time (3 years -> 30 years).
- New sources of data, such as machine-generated or external / 3rd-party data, are desired for analysis.

ELTL Approach to Data Load
1. Extract data from source systems.
2. Load it (in its raw form) into the EDW.
3. Transform it via SQL, creating new tables.
4. Load the new tables into the official data warehouse.

The EDW can't handle the increasing data volumes and workloads, so companies must:
- Reduce the volume of data.
- Restrict end-user access (number of users or access windows) to accommodate longer batch processing windows.
- Purchase additional capacity (hardware / licenses), which can cost as much as $100K / TB.
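To make the ELTL approach to data load concrete (extract, load the raw data into the EDW, transform it with SQL into new tables, load those into the official warehouse), here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the EDW; the record layout, the table names (raw_calls, calls_by_area, dw_calls_by_area) and the eltl helper are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical extract from a source system: (call_id, timestamp, source_phone).
SOURCE_ROWS = [
    (1, "2013-03-01 18:05:00", "+14155550100"),
    (2, "2013-03-02 09:12:00", "+12125550199"),
    (3, "2013-03-04 11:30:00", "+14155550123"),
]


def eltl(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str, str]]) -> None:
    """Run the four ELTL steps; every step executes inside the stand-in EDW (conn)."""
    cur = conn.cursor()
    # Steps 1-2: extract from the source and load it, in raw form, into the EDW.
    cur.execute("CREATE TABLE raw_calls (call_id INTEGER, ts TEXT, phone TEXT)")
    cur.executemany("INSERT INTO raw_calls VALUES (?, ?, ?)", rows)
    # Step 3: transform via SQL inside the EDW, creating a new table
    # (here: count calls per 3-digit area code, skipping the "+1" prefix).
    cur.execute(
        """
        CREATE TABLE calls_by_area AS
        SELECT substr(phone, 3, 3) AS area_code, COUNT(*) AS calls
        FROM raw_calls
        GROUP BY substr(phone, 3, 3)
        """
    )
    # Step 4: load the new table into the official data warehouse table.
    cur.execute("CREATE TABLE dw_calls_by_area (area_code TEXT, calls INTEGER)")
    cur.execute("INSERT INTO dw_calls_by_area SELECT area_code, calls FROM calls_by_area")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    eltl(conn, SOURCE_ROWS)
    print(conn.execute("SELECT * FROM dw_calls_by_area ORDER BY area_code").fetchall())
    # [('212', 1), ('415', 2)]
```

Both the raw copy and the intermediate table occupy warehouse storage, and the SQL transformation runs on warehouse compute; that is the workload the Hadoop-based architecture in this deck aims to move out of the EDW.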
Challenges with Traditional Approaches
Companies are then faced with the following challenges:
- The compromise itself
- The incremental capital outlay required to expand the EDW or purchase more proprietary ETL tool capacity
- The inability of the incumbent ETL vendor to work with Hadoop

Solution Architecture with Hadoop
[Diagram: Structured Data, Unstructured Data; Data Integration (Source data acquisition / Ingestion; Initial consolidation as required); ETL; Data Integration (Cleansing; Transformation; Change Data Capture; Data Warehouse Management); Data Mart(s) / Warehouse; Metadata; Dashboard, Report, Analyzer]

Core Benefits
1. Improve performance: meet critical data processing SLAs
2. Retain all data for analysis
3. Lower the costs of data management and growth
4. Extend existing EDW capacity: increase ROI from current investments

Challenges with Hadoop: Scripting and Coding
[Diagram labels: Costs, Time, Flexibility]

Pentaho: Quickest, Most Complete Solution for Big Data
[Diagram labels: Costs, Time, Flexibility]
Design, develop and deploy 15x faster:
- Full continuity from data access to decisions: a complete data integration & business analytics platform for any big data store
- Faster development, faster runtime: visual development, distributed execution
- Instant and interactive analysis: no coding, no ETL required

Solution Architecture & Demo

Data Warehouse Optimization
[Diagram: Data Sources (ERP, CRM, CDR, Logs, Other Data); Big Data Architecture (Raw Data, Parsed Data, Analytic Datasets, Master Data; Pig, Oozie, Flume, Hive, HBase, Sqoop; DW Processing; ORCHESTRATE); Data Warehouse (Master & Transactional Data); Analytic Data Mart(s); Tape Archive]
Pentaho for Hadoop: Data Integration + Analytics
[Diagram: Structured Data, Unstructured Data; INGEST (Ingestion); VISUAL MAP REDUCE (Data Integration); ANALYZE (Analytics, Analysis & Reporting); Raw Data, Parsed Data, Analytic Datasets, Master Data]

Example: Call Record Processing
Question: What are the top 10 states for outbound calls on Fridays, Saturdays and Sundays?
Data available:
- Call records: date/timestamp & source phone number
- Reference data: area code by country, state & time zone (North American Numbering Plan)
Goal:
- Parse, enrich and filter the data
- Load the data into Postgres for analysis
Challenge: Prepare the data without impacting the EDW (no ELT)!

Hadoop Data Processing Scenario
[Diagram: Structured Data, Unstructured Data; INGEST (Ingestion); Processing; Raw Data; Master Data]
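The call-record scenario (parse, enrich with area-code reference data, filter, then count per state) maps naturally onto a single MapReduce job. The following is a minimal local Python sketch, not Pentaho's Visual MapReduce or the Hadoop Java API. It makes several assumptions: the record layout "YYYY-MM-DD HH:MM:SS,+1NPANXXXXXX" is hypothetical; the area-code table is a small hypothetical sample; the helper names map_call, reduce_calls and run_job are illustrative; the in-memory grouping stands in for Hadoop's shuffle; and timestamps are assumed to already be in local time, so the time-zone column is carried but unused. Instead of loading the results into Postgres, it computes the top 10 states directly.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample of the NANP reference data:
# area code -> (country, state/province, time zone).
AREA_CODES: Dict[str, Tuple[str, str, str]] = {
    "212": ("US", "NY", "America/New_York"),
    "312": ("US", "IL", "America/Chicago"),
    "415": ("US", "CA", "America/Los_Angeles"),
    "416": ("CA", "ON", "America/Toronto"),
}

WEEKEND = {4, 5, 6}  # datetime.weekday(): Friday=4, Saturday=5, Sunday=6

# Hypothetical sample call records.
CALLS = [
    "2013-03-01 18:05:00,+14155550100",  # Friday, CA
    "2013-03-02 09:12:00,+12125550199",  # Saturday, NY
    "2013-03-03 20:45:00,+14155550123",  # Sunday, CA
    "2013-03-04 11:30:00,+13125550111",  # Monday: filtered out
    "2013-03-02 10:00:00,+14165550142",  # Saturday, but Canada: filtered out
]


def map_call(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: one call record in, zero or one (state, 1) pair out."""
    ts_text, phone = line.strip().split(",")
    # Parse the timestamp and derive the day of week (month/day are available too).
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
    # Extract the 3-digit area code, skipping the "+1" country code.
    area_code = phone[2:5]
    # Enrich with geo master data.
    geo = AREA_CODES.get(area_code)
    # Keep only weekend (Fri-Sun), US-only calls.
    if geo is None or geo[0] != "US" or ts.weekday() not in WEEKEND:
        return
    # Emit the key-value pair: key = state, value = one call.
    yield geo[1], 1


def reduce_calls(state: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reducer: sum the per-call values for one state."""
    return state, sum(counts)


def run_job(lines: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    # Local stand-in for the shuffle: group mapper output by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_call(line):
            grouped[key].append(value)
    totals = [reduce_calls(key, values) for key, values in grouped.items()]
    # Top N states by call count, ties broken alphabetically.
    return sorted(totals, key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    print(run_job(CALLS))  # [('CA', 2), ('NY', 1)]
```

Keying on the state means each reducer only sums integers, and because summation is associative the same reduce function could also serve as a combiner to reduce shuffle traffic on a real cluster.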
Visual MapReduce
[Diagram labels: Raw Data, Parsed Data, Analytic Datasets, Master Data, VISUAL MAP REDUCE, Java Programming]
1. MapReduce Input: calling data
2. Calculate Month, Day, Day of Week
3. Extract 3-digit area code
4. Look up geo master data in HDFS
5. Filter for weekend and US-only calls
6. Create Value field for Key-Value Pair
7. Create Key field for Key-Value Pair
8. MapReduce Output: Key-Value Pair

Solution Architecture & Demo
End of Demo

Leveraging Hadoop with Pentaho
- OEM: Flexibility, Extensibility, Architected to Embed
- Pricing: One of the top reasons customers choose us
- Community / Open Source: Cache similar to Hadoop
- Data Management Platform: Visual MapReduce, Orchestration, Connectivity; fusion of all data sources & processing; control / manage / optimize the flow of data
- Hybrid: Leverages non-Hadoop infrastructure

Overall Benefits
Business Benefits
- You can defer upgrades to expensive EDW hardware.
- You can offload batch processing from the EDW and make it more available to end-users (improve performance / comply with SLAs).
- With better performance you may need smaller cluster sizes.
- This is a low-risk use case that lets you get familiar with Hadoop while creating business value.
- It's easy to evaluate: you don't need to modify your cluster and risk disrupting the configuration.
Technical Benefits
- You should keep your EDW, but use Hadoop and Pentaho to optimize data processing.

Q & A

Contact Us or Sign-up at: pentaho.com