Scope:
Banking Domain
Problem definition:
How effectively can ETL and the data warehouse (DW) be replaced with Hadoop?
Stages:
1. Data Source
2. Data Model
3. ETL Conversion
4. DW
5. OLAP
6. Data Model for Analysis
7. Reporting / Visualization
8. Value
Solution
Phase I
1. Replace ETL & DW:
   a. Collect structured data.
   b. Design the data model for Hive/Pig.
   c. Load data into HDFS via scripts.
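Steps b and c can be sketched as a small script that generates the Hive table definition and the HDFS upload command for one source extract. This is a minimal sketch, assuming comma-delimited text extracts; the table name `transactions`, its columns and the HDFS path `/data/bank/transactions` are hypothetical placeholders, not part of the plan above. The script only builds the HiveQL text and the `hdfs dfs -put` argument list; it does not run them.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    hive_type: str


def hive_ddl(table, columns, hdfs_dir, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement over delimited text files in hdfs_dir."""
    cols = ",\n  ".join(f"{c.name} {c.hive_type}" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}';"
    )


def hdfs_put_command(local_file, hdfs_dir):
    """Argument list for copying a local extract into the table's HDFS directory."""
    return ["hdfs", "dfs", "-put", local_file, hdfs_dir]


# Hypothetical banking table used only for illustration.
COLUMNS = [
    Column("txn_id", "BIGINT"),
    Column("account_id", "STRING"),
    Column("amount", "DECIMAL(18,2)"),
    Column("txn_date", "STRING"),
]

if __name__ == "__main__":
    print(hive_ddl("transactions", COLUMNS, "/data/bank/transactions"))
    print(" ".join(hdfs_put_command("transactions.csv", "/data/bank/transactions")))
```

An external table is used here so that files dropped into the directory by the load scripts become queryable without a separate Hive load step; a managed table with `LOAD DATA INPATH` would work equally well.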
Phase II
1. Data model for OLAP using Hive.
2. Scripts for loading data into the OLAP model.
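An OLAP model typically pre-aggregates a fact table over every combination of its dimensions (a cube). The sketch below shows that aggregation in plain Python so the idea can be checked on toy data; the field names `branch`, `product` and `amount` are hypothetical. In Hive, the equivalent result would come from a `GROUP BY ... WITH CUBE` query over the fact table, assuming a Hive version that supports it.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`.

    Dimensions that are rolled up appear as None in the key,
    playing the role of the "ALL" member in an OLAP cube.
    """
    result = defaultdict(int)
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            for row in rows:
                key = tuple(row[d] if d in subset else None for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical fact rows for illustration only.
FACTS = [
    {"branch": "A", "product": "loan", "amount": 100},
    {"branch": "A", "product": "card", "amount": 50},
    {"branch": "B", "product": "loan", "amount": 30},
]

if __name__ == "__main__":
    for key, total in sorted(cube(FACTS, ["branch", "product"], "amount").items(), key=str):
        print(key, total)
```

With two dimensions this yields the grand total, per-branch and per-product subtotals, and the detailed branch-by-product cells, which is what a reporting layer would query instead of scanning raw transactions.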
RASIC
Task                         Resource    Duration
1. Data Source                           2 days
2. Cluster Setup                         2 days
3. Data Model                            2 days
4. Compare DB vs HDFS                    2 days
POC (29-07-2013)
1. Differences between HDFS, OLTP, HBase, Pig and Hive.
2. Pros and cons.
3. Log file analysis:
   a. Fetch the source logs (existing or real-time).
   b. Load them into Hadoop (using Flume for real-time logs).
   c. Cleanse the log data in Hadoop (the T, or transform, phase of ETL).
   d. Join the data using Pig.
   e. Store the joined data in Hadoop (final stage).
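Steps c and d can be illustrated in plain Python before writing the Pig script. This is a minimal sketch, assuming a hypothetical log format of `timestamp,account_id,event,status` per line and a hypothetical account reference table; cleansing drops malformed lines and normalizes case, and the join is an inner hash join on `account_id`, the same semantics as a Pig statement such as `joined = JOIN logs BY account_id, accounts BY account_id;`.

```python
from datetime import datetime


def cleanse(lines):
    """Parse raw log lines (hypothetical format: ts,account_id,event,status).

    Lines with the wrong field count, empty fields or an unparseable
    timestamp are dropped; text fields are upper-cased.
    """
    records = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4 or not all(parts):
            continue
        ts, account_id, event, status = parts
        try:
            parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        records.append({
            "ts": parsed,
            "account_id": account_id.upper(),
            "event": event.upper(),
            "status": status.upper(),
        })
    return records


def inner_join(left, right, key):
    """Hash join: index `right` by key, then probe it with each `left` record."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            joined.append({**l, **r})
    return joined
```

Records whose account has no match in the reference data (for example an unknown account ID) are dropped by the inner join; a left outer join would keep them for investigation instead.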