
1) Big Data / Hadoop concepts: - (7-10 hours)

● Introduction to Big Data and the Hadoop ecosystem (illustrating the problems posed by big data)
● The need for big data tools, i.e. the 3 V’s (Volume, Velocity, Variety)
● Hadoop framework & concepts
● HDFS
○ Architecture of HDFS
○ Default Configuration
○ Fault Tolerance
○ Rack Awareness
○ Read/Write in HDFS
○ HDFS Federation
● Components of the Hadoop ecosystem
● Hadoop 1.0 vs Hadoop 2.0 (architecture of both versions)
● MapReduce
○ Daemons of MapReduce
○ Architecture of MapReduce
○ Java example (practice; see the word-count sketch below)
○ Optimization technique (combiner)
○ Job submission on the cluster
● Quiz
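
As a companion to the Java example and combiner items above, here is a minimal word-count sketch using the standard Hadoop MapReduce Java API. The class names, tokenisation, and input/output paths are illustrative only, not part of the course material.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // illustrative example class; args[0] = HDFS input dir, args[1] = HDFS output dir
    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split
      public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
          }
        }
      }

      // Reducer (also reused as the combiner): sums the counts per word
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // combiner optimisation: pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, a job like this is submitted to the cluster with "hadoop jar wordcount.jar WordCount <input dir> <output dir>"; registering the reducer as the combiner lets each mapper pre-aggregate counts before the shuffle, which is the optimization technique listed above.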

2) Hive: - (4-6 hours)


● What is Apache Hive
● Architecture – design of Hive
▪ Describe how Apache Hive fits in the Hadoop ecosystem
● Data types – simple and complex data types
● Internal & External Tables
● Concepts of HQL – the Hive Query Language
▪ Create databases / create simple, external, and partitioned tables / alter and drop tables
▪ Query tables / combine and store tables
● UDFs in Hive – how to work with user-defined functions
● Built-in functions, e.g. MIN, MAX, TIMESTAMP
● Example (practice; see the HQL sketch below)
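
A short HQL sketch of the table-creation and query topics above. The sales_db database, orders table, columns, and HDFS location are hypothetical names introduced only for illustration.

    -- illustrative database and external, partitioned table
    CREATE DATABASE IF NOT EXISTS sales_db;
    USE sales_db;

    CREATE EXTERNAL TABLE IF NOT EXISTS orders (
      order_id  INT,
      customer  STRING,
      amount    DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/orders';

    -- query the table with built-in aggregate functions
    SELECT order_date, MIN(amount), MAX(amount), COUNT(*)
    FROM orders
    GROUP BY order_date;
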
3) Pig: - (3-4 hours)
● What is Apache Pig
● Architecture – design of Pig
▪ Describe how Apache Pig fits in the Hadoop ecosystem
▪ Difference and similarity between Pig and Hive
● Concepts of Pig Latin – how it works
● Data types – simple & complex types
● Extract, Transform, and Load Data with Apache Pig
● UDFs in Pig – working with user-defined functions
● Built-in functions, e.g. AVG, SUM, etc.
● Example (practice; see the Pig Latin sketch below)
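
A minimal Pig Latin sketch of the extract-transform-load flow above; the input path, field names, and relation names are illustrative placeholders.

    -- load raw order records (placeholder path), keep the large ones, aggregate per customer, store the result
    orders = LOAD '/data/orders' USING PigStorage(',')
             AS (order_id:int, customer:chararray, amount:double);
    big    = FILTER orders BY amount > 100.0;
    byCust = GROUP big BY customer;
    totals = FOREACH byCust GENERATE group AS customer,
                                     SUM(big.amount) AS total,
                                     AVG(big.amount) AS avg_amount;
    STORE totals INTO '/data/order_totals' USING PigStorage(',');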

4) Sqoop: (3-4 hours)


● What is Apache Sqoop
● Architecture- design
▪ Describe how Apache Sqoop fits in the Hadoop ecosystem
● Concepts- how it works
▪ Advantages of Sqoop
▪ Where/When to use Sqoop
● Import & export – commands in Sqoop
● Incremental logic – how to append data
● Sqoop with Hive – integrating Sqoop with Hive
● Examples (practice; see the command sketch below)
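
A sketch of the import, incremental-append, and Sqoop-with-Hive commands above, using standard Sqoop flags; the JDBC connection string, credentials, table names, and paths are placeholders.

    # plain import of a relational table into HDFS (placeholder connection string and paths)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales_db \
      --username etl_user -P \
      --table orders \
      --target-dir /data/orders

    # incremental logic: append only rows whose order_id is greater than the last imported value
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales_db \
      --username etl_user -P \
      --table orders \
      --incremental append \
      --check-column order_id \
      --last-value 1000

    # import straight into a Hive table (Sqoop with Hive)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales_db \
      --username etl_user -P \
      --table orders \
      --hive-import --hive-table sales_db.orders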

5) HBase: - (3-5 hours)


● What is Apache HBase
● Architecture – design
▪ Describe how Apache HBase fits in the Hadoop ecosystem
● Concepts – how it works & its components
▪ Advantages of Apache HBase
▪ Where/When to use Apache HBase
● Data types – simple and complex
● Table creation in HBase
● Load data from HBase into Hive
● Load data from Pig into HBase
● Example (practice; see the HBase shell sketch below)
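
An illustrative HBase shell session for the table-creation item, followed by a Hive external table mapped onto it through the HBase storage handler (the HBase-to-Hive topic above); the table, column family, and column names are made up for the example.

    # hbase shell: create a table with one column family, insert a row, scan it (illustrative names)
    create 'orders', 'info'
    put 'orders', 'row1', 'info:customer', 'alice'
    put 'orders', 'row1', 'info:amount', '250.0'
    scan 'orders'

    -- Hive side: external table backed by the HBase table, so HBase data can be queried from Hive
    CREATE EXTERNAL TABLE hbase_orders (key STRING, customer STRING, amount STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:customer,info:amount')
    TBLPROPERTIES ('hbase.table.name' = 'orders');
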
6) Spark: (4-6 hours)
● What is Apache Spark
● Architecture- design
▪ Describe how Apache Spark fits in the Hadoop ecosystem
● Concepts – working of Spark
▪ Advantages of Apache Spark
▪ Where/When to use Apache Spark
● Batch processing, streaming & Spark SQL
● Examples (practice)
● Basic example for Spark (batch code in Spark; see the sketch below)
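
A minimal batch word-count sketch for the Spark example item above, assuming the Spark 2.x Java RDD API; the application name and the input/output arguments are illustrative.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    // illustrative class; args[0] = input path, args[1] = output path
    public class SparkBatchWordCount {
      public static void main(String[] args) {
        // batch job: read a text file from HDFS, count words, write the result back
        SparkConf conf = new SparkConf().setAppName("batch word count");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey(Integer::sum);

        counts.saveAsTextFile(args[1]);
        sc.stop();
      }
    }

Such a job is normally packaged and launched with spark-submit against the cluster, reusing the same HDFS input as the MapReduce example.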

7 & 8) Kafka and Flume: (6 hours)

Apache Kafka
● What is Apache Kafka
● Architecture- design
▪ Describe how Apache Kafka fits in the Hadoop ecosystem
● Concepts- Working of Kafka
▪ Advantages of Kafka
▪ Where/When to use Kafka
● Producer, Broker & Consumer – explaining these components (see the sketch below)
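
A compact sketch of the producer and consumer roles listed above, assuming the Kafka Java client API; the broker address, topic name, and consumer group id are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaSketch {
      public static void main(String[] args) {
        // producer: publishes one message to the broker (placeholder address) on topic "events"
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
          producer.send(new ProducerRecord<>("events", "key1", "hello from the producer"));
        }

        // consumer: subscribes to the same topic and polls records back from the broker
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo-group");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
          consumer.subscribe(Collections.singletonList("events"));
          ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
          for (ConsumerRecord<String, String> r : records) {
            System.out.println(r.key() + " -> " + r.value());
          }
        }
      }
    }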

Apache Flume

● What is Apache Flume


● Architecture - Design
▪ Describe how Apache Flume fits in the Hadoop ecosystem
▪ Difference and similarity between Flume and Kafka
● Concepts – working of Flume
▪ Advantages of Apache Flume
▪ Where/When to use Apache Flume
● Connecting Flume to the Twitter application (see the configuration sketch below)
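
An illustrative Flume agent configuration for the Twitter exercise above, wiring Flume's bundled Twitter source through a memory channel into an HDFS sink; the agent name, component names, credentials, and HDFS path are placeholders, and the Twitter API keys must be supplied separately.

    # agent "twitter_agent": Twitter source -> memory channel -> HDFS sink (all names and paths are placeholders)
    twitter_agent.sources  = tw
    twitter_agent.channels = mem
    twitter_agent.sinks    = hdfs_sink

    twitter_agent.sources.tw.type = org.apache.flume.source.twitter.TwitterSource
    twitter_agent.sources.tw.consumerKey = YOUR_CONSUMER_KEY
    twitter_agent.sources.tw.consumerSecret = YOUR_CONSUMER_SECRET
    twitter_agent.sources.tw.accessToken = YOUR_ACCESS_TOKEN
    twitter_agent.sources.tw.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
    twitter_agent.sources.tw.channels = mem

    twitter_agent.channels.mem.type = memory
    twitter_agent.channels.mem.capacity = 10000

    twitter_agent.sinks.hdfs_sink.type = hdfs
    twitter_agent.sinks.hdfs_sink.hdfs.path = hdfs://namenode:8020/data/tweets
    twitter_agent.sinks.hdfs_sink.channel = mem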
