Hadoop is an open-source framework for storing and processing large-scale data sets on clusters of commodity hardware. Its core component, HDFS, stores data across large numbers of commodity machines and provides very high aggregate bandwidth to the cluster, while MapReduce is the main programming model Hadoop uses to process those data sets in parallel. Lead Online Training provides Hadoop online training delivered by technical experts in the subject, aimed at making you proficient in the technology, and we are always available to support you.
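As a minimal illustration of the MapReduce model described above, the sketch below implements the classic word count job using the standard Hadoop Java API; the class name and input/output paths are illustrative, not part of the course material.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in a line of input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner reduces map-side output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}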
What is Hadoop?
Slave daemons
Replication of data (see the HDFS client sketch after this list)
Local mode
Pseudo-distributed mode
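To ground the HDFS topics above, block replication in particular, here is a minimal sketch using the Hadoop FileSystem API; the file paths and the replication factor of 3 are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationDemo {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/training/sample.txt");   // illustrative path

    // Copy a local file into HDFS and ask for a replication factor of 3.
    fs.copyFromLocalFile(new Path("sample.txt"), file);
    fs.setReplication(file, (short) 3);

    // Read back the file's metadata, including its current replication.
    FileStatus status = fs.getFileStatus(file);
    System.out.println(file + " replication = " + status.getReplication());

    fs.close();
  }
}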
Hadoop administration:
Setup of Hadoop clusters with Cloudera, Apache, Greenplum, and Hortonworks distributions
Data backup.
Hadoop Development:
Output collection
Identification of mapper
Identification of reducer
Exploring problems using the application
MRUnit testing (see the test sketch after this list)
Logging
Debugging strategies
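A minimal sketch of an MRUnit unit test for the word-count mapper shown earlier; the class under test and the expected output are assumptions for illustration.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenizerMapperTest {
  private MapDriver<Object, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    // Drive the mapper in isolation, without a real cluster.
    mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
  }

  @Test
  public void emitsOneCountPerToken() throws Exception {
    mapDriver.withInput(new LongWritable(0), new Text("big data big"))
             .withOutput(new Text("big"), new IntWritable(1))
             .withOutput(new Text("data"), new IntWritable(1))
             .withOutput(new Text("big"), new IntWritable(1))
             .runTest();
  }
}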
Advanced MapReduce Programming
Secondary sort
MapReduce joins
Monitoring & debugging on a Production Cluster
Counters
Skipping Bad Records
Running in local mode
Reducing network traffic with a combiner (see the sketch after this list)
Partitioners
Reducing input data
Using Compression
Reusing the JVM
Performance Aspects
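A sketch of how a combiner and a custom partitioner plug into a job, building on the word-count classes sketched earlier; the alphabetical partitioning rule is purely an illustrative assumption.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithPartitioner {

  // Illustrative partitioner: words starting with a-m go to reducer 0,
  // everything else to reducer 1, so output is split alphabetically.
  public static class AlphabetPartitioner
      extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      if (numPartitions < 2) {
        return 0;
      }
      char first = Character.toLowerCase(key.toString().charAt(0));
      return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count partitioned");
    job.setJarByClass(WordCountWithPartitioner.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    // The combiner runs on map output locally, cutting the data shuffled
    // across the network before it reaches the reducers.
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setPartitionerClass(AlphabetPartitioner.class);
    job.setNumReduceTasks(2);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}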
CASE STUDIES
CDH4 Enhancements:
1. NameNode high availability
2. NameNode federation
3. Fencing
4. MapReduce 2
HADOOP ANALYST
1. Concepts of Hive
2. Hive and its architecture
3. Install and configure Hive on the cluster (see the HiveQL sketch after this list)
4. Types of tables in Hive
5. Functions in the Hive library
6. Buckets
7. Partitions
8. Joins
1. Inner joins
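A minimal sketch of querying Hive over JDBC from Java, assuming HiveServer2 is running on localhost:10000 and that a table named employees already exists (both are illustrative assumptions).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; hive-jdbc must be on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "", "");
         Statement stmt = conn.createStatement()) {

      // Run an aggregate query; Hive compiles it into MapReduce jobs
      // (or another execution engine, depending on configuration).
      ResultSet rs = stmt.executeQuery(
          "SELECT department, COUNT(*) FROM employees GROUP BY department");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}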
PIG:
1. Pig basics
2. Install and configure PIG
3. Functions of PIG Library
4. Pig vs. Hive
5. Writing sample Pig Latin scripts
6. Modes of running
1. Grunt shell
2. Java program (see the embedded-Pig sketch after this list)
7. PIG UDFs
8. Macros of Pig
9. Debugging Pig
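A minimal sketch of running Pig Latin from a Java program with PigServer; the input path, schema, and output path are illustrative assumptions.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPigDemo {
  public static void main(String[] args) throws Exception {
    // LOCAL runs against the local filesystem; use ExecType.MAPREDUCE
    // to submit the same script to a Hadoop cluster.
    PigServer pig = new PigServer(ExecType.LOCAL);

    // Register Pig Latin statements one relation at a time.
    pig.registerQuery(
        "logs = LOAD 'access_log.txt' USING PigStorage(' ') "
        + "AS (ip:chararray, url:chararray);");
    pig.registerQuery("grouped = GROUP logs BY ip;");
    pig.registerQuery("hits = FOREACH grouped GENERATE group, COUNT(logs);");

    // Execute the pipeline and write the result.
    pig.store("hits", "hits_per_ip");

    pig.shutdown();
  }
}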
IMPALA:
1. Differences between Impala, Pig, and Hive
2. Does Impala give good performance?
3. Exclusive features
4. Impala and its Challenges
5. Use cases
NOSQL:
1. HBase
2. HBase concepts
3. HBase architecture
4. Basics of HBase
5. Server architecture
6. File storage architecture
7. Column access
8. Scans
9. HBase cases
10. Installation and configuration of HBase on a multi-node cluster
11. Create database, Develop and run sample applications
12. Accessing data stored in HBase using clients such as Python, Java, and Perl (see the Java client sketch after this list)
13. MapReduce client
14. HBase and Hive Integration
15. HBase administration tasks
16. Defining Schema and its basic operations.
17. Cassandra Basics
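A minimal sketch of the HBase 1.x Java client API; the table name, column family, and values are illustrative assumptions, and the table is assumed to already exist with a column family named "info".

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
  public static void main(String[] args) throws Exception {
    // Reads the ZooKeeper quorum and other settings from hbase-site.xml.
    Configuration conf = HBaseConfiguration.create();

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("users"))) {

      // Write one cell: row key "user1", column info:email.
      Put put = new Put(Bytes.toBytes("user1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                    Bytes.toBytes("user1@example.com"));
      table.put(put);

      // Read the same cell back by row key.
      Result result = table.get(new Get(Bytes.toBytes("user1")));
      byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
      System.out.println("email = " + Bytes.toString(email));
    }
  }
}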
Ecosystem Components:
1. Sqoop
2. Configure and Install Sqoop
3. Connecting to an RDBMS
4. Installation of MySQL
5. Importing data from Oracle/MySQL into Hive (see the import sketch after this list)
6. Exporting data to Oracle/MySQL
7. Internal mechanism
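A minimal sketch of driving a Sqoop import from Java; the same flags are normally passed to the sqoop command line, and the JDBC URL, credentials, and table name here are illustrative assumptions (the Sqoop client libraries are assumed to be on the classpath).

import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;

public class SqoopImportDemo {
  public static void main(String[] args) {
    // Equivalent to: sqoop import --connect ... --table ... --hive-import
    String[] sqoopArgs = {
        "import",
        "--connect", "jdbc:mysql://localhost:3306/sales",  // illustrative database
        "--username", "training",
        "--password", "training",
        "--table", "orders",
        "--hive-import"           // load the imported data into a Hive table
    };
    int exitCode = Sqoop.runTool(sqoopArgs, new Configuration());
    System.exit(exitCode);
  }
}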
Oozie:
1. Oozie and its architecture
2. XML file
3. Installing and configuring Apache Oozie
4. Specifying the workflow (see the client sketch after this list)
5. Action nodes
6. Control nodes
7. Job coordinator
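A minimal sketch of submitting a workflow with the Oozie Java client API; the Oozie server URL and the HDFS application path are illustrative assumptions, and the workflow.xml itself is not shown.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitDemo {
  public static void main(String[] args) throws Exception {
    // Points at the Oozie server's REST endpoint.
    OozieClient client = new OozieClient("http://localhost:11000/oozie");

    // Job properties; the app path must contain a workflow.xml in HDFS.
    Properties conf = client.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH,
        "hdfs://localhost:8020/user/training/apps/demo-wf");
    conf.setProperty("nameNode", "hdfs://localhost:8020");
    conf.setProperty("jobTracker", "localhost:8032");

    // Submit and start the workflow, then poll its status.
    String jobId = client.run(conf);
    System.out.println("Submitted workflow " + jobId);

    WorkflowJob job = client.getJobInfo(jobId);
    System.out.println("Status: " + job.getStatus());
  }
}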
Avro, Scribe, Flume, Chukwa, Thrift
1. Concepts of Flume and Chukwa
2. Use cases of Scribe, Thrift and Avro
3. Installation and configuration of Flume
4. Creation of a sample application (see the Flume client sketch after this list)
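A minimal sketch of sending events to a Flume agent with the Flume Java RPC client, assuming an agent with an Avro source listening on localhost:41414; the host, port, and event body are illustrative assumptions.

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientDemo {
  public static void main(String[] args) throws Exception {
    // Connects to a Flume agent whose Avro source listens on this host/port.
    RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
    try {
      // Build and send one event; the agent's channel and sink decide where it lands.
      Event event = EventBuilder.withBody(
          "sample log line", StandardCharsets.UTF_8);
      client.append(event);
    } finally {
      client.close();
    }
  }
}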