Hadoop is an open-source software framework for the distributed storage and processing of very large data sets using the MapReduce programming model. It runs on computer clusters built from commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework.
The base framework consists of four modules:
1) Hadoop Common – libraries and utilities needed by the other Hadoop modules;
2) Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
3) Hadoop YARN – a platform responsible for managing computing resources in clusters and scheduling users' applications; and
4) Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing.
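The MapReduce model described above can be sketched in a few lines of plain Python (a conceptual illustration only, not the Hadoop API): map emits (key, value) pairs, the framework groups pairs by key (the shuffle), and reduce aggregates each group. Word count is the classic example.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "data node data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 3, 'cluster': 1, 'node': 1}
```

In real Hadoop the map and reduce tasks run in parallel on many machines and the shuffle moves data over the network; the logic per key is the same.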
What is Hadoop?
Features of Hadoop:
(i) Flexible
(ii) Scalable
(iii) Builds an efficient data economy
(iv) Robust
(v) Cost-effective
Hadoop vs RDBMS
HDFS Architecture
1) Name Node
2) Data Node
3) Secondary Name Node
4) Job Tracker
5) Task Tracker
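The division of labour among these components can be illustrated with a toy sketch (plain Python, not Hadoop code; the file, block, and node names are made up): the NameNode keeps only metadata, the file-to-blocks mapping and the DataNodes holding each block replica, while the DataNodes store the actual block data.

```python
# NameNode metadata: which blocks make up each file (hypothetical names).
namenode_files = {
    "/logs/app.log": ["blk_001", "blk_002"],   # file -> ordered block list
}

# NameNode metadata: which DataNodes hold a replica of each block.
block_locations = {
    "blk_001": ["datanode1", "datanode2", "datanode3"],  # 3 replicas
    "blk_002": ["datanode2", "datanode3", "datanode4"],
}

def read_plan(path):
    """A client asks the NameNode for block locations, then streams each
    block directly from one of its DataNodes (here, simply the first)."""
    return [(blk, block_locations[blk][0]) for blk in namenode_files[path]]

print(read_plan("/logs/app.log"))
# [('blk_001', 'datanode1'), ('blk_002', 'datanode2')]
```

Note that file data never flows through the NameNode; it only answers the "where" question, which is why it can serve a large cluster from memory.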
Replication in Hadoop
Data Storage in Data Node
Replication Configuration
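The cluster-wide replication factor is set with the `dfs.replication` property in `hdfs-site.xml`; a minimal fragment (3 is the usual default) might look like:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Individual files can also be given a different replication factor at write time or afterwards with `hdfs dfs -setrep`.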
What is a Combiner?
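A combiner is an optional mini-reducer that pre-aggregates each mapper's output locally, so far fewer (key, value) pairs cross the network during the shuffle. A plain-Python sketch of the idea (not the Hadoop API):

```python
from collections import defaultdict

def combine(mapper_output):
    """Combiner: sum counts per key within a single mapper's output,
    before anything is sent over the network."""
    local = defaultdict(int)
    for word, count in mapper_output:
        local[word] += count
    return list(local.items())

# One mapper emitted 5 pairs; after combining, only 2 cross the network.
mapper_output = [("data", 1), ("big", 1), ("data", 1), ("data", 1), ("big", 1)]
combined = combine(mapper_output)
print(combined)  # [('data', 3), ('big', 2)]
```

Because the combiner may run zero, one, or many times, it is only safe for operations (like sum or max) where partial aggregation does not change the final result.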
What is a Partitioner?
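A partitioner decides which reducer receives each intermediate key. Hadoop's default HashPartitioner assigns a key to reducer `hash(key) mod numReduceTasks`, which guarantees that all values for a given key land on the same reducer. A sketch of that rule (Python's `hash()` stands in for Java's `hashCode()`; the idea, not the exact bucket number, is the point):

```python
def partition(key, num_reducers):
    """Default hash-partitioning rule: same key -> same reducer."""
    return hash(key) % num_reducers

num_reducers = 4
p1 = partition("data", num_reducers)
p2 = partition("data", num_reducers)
assert p1 == p2           # the same key always goes to the same reducer
assert 0 <= p1 < num_reducers
```

A custom partitioner is useful when the default hashing would skew load, for example when a few keys dominate the data.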
Sqoop
Introduction to Sqoop
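Sqoop transfers bulk data between relational databases and HDFS. A typical import command looks like the following (the host, database, user, and table names here are hypothetical placeholders):

```shell
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```

Under the hood, Sqoop runs this as a MapReduce job: `--num-mappers 4` splits the table into four ranges that are imported in parallel.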
HBase
Hands-on with examples: HBase and ZooKeeper
HBase introduction
HBase architecture and the ZooKeeper service: Data Model, Operations, Implementation, Consistency, Sessions
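The HBase data model covered above can be pictured as a sparse, sorted, versioned map: row key, then column family, then column qualifier, then timestamp, then value. A toy sketch of that nesting in plain Python (not the HBase API; the row, family, and timestamp values are made up):

```python
# row key -> column family -> column qualifier -> timestamp -> value
table = {
    "row1": {
        "info": {                                        # column family
            "name": {1700000002: "Ada", 1700000001: "A."},  # two versions
        },
    },
}

def get(row, family, qualifier):
    """Return the newest version of a cell, as a default HBase Get does."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]

print(get("row1", "info", "name"))  # Ada
```

Cells are stored sorted by row key, which is what makes range scans over consecutive rows efficient, and older versions remain retrievable until compaction discards them.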
Note:
Topic-wise material will be provided with scenarios.
Assignments and tasks will be given to provide hands-on practice.
Technical assistance will be provided even after the course.