INTRODUCTION TO HADOOP AND HDFS - ABHISHEK VERMA

What is Hadoop?


A software platform that lets you easily write and run applications that
process vast amounts of data. It includes:
MapReduce: an offline (batch) computing engine
HDFS: the Hadoop Distributed File System
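To make the MapReduce model concrete, here is a single-process Python sketch of the classic word-count job. It only imitates the three phases (map, shuffle/group by key, reduce) in memory. Real Hadoop jobs are usually written against Hadoop's Java MapReduce API and run in parallel across the cluster, and the function names below are illustrative, not Hadoop's.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values seen for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
```

For example, `word_count(["the quick fox", "the lazy dog"])` counts "the" twice and every other word once.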

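Scalable: Hadoop can reliably store and process petabytes of data.

Economical: it distributes data and processing across clusters of commonly available (commodity) computers.

Efficient: by distributing the data, it can process it in parallel on the nodes where the data is located.

Reliable: it automatically keeps multiple copies of the data and redeploys computing tasks when failures occur.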

Hadoop at a Glance: Master-Slave Architecture

[Figure: the master node runs the NameNode; the slave nodes each run a DataNode (four DataNodes shown).]
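One way to picture the master-slave split is that the master keeps track of its slaves, which report in with periodic heartbeats. The sketch below is a toy model, not Hadoop code: the `MasterNode` class and the `dead_after` timeout are hypothetical, and real HDFS uses its own configurable heartbeat interval and dead-node timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MasterNode:
    """Toy master that tracks its slave DataNodes by the time of their last heartbeat."""
    dead_after: float  # seconds of silence before a slave is treated as dead (hypothetical value)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, datanode: str, now: float) -> None:
        # A slave reporting in; the first heartbeat also registers it.
        self.last_seen[datanode] = now

    def live_datanodes(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.dead_after)

    def dead_datanodes(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.dead_after)
```

With `dead_after=30.0`, a DataNode last heard from 15 seconds ago counts as live, while one that has been silent for 40 seconds counts as dead.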

Starting with HDFS


HDFS Architecture
HDFS Internals
HDFS Interactions


HDFS Architecture


HDFS Cluster View

[Figure: a data center switch connects several rack switches, each serving a rack of nodes. Name Node: only one per cluster. Data Nodes: can be many.]
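Because nodes sit in racks behind rack switches, HDFS can spread a block's replicas across racks. The HDFS architecture guide describes the default policy for three replicas roughly as: one on the writer's node, one on a node in a different (remote) rack, and one on a different node in that same remote rack. The Python sketch below follows that rule under simplifying assumptions: the writer is itself a DataNode, the rack map is a plain dictionary, and nodes are picked in sorted order instead of randomly.

```python
from typing import Dict, List


def place_replicas(writer: str, node_rack: Dict[str, str]) -> List[str]:
    """Choose 3 DataNodes for a new block (deterministic sketch of the default rack-aware rule)."""
    # Replica 1: the writer's own node (assumes the writer is a DataNode listed in node_rack).
    first = writer
    # Replica 2: a node on a different (remote) rack.
    remote = sorted(n for n, rack in node_rack.items() if rack != node_rack[first])
    if not remote:
        raise ValueError("need DataNodes on at least two racks")
    second = remote[0]
    # Replica 3: a different node on the same remote rack as replica 2.
    same_rack = [n for n in remote if node_rack[n] == node_rack[second] and n != second]
    if not same_rack:
        raise ValueError("the remote rack needs at least two DataNodes")
    return [first, second, same_rack[0]]
```

Keeping two of the three replicas on one remote rack limits cross-rack write traffic, while the copy on a second rack still survives the loss of a whole rack.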

HDFS Architecture

Multi Node Architecture


HDFS INTERNALS


HDFS Admin Interactions

http://hadoop.apache.org/docs/r0.18.0/hdfs_shell.html
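The linked page documents the FS shell, which is invoked as `hadoop fs -<command> <args>`. The small Python helper below only assembles such command lines (for example `-mkdir`, `-put`, `-ls`) and runs them with `subprocess`. It assumes the `hadoop` launcher is on the PATH; the helper names and the local file `words.txt` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def fs_command(op: str, *args: str) -> List[str]:
    """Build an FS shell command line, e.g. ["hadoop", "fs", "-ls", "/user/css534/input"]."""
    return ["hadoop", "fs", f"-{op}", *args]


def run(cmd: List[str]) -> str:
    # Requires the `hadoop` launcher on PATH and a running cluster.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(fs_command("put", "words.txt", "/user/css534/input"))` copies a local file into HDFS, and `run(["hadoop", "dfsadmin", "-report"])` asks for a cluster-wide status report.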

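HDFS Client Interactions

[Figure: an Application calls the HDFS Client. The client sends (file name, block id) to the HDFS namenode, which holds the file namespace (for example, /user/css534/input with block 3df2) and replies with (block id, block location). The client then sends (block id, byte range) to an HDFS datanode and receives the block data. The namenode sends instructions to the datanodes, and the datanodes report their state back. Each datanode stores its blocks on the Linux local file system.]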
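The key point of this exchange is that file data never passes through the namenode: the client gets only metadata from it and then talks to a datanode directly. The following Python toy model (hypothetical classes, not the real HDFS client API) walks through that exchange for one block, reusing the path and block id from the figure. As a simplifying assumption, the client names the block by its position within the file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataNode:
    # block id -> block contents; stands in for files on the Linux local file system
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def read(self, block_id: str, start: int, end: int) -> bytes:
        # Serve a (block id, byte range) request with block data.
        return self.blocks[block_id][start:end]


@dataclass
class NameNode:
    namespace: Dict[str, List[str]] = field(default_factory=dict)  # file name -> block ids, in order
    locations: Dict[str, List[str]] = field(default_factory=dict)  # block id -> datanode names

    def lookup(self, file_name: str, block_index: int) -> Tuple[str, List[str]]:
        # Answer with (block id, block locations): metadata only, no file data.
        block_id = self.namespace[file_name][block_index]
        return block_id, self.locations[block_id]


def client_read(namenode: NameNode, datanodes: Dict[str, DataNode], file_name: str,
                block_index: int, start: int, end: int) -> bytes:
    block_id, hosts = namenode.lookup(file_name, block_index)  # 1. ask the namenode
    return datanodes[hosts[0]].read(block_id, start, end)      # 2. read directly from a datanode
```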

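HDFS File Read and Write

File Read
1. open: the HDFS client (in the client JVM on the client node) opens the file through DistributedFileSystem.
2. get block locations: DistributedFileSystem asks the NameNode for the locations of the file's blocks.
3. read: the client reads from an FSDataInputStream.
4. read from the closest node: the stream reads from the closest DataNode holding the data.
5. read from the 2nd closest node: the stream reads from the second closest DataNode.
6. close: the client closes the stream.

File Write
1. create: the HDFS client (in the client JVM on the client node) calls create on DistributedFileSystem.
2. create: DistributedFileSystem asks the NameNode to create the file.
3. write: the client writes data to an FSDataOutputStream.
4. get a list of 3 data nodes: the NameNode supplies the 3 DataNodes that will hold the block's replicas.
5. write packet: data is sent as packets along a pipeline through the 3 DataNodes.
6. ack packet: acknowledgements flow back along the pipeline.
7. close: the client closes the stream.
8. complete: the NameNode is told that the file is complete.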
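The read steps "read from the closest node" and "read from the 2nd closest node" depend on how close a DataNode is. Hadoop measures distance on the network tree as the number of hops up to the closest common ancestor: 0 for the same node, 2 for the same rack, 4 for another rack in the same data center, and 6 for another data center. The Python sketch below applies that metric to topology paths of the form `/datacenter/rack/node` (an assumed naming scheme) and falls back to the next closest replica when a read fails; `fetch` is a hypothetical stand-in for the actual DataNode read.

```python
from typing import Callable, List


def distance(a: str, b: str) -> int:
    """Hops between two nodes named by topology paths such as /datacenter/rack/node."""
    path_a, path_b = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    # Steps up from each node to the closest common ancestor.
    return (len(path_a) - common) + (len(path_b) - common)


def read_block(client: str, replicas: List[str], fetch: Callable[[str], bytes]) -> bytes:
    """Read from the closest replica; on failure, fall back to the next closest."""
    last_error: Exception = IOError("no replicas available")
    for node in sorted(replicas, key=lambda n: distance(client, n)):
        try:
            return fetch(node)
        except IOError as err:  # e.g. the DataNode is down
            last_error = err
    raise last_error
```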

If a DataNode crashes during a write, the crashed node is removed from the pipeline, the current block receives a newer ID so that the partial data on the crashed node can be deleted later, and the NameNode allocates another node.
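The recovery rule for a crashed DataNode can be sketched as follows. This is a toy Python model with hypothetical names: the "newer ID" is modelled as an integer generation number (HDFS implements it with a generation stamp), and the NameNode's choice of a replacement is reduced to picking the first candidate node not already in the pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    generation: int  # models the "newer ID"; a replica with an older value is stale


def recover_pipeline(block: Block, pipeline: List[str], crashed: str,
                     candidates: List[str]) -> List[str]:
    # 1. Remove the crashed DataNode from the write pipeline.
    survivors = [node for node in pipeline if node != crashed]
    # 2. Give the block a newer ID so the partial copy on the crashed node can be
    #    recognised as stale and deleted later.
    block.generation += 1
    # 3. Stand-in for the NameNode allocating another node: first candidate not already used.
    replacement = next((n for n in candidates if n not in pipeline), None)
    if replacement is not None:
        survivors.append(replacement)
    return survivors


def is_stale(replica_generation: int, block: Block) -> bool:
    return replica_generation < block.generation
```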

Configuring HDFS
Three files have to be edited to configure HDFS:
1. core-site.xml
2. mapred-site.xml
3. hadoop-env.sh

Files to Edit


Pointing Towards NameNode
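Pointing the cluster at its NameNode is done in core-site.xml (property `fs.default.name` in Hadoop 1.x; newer releases call it `fs.defaultFS`), and mapred-site.xml points at the JobTracker with `mapred.job.tracker`. Both files share the same `<configuration>`/`<property>` layout, so the Python sketch below renders them from name/value pairs. The host name `master` and the ports are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in the <configuration>/<property> layout of *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


# Hypothetical host and ports: replace with your NameNode and JobTracker addresses.
core_site = hadoop_config_xml({"fs.default.name": "hdfs://master:54310"})
mapred_site = hadoop_config_xml({"mapred.job.tracker": "master:54311"})

if __name__ == "__main__":
    print(core_site)
```

The resulting strings would then be saved as core-site.xml and mapred-site.xml in the Hadoop `conf/` directory (Hadoop 1.x layout) on every node.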


Hadoop Environment Setup
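In hadoop-env.sh, the setting that usually has to be changed is `JAVA_HOME`, which is typically shipped commented out. The Python helper below (hypothetical, for scripting the edit) uncomments or replaces an `export NAME=value` line, or appends one if it is missing. The JDK path in the usage line is an example, not a required location.

```python
import re


def set_export(script: str, name: str, value: str) -> str:
    """Return `script` with `export NAME=value` set.

    The first existing line for NAME, commented out or not, is replaced in place;
    otherwise the export is appended at the end.
    """
    pattern = re.compile(rf"^[ \t]*#?[ \t]*export[ \t]+{re.escape(name)}=.*$", re.MULTILINE)
    line = f"export {name}={value}"
    if pattern.search(script):
        # A function replacement avoids treating backslashes in `value` as escapes.
        return pattern.sub(lambda _match: line, script, count=1)
    if script and not script.endswith("\n"):
        script += "\n"
    return script + line + "\n"
```

Usage: `set_export(text, "JAVA_HOME", "/usr/lib/jvm/java-6-openjdk")`, where `text` is the current contents of hadoop-env.sh.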


How to Run Hadoop Services


How to Stop Hadoop Services


Check All Services
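A common way to check that all services are up is the JDK's `jps` tool, which lists running Java processes as `<pid> <main class>`. The Python sketch below parses that output and reports which expected daemons are missing. The expected set assumes a single-node Hadoop 1.x setup running all five daemons; on YARN-based releases the JobTracker and TaskTracker are replaced by the ResourceManager and NodeManager.

```python
from typing import Iterable, Set

# Daemons expected on a single-node Hadoop 1.x setup (assumption; adjust per node role).
EXPECTED_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"}


def running_daemons(jps_output: str) -> Set[str]:
    # Each jps line looks like "<pid> <main class name>".
    names: Set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, expected: Iterable[str] = EXPECTED_DAEMONS) -> Set[str]:
    return set(expected) - running_daemons(jps_output)
```

The input can be captured with `subprocess.run(["jps"], capture_output=True, text=True).stdout`; an empty result from `missing_daemons` means every expected daemon is running.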


IMP: RACK AWARENESS

Form a lookup file containing the IP address of every node, each mapped to its rack.
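Hadoop learns the rack layout by running a user-supplied topology script (configured with `topology.script.file.name` in Hadoop 1.x and `net.topology.script.file.name` in later releases). Hadoop passes one or more IP addresses or host names as arguments and reads back one rack path per argument on standard output. The Python script below is a minimal sketch of such a script built on the lookup file. It assumes one `<ip> <rack>` pair per line, reads the file path from a hypothetical `TOPOLOGY_DATA` environment variable, and answers `/default-rack` for unknown addresses.

```python
#!/usr/bin/env python3
"""Minimal rack-awareness topology script (sketch)."""
import os
import sys
from typing import Dict, List

DEFAULT_RACK = "/default-rack"


def load_lookup(path: str) -> Dict[str, str]:
    # Lookup file format (assumed): one "<ip> <rack>" pair per line; "#" starts a comment line.
    table: Dict[str, str] = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2 and not parts[0].startswith("#"):
                table[parts[0]] = parts[1]
    return table


def resolve(hosts: List[str], table: Dict[str, str]) -> List[str]:
    # One rack path per argument, in the same order; unknown hosts get the default rack.
    return [table.get(host, DEFAULT_RACK) for host in hosts]


def main(argv: List[str]) -> None:
    if not argv:
        return
    # TOPOLOGY_DATA is a hypothetical variable name; the fallback path is also just an example.
    path = os.environ.get("TOPOLOGY_DATA", "/etc/hadoop/topology.data")
    print(" ".join(resolve(argv, load_lookup(path))))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, with the line `10.1.1.11 /dc1/rack1` in the lookup file, running the script with the arguments `10.1.1.11 10.1.9.99` prints `/dc1/rack1 /default-rack`.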


Checking Health and Services


HADOOP IN DETAIL


Questions?
