
VIRTUALISATION OF HADOOP CLUSTERS
Dr G Sudha Sadasivam
Assistant Professor
Department of CSE
PSGCT

Introduction
A physical machine can host a number of smaller
virtual machines (VMs), each running a separate
operating system instance.
Challenges

Partitioning of a machine
Concurrent execution of multiple operating systems
Isolation of virtual machines from one another
Support for heterogeneous applications
Low performance overhead

Xen is a virtual machine monitor (hypervisor) for x86 that
supports execution of multiple guest operating
systems, each with its own kernel and user-space
applications.

Objective
Automation of creation and deletion of a virtual
cluster for hosting Hadoop using Xen
A large physical cluster can be simulated on a few
physical machines

Steps
The user supplies the configuration by editing configuration files.
The tool generates the user-specified number of VMs running
Hadoop.
Users can manage the Hadoop file system
Users can submit jobs for each physical machine.

Need for virtualisation


Ability to recover from software problems quickly by
saving a copy of the guest image.
High availability by relocating guests when a server
machine is inoperable.
Dynamic load balancing by migrating guests from one server
machine to another.
Consolidation of many services on one physical machine,
each administered independently in its own VM.
Use of the abundant computational power of the physical
machine, which minimises cost.
Switching between applications on different operating systems
using hypervisors.

HADOOP CLUSTER CONFIGURATION


The host node is configured as the master (NameNode, NN) and also acts
as a slave (DataNode, DN).
Each guest node is configured as a slave (DN).

The master is the host OS, which acts as JobTracker/NameNode.
The slave is the guest OS, which acts as TaskTracker/DataNode.
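As an illustration of the master/slave layout above, the following minimal sketch generates the worker list and two site files in the style of Hadoop 1.x (the JobTracker/TaskTracker generation). The hostnames `host0`, `guest1`, `guest2` and the port numbers 9000 and 9001 are assumptions made for the example, not values from the slides.

```python
from xml.sax.saxutils import escape


def slaves_file(host, guests):
    """Return the text of conf/slaves: one worker hostname per line.

    The host is listed as well because it doubles as a data node.
    """
    return "\n".join([host] + list(guests)) + "\n"


def site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Hypothetical hostnames and ports, chosen only for this example.
HOST = "host0"
GUESTS = ["guest1", "guest2"]

# The NameNode and the JobTracker both run on the host (master).
core_site = site_xml({"fs.default.name": f"hdfs://{HOST}:9000"})
mapred_site = site_xml({"mapred.job.tracker": f"{HOST}:9001"})

print(slaves_file(HOST, GUESTS), end="")
```

The same generated files would be copied to every node, so that each guest knows where the NameNode and JobTracker live.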

Steps in implementing

Installation of the Xen kernel
Creation of the guest OS
Configuration of the guest OS
Installation of the Java Development Kit
Extraction and configuration of Hadoop on the cluster
Creation of an OS image for new guest machines
Creation and removal of further virtual machines by
copying the OS image
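The last two steps (one base OS image, copied for each new guest) can be sketched as follows. This is a minimal example that only plans the work: it returns, for each guest, the image path to copy to and the text of a Xen guest configuration file. The directory `/var/xen`, the `hadoop-vm` name prefix, the `xenbr0` bridge and the memory size are assumptions; boot settings (kernel or bootloader) are omitted because they depend on the installation.

```python
def xen_guest_config(name, image_path, memory_mb=512, vcpus=1, bridge="xenbr0"):
    """Return the text of a minimal Xen guest configuration file.

    Boot settings (kernel/bootloader) are deliberately left out.
    """
    lines = [
        f'name = "{name}"',
        f"memory = {memory_mb}",
        f"vcpus = {vcpus}",
        f"disk = ['file:{image_path},xvda,w']",
        f"vif = ['bridge={bridge}']",
    ]
    return "\n".join(lines) + "\n"


def plan_guests(count, image_dir="/var/xen", prefix="hadoop-vm"):
    """Plan `count` guests: (name, image path, config text) for each.

    Each image path is where a copy of the base OS image would be placed.
    """
    plan = []
    for i in range(1, count + 1):
        name = f"{prefix}{i}"
        image_path = f"{image_dir}/{name}.img"
        plan.append((name, image_path, xen_guest_config(name, image_path)))
    return plan


for name, image_path, config in plan_guests(2):
    print(f"# {name}: copy base image to {image_path}")
    print(config)
```

Removing a guest is the reverse: shut the domain down, then delete its image copy and its configuration file.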

Automated Creation of a Hadoop Virtual cluster

XML file has configuration details of new VM
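The slides do not show the XML layout, so the following is only a sketch of how such a file could be read. The element and attribute names (`cluster`, `vm`, `name`, `memory`, `ip`) and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file; the real schema is not given in the slides.
SAMPLE = """<cluster>
  <vm name="hadoop-vm1" memory="512" ip="192.168.1.11"/>
  <vm name="hadoop-vm2" memory="512" ip="192.168.1.12"/>
</cluster>"""


def read_vm_specs(xml_text):
    """Parse the cluster description and return one dict per <vm> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": vm.get("name"),
            "memory_mb": int(vm.get("memory")),
            "ip": vm.get("ip"),
        }
        for vm in root.findall("vm")
    ]


for spec in read_vm_specs(SAMPLE):
    print(spec["name"], spec["memory_mb"], spec["ip"])
```

Each parsed entry would then drive the creation of one guest: copy the OS image, write the guest configuration, and start the VM.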

Automated Shut down of Hadoop Virtual cluster

Advantages of automated virtualization in Hadoop
1. Effective isolation of the datanode from the load
on the machine caused by other processes makes
the datanode more responsive/reliable.
2. The availability of multiple virtual machines on
each machine lowers the granularity of scheduling
units, making it possible to schedule multiple
task trackers on the same machine and to improve
the overall utilization of the whole cluster.
3. A snapshot of a virtual cluster makes it possible to
re-activate the same cluster in the future and resume
work from the snapshot (rollback).

Enhancements
1. Providing a graphical console for monitoring and
managing virtual cluster.
2. Creation and Migration of virtual machine for the
purpose of load balancing.
3. Enabling snapshots of the virtual machine for
checkpointing.
4. Providing an intelligent monitoring system that
can detect the failure of a virtual machine in the
cluster and restart that virtual machine,
increasing reliability.
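The monitoring enhancement in item 4 could follow a simple compare-and-restart loop. The sketch below shows one pass of such a loop. How running VMs are listed and how a VM is restarted are left to the caller as the hypothetical callables `list_running` and `restart`, since the slides do not describe the mechanism.

```python
def vms_to_restart(expected, running):
    """Return the expected VM names that are not currently running."""
    running = set(running)
    return [name for name in expected if name not in running]


def monitor_once(expected, list_running, restart):
    """Run one monitoring pass and restart every missing VM.

    list_running: callable returning the names of running VMs (assumed).
    restart:      callable that restarts one VM by name (assumed).
    Returns the list of VM names that were restarted.
    """
    failed = vms_to_restart(expected, list_running())
    for name in failed:
        restart(name)
    return failed


# Example with stand-in callables instead of a real hypervisor.
restarted = []
failed = monitor_once(
    ["vm1", "vm2", "vm3"],
    list_running=lambda: ["vm1", "vm3"],
    restart=restarted.append,
)
print(failed)
```

A real monitor would call `monitor_once` periodically and would probably also check that the Hadoop daemons inside each guest are responding.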

Performance of Physical vs Virtual clusters

Master as a Physical Node

Total: 7 nodes
Data nodes: 6 virtual nodes
Name node: 1 physical node

Master as a Virtual Node

Total: 7 nodes
Data nodes: 1 physical node + 5 virtual nodes
Name node: 1 virtual node

Performance with varying number of Virtual nodes
