Sei sulla pagina 1di 8

HADOOP Training in Hyderabad

Hadoops MapReduce features :


MapReduce can be considered as the soul of

Hadoop . MapReduce is a king of programming


parameter which permits huge amount of gullibility
across thousands of servers in a Hadoop cluster.
Users familiar to clustered processing solutions can
easily understand the MapReduce concept .
New users would find it bit difficult to grasp as it is
bit difficult and complicated to understand the
clustered structure. People new to this can go
throught the below information to undersand
MapReduce.

Map Reduce can be categorized into


two jobs executed by Hadoop
Programs:
Is the is the map job: Individual or
single elements are spitted into key
pairs or value pairs. It takes a group of
data and transforms it into another set
of data .
Is the reduce job : The output from the
map is considered as input it combines
those data of map job into a small
group of tuples. Map Reduce- as the
name indicates first always the map
job is performed and then the reduce

MapReduce is a programming model and an


associated implementation for processing and
designing a huge data sets with a distributed
algorithm on a cluster.
A MapReduce program comprises of Map method
which executed the filtering function and sorting out
of the data and a Reduce method that executes a
summary function . The "MapReduce System" or a
framework executes the processing by collecting the
distributed servers, running the varied tasks in
parallel, guiding all communications and data
transfers between the various parts of the system,
and facilitating for redundancy and fault tolerance.
The model is inspired by the map and reduce
functions commonly used in functional programming
, though their intention in the MapReduce
framework is not the same as in their original forms.

Scalability and tolerance of faults are thee main


features of MapReduce program and not only the
map and reduce functions , but the scalability and
fault-tolerance gained for a variety of applications
by optimizing the execution engine once. The
actual use of MapReduce program can be put into
practice only when the above two parameters
come into action.
For better MapReduce algorithm , it is essential to
optimize the communication cost. MapReduce
libraries are designed in many programming
languages, with different levels of optimization.
The most commonly used open source is
Apache Hadoop.
MapReduce permits for distributed processing of the
map and reduction operations. Condition is that each
mapping operation is not dependent on each other, all

parallel.While this process can often appear


inefficient compared to algorithms that are
more sequential (because multiple rather
than one instance of the reduction process
must be run), MapReduce can be used
specifically for larger datasets than
"commodity" servers can handle a huge
data server farm can utilize MapReduce to
sort a petabyte of data in only in a fraction
of time. The parallelism also provides some
possibility of recovering from partial failure
of servers or storage during the operation: if
one mapper or reducer fails, the work can
be rescheduled assuming the input data is

Uses :
It has distributed pattern based
search feature .
Reversal of web link graph ,
decomposition of Singular value .
Distributed sorting.
Clustering of documents and
translating the statistical translation
of machine .

THANK
YOU
Hadoop training in Hyderabad
www.rstrainings.com
contact@rstrainings.com
9052699906

Potrebbero piacerti anche