– Map/Reduce: distributed data processing (the processing engine)
HDFS (Hadoop Distributed File System)
HDFS stores files in fixed-size blocks, typically 64 MB (the early Hadoop default; Hadoop 2.x and later default to 128 MB)
HDFS is optimized for write-once, read-many workloads
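The block size and replication factor are configurable per cluster. A minimal hdfs-site.xml sketch with illustrative values (not taken from these slides):

```xml
<!-- hdfs-site.xml: illustrative values only -->
<configuration>
  <property>
    <name>dfs.block.size</name>   <!-- Hadoop 1.x name; "dfs.blocksize" in 2.x -->
    <value>67108864</value>       <!-- 64 MB in bytes -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>              <!-- number of copies of each block -->
  </property>
</configuration>
```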
Snapshot of the NameNode metadata (taken by the Secondary NameNode) minimizes downtime and data loss if the NameNode fails
JobTracker
Word Count
Pseudo-code

map(offset, line-contents):
    for each word in line-contents:
        emit(word, 1)

reduce(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
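The pseudo-code above can be sketched as a plain-Java simulation of the three phases, without any Hadoop dependencies. The class name WordCountSim and the explicit shuffle step are illustrative (in Hadoop the framework performs the grouping between map and reduce):

```java
import java.util.*;

public class WordCountSim {

    // map phase: emit a (word, 1) pair for every word in the line
    static List<Map.Entry<String, Integer>> map(String lineContents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : lineContents.split("\\s+"))
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        return pairs;
    }

    // shuffle phase: group all emitted values by key, as the framework
    // does between the map and reduce phases
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        return grouped;
    }

    // reduce phase: sum the list of counts for one word
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[]{"Big Data", "Big Radar"})
            pairs.addAll(map(line));
        shuffle(pairs).forEach((word, values) ->
            System.out.println(word + "=" + reduce(word, values)));
    }
}
```

Running main prints each distinct word with its total count, mirroring the reducer output shown later in these slides.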
Hadoop
WordCountMapper
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    // the constant value 1 emitted with every word
    private final static IntWritable one = new IntWritable(1);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/output"));
job.setOutputFormatClass(TextOutputFormat.class);
MapReduce Example: Word Count
Mapper
<Solusi247,1> <berdiri,1> <tahun,1> <2000,1> <yang,1> <bisnis,1> <utamanya,1> <bidang,1> <telco,1>
<Sejak,1> <tahun,1> <2006,1> <Solusi247,1> <mengembagkan,1> <riset,1> <bidang,1> <Radar,1> <dan,1> <Big,1> <Data,1>
<Product,1> <Solusi247,1> <untuk,1> <Big,1> <Data,1> <antara,1> <lain,1> <HGrid247,1> <HSpark247,1> <dan,1> <SGrid247,1>
MapReduce Example: Word Count
Grouped input → Reducer → Output
<2000,[1]> → <2000,1>
<2006,[1]> → <2006,1>
<Big,[1,1]> → <Big,2>
<Data,[1,1]> → <Data,2>
<HGrid247,[1]> → <HGrid247,1>
<HSpark247,[1]> → <HSpark247,1>
<Product,[1]> → <Product,1>
<Radar,[1]> → <Radar,1>
<SGrid247,[1]> → <SGrid247,1>
<Sejak,[1]> → <Sejak,1>
<Solusi247,[1,1,1]> → <Solusi247,3>
<antara,[1]> → <antara,1>
<berdiri,[1]> → <berdiri,1>
<bidang,[1,1]> → <bidang,2>
<bisnis,[1]> → <bisnis,1>
<dan,[1,1]> → <dan,2>
<lain,[1]> → <lain,1>
<mengembagkan,[1]> → <mengembagkan,1>
<riset,[1]> → <riset,1>
<tahun,[1,1]> → <tahun,2>
<telco,[1]> → <telco,1>
<untuk,[1]> → <untuk,1>
<utamanya,[1]> → <utamanya,1>
<yang,[1]> → <yang,1>
Word Count
MapReduce process
Why Hadoop?