24 Hadoop Interview Questions & Answers for MapReduce developers | FromDev
http://www.fromdev.com/2010/12/interviewquestionshadoopmapreduce.html (printed 26/08/2015)
A good understanding of Hadoop architecture is required to leverage the power of Hadoop. Below are a few important practical questions that can be asked of a senior, experienced Hadoop developer in an interview. I learned the answers to them during my CCHD (Cloudera Certified Hadoop Developer) certification. I hope you will find them useful. This list primarily includes questions related to Hadoop architecture, MapReduce, the Hadoop API and the Hadoop Distributed File System (HDFS).
Hadoop is one of the most popular platforms for big data analysis. The Hadoop ecosystem is huge and involves many supporting frameworks and tools to run and manage it effectively. This article focuses on the core Hadoop concepts and its technique for handling enormous data.
Below is a list of Hadoop interview questions and answers that may prove useful for beginners and experts alike. These are common questions that you may face at a big data job interview or a Hadoop certification exam (like CCHD).
What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster. JobTracker runs in its own JVM process. In a typical production cluster it runs on a separate machine. Each slave node is configured with the JobTracker node location. The JobTracker is a single point of failure for the Hadoop MapReduce service. If it goes down, all running jobs are halted. JobTracker in Hadoop performs the following actions (from the Hadoop Wiki):
Client applications submit jobs to the JobTracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
When the work is completed, the JobTracker updates its status.
Client applications can poll the JobTracker for information.
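The scheduling steps above can be sketched in a few lines of Python. This is only a toy model, not Hadoop code: the TaskTracker class, the pick_tracker function, the timeout value and the node and rack names are all hypothetical, and the three-level distance (same node, same rack, off-rack) is an assumed simplification of "at or near the data".

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a TaskTracker node as seen by the JobTracker."""
    name: str
    rack: str
    free_slots: int
    last_heartbeat: float  # time of the last heartbeat, in seconds


def pick_tracker(trackers, data_node, data_rack, now, timeout):
    """Pick a live tracker with a free slot, preferring one close to the data."""
    # Trackers whose heartbeat is too old are treated as failed and skipped.
    candidates = [t for t in trackers
                  if t.free_slots > 0 and now - t.last_heartbeat <= timeout]
    if not candidates:
        return None

    def distance(t):
        if t.name == data_node:
            return 0  # the data lives on this node
        if t.rack == data_rack:
            return 1  # same rack as the data
        return 2      # different rack

    return min(candidates, key=distance)


trackers = [
    TaskTracker("node1", "rackA", free_slots=0, last_heartbeat=100.0),
    TaskTracker("node2", "rackA", free_slots=2, last_heartbeat=100.0),
    TaskTracker("node3", "rackB", free_slots=2, last_heartbeat=10.0),
]
chosen = pick_tracker(trackers, data_node="node1", data_rack="rackA",
                      now=105.0, timeout=60.0)
print(chosen.name)
```

Here node1 holds the data but has no free slot, and node3 has missed its heartbeats, so the work goes to node2 on the same rack as the data.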
What is a TaskTracker in Hadoop? How many instances of TaskTracker run on a Hadoop Cluster?
A TaskTracker is a slave node daemon in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker. Only one TaskTracker process runs on any Hadoop slave node. TaskTracker runs in its own JVM process. Every TaskTracker is configured with a set of slots; these indicate the number of tasks that it can accept. The TaskTracker starts separate JVM processes to do the actual work (called Task Instances); this ensures that a process failure does not take down the TaskTracker. The TaskTracker monitors these task instances, capturing the output and exit codes. When the task instances finish, successfully or not, the TaskTracker notifies the JobTracker. The TaskTrackers also
send out heartbeat messages to the JobTracker at regular intervals (typically every few seconds) to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.
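To make the slot bookkeeping concrete, here is a minimal sketch of a TaskTracker that accepts tasks only while it has a free slot and reports its free slots in each heartbeat. The TaskTrackerSim class, its method names and the dictionary-shaped heartbeat are invented for illustration and do not mirror Hadoop's internal classes or wire format.

```python
class TaskTrackerSim:
    """Toy model of a TaskTracker's slot accounting (not Hadoop code)."""

    def __init__(self, name, total_slots):
        self.name = name
        self.total_slots = total_slots
        self.running = set()  # ids of task instances currently running

    def accept(self, task_id):
        """Accept a task only if a slot is free; return whether it was accepted."""
        if len(self.running) >= self.total_slots:
            return False
        self.running.add(task_id)
        return True

    def finish(self, task_id):
        """Release the slot held by a task, whether it succeeded or failed."""
        self.running.discard(task_id)

    def heartbeat(self):
        """Build the periodic status message sent to the JobTracker."""
        return {"tracker": self.name,
                "free_slots": self.total_slots - len(self.running)}


tt = TaskTrackerSim("node1", total_slots=2)
accepted = [tt.accept("t1"), tt.accept("t2"), tt.accept("t3")]
before = tt.heartbeat()
tt.finish("t1")
after = tt.heartbeat()
print(accepted, before["free_slots"], after["free_slots"])
```

The third task is refused because both slots are busy; once t1 finishes, the next heartbeat advertises one free slot again.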
What is configuration of a typical slave node on Hadoop cluster? How many JVMs run on a slave node?
A single instance of a TaskTracker runs on each slave node. The TaskTracker runs as a separate JVM process.
A single instance of a DataNode daemon runs on each slave node. The DataNode daemon runs as a separate JVM process.
One or multiple Task Instances run on each slave node. Each task instance runs as a separate JVM process. The number of task instances can be controlled by configuration. Typically a high-end machine is configured to run more task instances.
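As a quick arithmetic check of the answer above, the number of Hadoop JVMs on a slave node is one TaskTracker plus one DataNode plus one per running task instance. The helper below is a hypothetical illustration; it assumes every task instance gets its own JVM, as described here, and ignores any other processes on the machine.

```python
def jvms_on_slave(running_task_instances):
    """Count Hadoop JVMs on a slave node: TaskTracker + DataNode + one per task."""
    if running_task_instances < 0:
        raise ValueError("task instance count cannot be negative")
    task_tracker = 1
    data_node = 1
    return task_tracker + data_node + running_task_instances


idle_node = jvms_on_slave(0)
busy_node = jvms_on_slave(4)
print(idle_node, busy_node)
```

An idle node runs only the two daemons, while a node running four task instances runs six JVMs.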
Does MapReduce programming model provide a way for reducers to communicate with each other? In a MapReduce job can a reducer
communicate with another reducer?
No, the MapReduce programming model does not allow reducers to communicate with each other. Reducers run in isolation.
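The isolation can be illustrated with a small simulation: each intermediate key is routed to exactly one reducer, and each reducer works only on its own partition with no shared state. The partition and run_reducers functions are hypothetical, and the byte-sum partitioner is a deterministic stand-in chosen for this sketch, not Hadoop's actual hash partitioner.

```python
from collections import defaultdict


def partition(key, num_reducers):
    """Toy deterministic partitioner (Python's built-in hash() is salted for str)."""
    return sum(key.encode("utf-8")) % num_reducers


def run_reducers(pairs, num_reducers, reduce_fn):
    """Group pairs per reducer, then run each reducer on its own input only."""
    inputs = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        inputs[partition(key, num_reducers)][key].append(value)

    outputs = []
    for reducer_input in inputs:
        # A reducer sees nothing but its own partition, with keys in sorted order.
        outputs.append({k: reduce_fn(k, vs)
                        for k, vs in sorted(reducer_input.items())})
    return outputs


pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
result = run_reducers(pairs, 2, lambda key, values: sum(values))
print(result)
```

Because all values for a given key land in the same partition, a reducer never needs data held by another reducer to finish its work.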
completes.
What is the Hadoop MapReduce API contract for a key and value Class?
The key must implement the org.apache.hadoop.io.WritableComparable interface.
The value must implement the org.apache.hadoop.io.Writable interface.
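The reason for the asymmetry is that both keys and values must be serialized to move between tasks, but only keys are sorted before the reduce phase. The sketch below is a Python analogy of that contract, not the Hadoop API: IntValue and IntKey are hypothetical classes, and write and read_fields are named after the serialization methods of Hadoop's Writable interface.

```python
import struct
from functools import total_ordering


class IntValue:
    """Value-side contract: the object can serialize and deserialize itself."""

    def __init__(self, n=0):
        self.n = n

    def write(self):
        """Serialize to 4 big-endian bytes."""
        return struct.pack(">i", self.n)

    def read_fields(self, data):
        """Restore the state from bytes produced by write()."""
        (self.n,) = struct.unpack(">i", data)


@total_ordering
class IntKey(IntValue):
    """Key-side contract: serializable and also comparable, so keys can be sorted."""

    def __eq__(self, other):
        return self.n == other.n

    def __lt__(self, other):
        return self.n < other.n

    def __hash__(self):
        return hash(self.n)


# Values only need to survive a round trip through bytes.
copy = IntValue()
copy.read_fields(IntValue(42).write())

# Keys additionally need an ordering for the sort that precedes reduce.
sorted_keys = [k.n for k in sorted([IntKey(3), IntKey(1), IntKey(2)])]
print(copy.n, sorted_keys)
```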
If reducers do not start before all mappers finish, then why does the progress on a MapReduce job show something like Map(50%) Reduce(10%)? Why is the reducers' progress percentage displayed when the mappers are not finished yet?
Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The progress calculation also takes into account the data transfer done by the reduce process; therefore the reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer. Though the reducer progress is updated, the programmer-defined reduce method is still called only after all the mappers have finished.
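A rough sketch of how a reducer can report progress before any reduce call runs. It assumes, as I understand Hadoop 1.x to behave, that the reported reduce progress is the average of three equally weighted phases (copy, sort and reduce); the function and parameter names are made up for illustration.

```python
def reduce_progress(maps_copied, total_maps, sort_fraction=0.0, reduce_fraction=0.0):
    """Overall reduce progress as the mean of the copy, sort and reduce phases.

    Assumption: the three phases are weighted equally (one third each).
    """
    copy_fraction = maps_copied / total_maps
    return (copy_fraction + sort_fraction + reduce_fraction) / 3.0


# Half of the maps are done, and output from 30 of 100 maps has been copied.
while_maps_run = reduce_progress(maps_copied=30, total_maps=100)
# All map output copied and sorted, but reduce() has not processed anything yet.
before_reduce = reduce_progress(100, 100, sort_fraction=1.0)
print(round(100 * while_maps_run), round(100 * before_reduce))
```

Under this assumption, copying output from 30% of the maps already shows up as roughly 10% reduce progress, which matches a display such as Map(50%) Reduce(10%) even though the reduce method has not been called.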
HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets.
These applications write their data only once but read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files.
What is HDFS Block size? How is it different from traditional file system block size?
In HDFS, data is split into blocks and distributed across multiple nodes in the cluster. Each block is typically 64 MB or 128 MB in size. Each block is replicated multiple times; the default is to replicate each block three times. Replicas are stored on different nodes. HDFS uses the local file system to store each HDFS block as a separate file. HDFS block size cannot be directly compared with the traditional file system block size, which is typically only a few kilobytes.
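A small worked example of the block arithmetic, using the 64 MB block size and replication factor of three mentioned above. The hdfs_blocks helper is hypothetical; it also assumes that the last block of a file only occupies as much space as the data it actually holds.

```python
MB = 1024 * 1024


def hdfs_blocks(file_size, block_size=64 * MB, replication=3):
    """Return (number of blocks, size of the last block, total block replicas)."""
    if file_size <= 0:
        return 0, 0, 0
    # Integer ceiling division avoids floating-point rounding on large sizes.
    num_blocks = (file_size + block_size - 1) // block_size
    last_block = file_size - (num_blocks - 1) * block_size
    return num_blocks, last_block, num_blocks * replication


blocks, last, replicas = hdfs_blocks(200 * MB)
print(blocks, last // MB, replicas)
```

A 200 MB file is split into three full 64 MB blocks plus one 8 MB block, and with three-way replication the cluster stores twelve block files in total.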
Can you think of a question that is not part of this post? Please don't forget to share it with me in the comments section and I will try to include it in the list.
Posted by Sachin FromDev
Some of the useful questions I documented for screening of the hadoop developers. I hope you will find these useful.
In general, my opinion is that it's more important to ask fundamental questions about the Hadoop ecosystem and the distributed computing approach instead of jumping onto the MapReduce problems.
Bhaskar Karambelkar 2 years ago
Hey Sachin, we're currently upgrading from CDH 4.2.1 to CDH 4.3.0.
Cloudera added HA with the CDH4 release. And AFAIK MapR as well as Hortonworks also support HA.
Sachin FromDev 2 years ago
Wow, that's good to know. Time for me to refresh a bit. Thanks for the update.
This is one of the most detailed interview question lists on the Hadoop ecosystem. I created this after my Cloudera Certified Hadoop Developer (CCHD) certification. Hope you find it useful. Let me know if more questions and answers need to be added to this list.
Firstly, I want to thank you for providing clear, useful and detailed information for Hadoop beginners. It looks like all the questions are related to Hadoop 1.x; it would be good if Hadoop 2.x related questions were also added.
Hi, how can I change the slot number in Hadoop? How should I change mapred-site.xml in Hadoop to change the slot number according to my formula? Could you please tell me the code?
Question: How does the NameNode know the location of a DataNode, that is, whether it is connected in the local rack or a remote rack? In other words, how does the NameNode find the location of the rack and node?
Wonderful explanation on HDFS architecture. Thank you for sharing.