Cloudera's Hadoop developer certification is the most popular certification in the Big Data
and Hadoop community. As I've recently cleared the CCD-410 exam, I want to take the
opportunity to share a few points that helped me prepare for the exam and, more
importantly, learn Hadoop in a practical way.
Here they are:
1. Tom White's Hadoop: The Definitive Guide is an invaluable companion for clearing the
exam. It may be the only book you need, as it covers almost all of the conceptual questions
on the exam. Be sure to get the 3rd edition (the latest at the time of writing), which
covers YARN.
2. Don't overlook the other Apache projects in Hadoop's ecosystem, such as Hive, Pig, Oozie,
Flume, and HBase. There will be questions testing your basic understanding of those
topics. Refer to the related chapters in Tom White's book. Good YouTube videos and
tutorials on these projects are also available on the web.
3. Understand how to use Sqoop. A good way to start is to create a simple table in MySQL
(or any database you choose) and import its data into HDFS as well as into Hive. Learn the
different features of the Sqoop tool. Again, both Tom White's book and the Apache Sqoop
user guide are good references.
4. Learn the Hadoop fs shell commands for manipulating files in HDFS.
5. To clear the exam you need to be hands-on with the basics of MapReduce programming,
period. You will find a lot of questions in the CCD-410 exam that ask for the outcome or
possible result set of a given MapReduce code snippet. You also need to know, and
practice, how to convert common SQL data access patterns into the MapReduce paradigm.
There will also be questions testing your familiarity with the key classes used in the
driver class and their methods (for example, the Job class and how it is used to submit
a Hadoop job).
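To practice predicting the result of a MapReduce snippet, it helps to trace the map, shuffle/sort and reduce phases by hand. The following is a minimal plain-Java simulation of the classic word count, under stated assumptions: it uses no Hadoop classes (a real job would extend Hadoop's Mapper and Reducer and be configured and submitted through the Job class in the driver), the class and method names are hypothetical, and it models a single reducer, so the output is one sorted list of keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the MapReduce data flow (no Hadoop API involved).
class WordCountSimulation {

    // Map phase: emit (word, 1) for every whitespace-separated token, lower-cased.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {          // leading whitespace yields an empty token
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle/sort: group values by key; TreeMap keeps keys sorted, as the framework
    // sorts keys before handing them to a reducer.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values seen for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Runs map over every input line, then shuffle, then reduce (single reducer assumed).
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line));
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "The lazy dog", "the fox")));
    }
}
```

Running main prints {dog=1, fox=2, lazy=1, quick=1, the=3}. Note that the keys reach the reducer in sorted order, which is the kind of detail worth checking when working out the output of a given snippet.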
Tip: Create two simple text files with a few records each, similar to the standard emp and
dept tables. Load the files into HDFS. Then develop and test MapReduce programs that
produce output similar to the following queries:
Select empl_name, dept_no, salary from emp order by dept_no asc, salary desc;
Select dept_no, count(*) from emp group by dept_no having count(*) > 1 order by
dept_no desc;
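One way to approach the tip is to work out the MapReduce steps for each query before writing the actual Hadoop job. The plain-Java sketch below simulates those steps. It assumes a hypothetical comma-separated input format (empl_name,dept_no,salary), treats both queries as keyed on the emp table's dept_no column, and assumes a single reducer so that the output is globally ordered. In a real Hadoop job, the first query is usually handled with a composite key and a custom sort comparator (the secondary sort pattern), and the descending order in the second query would likewise come from a custom comparator on the key; here plain Java comparators stand in for both.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the two emp queries as MapReduce steps.
// Assumed input format: one "empl_name,dept_no,salary" record per line.
class EmpQueries {

    record Emp(String name, int deptNo, int salary) {}

    static Emp parse(String line) {
        String[] f = line.split(",");
        return new Emp(f[0].trim(), Integer.parseInt(f[1].trim()), Integer.parseInt(f[2].trim()));
    }

    // Query 1: ORDER BY dept_no ASC, salary DESC.
    // A mapper would emit the composite key (dept_no, salary); the framework would sort it
    // with a custom comparator. Here that comparator is applied directly with List.sort.
    static List<String> orderByDeptThenSalary(List<String> lines) {
        List<Emp> emps = new ArrayList<>();
        for (String line : lines) emps.add(parse(line));
        emps.sort(Comparator.comparingInt(Emp::deptNo)
                .thenComparing(Comparator.comparingInt(Emp::salary).reversed()));
        List<String> out = new ArrayList<>();
        for (Emp e : emps) out.add(e.name() + "\t" + e.deptNo() + "\t" + e.salary());
        return out;
    }

    // Query 2: GROUP BY dept_no HAVING COUNT(*) > 1 ORDER BY dept_no DESC.
    // Map: emit (dept_no, 1). Shuffle: group by key, sorted in descending order.
    // Reduce: sum the ones and keep only departments whose count exceeds 1.
    static List<String> countPerDeptHavingMoreThanOne(List<String> lines) {
        TreeMap<Integer, Integer> counts = new TreeMap<>(Comparator.reverseOrder());
        for (String line : lines) counts.merge(parse(line).deptNo(), 1, Integer::sum);
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> emp = List.of("alice,10,5000", "bob,20,4000", "carol,10,7000",
                "dave,30,3000", "erin,20,4500");
        orderByDeptThenSalary(emp).forEach(System.out::println);
        countPerDeptHavingMoreThanOne(emp).forEach(System.out::println);
    }
}
```

With the sample records in main, the first query lists carol, alice, erin, bob and dave in that order, and the second lists departments 20 and 10 with a count of 2 each; department 30 is dropped by the HAVING filter.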
6. You are expected to understand basic Java programming concepts. This is no sweat for
people who work in Java regularly, but for the rest of us a basic Java refresher course will
be very handy. Pay particular attention to the following topics, which are very helpful for
writing and understanding MapReduce code:
Regular expressions
Array processing
The Collections Framework
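As a quick refresher on these three topics, the sketch below shows them together in the kind of record-parsing code a mapper typically contains: a regular expression validates and splits a line, the fields come back as a String array, and a TreeMap from the Collections Framework groups the results. The record format (empl_name,dept_no,salary) and the class name are hypothetical, chosen to match the emp file suggested in the tip above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical refresher: regex, arrays and collections applied to emp-style records.
class JavaRefresher {

    // Regular expression: a word, then two integers, with optional spaces around commas.
    private static final Pattern EMP =
            Pattern.compile("^\\s*(\\w+)\\s*,\\s*(\\d+)\\s*,\\s*(\\d+)\\s*$");

    // Returns {name, dept_no, salary} as a String array, or null if the line is malformed.
    static String[] parse(String line) {
        Matcher m = EMP.matcher(line);
        if (!m.matches()) return null;
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    // Collections: group employee names by department, departments in ascending order.
    static Map<Integer, List<String>> namesByDept(String[] lines) {
        Map<Integer, List<String>> byDept = new TreeMap<>();
        for (String line : lines) {
            String[] f = parse(line);
            if (f == null) continue;   // skip bad records instead of failing
            byDept.computeIfAbsent(Integer.parseInt(f[1]), k -> new ArrayList<>()).add(f[0]);
        }
        return byDept;
    }

    public static void main(String[] args) {
        String[] lines = {"alice,10,5000", "bad line", "bob, 20, 4000", "carol,10,7000"};
        System.out.println(Arrays.toString(parse("bob, 20, 4000")));  // [bob, 20, 4000]
        System.out.println(namesByDept(lines));                        // {10=[alice, carol], 20=[bob]}
    }
}
```

Returning null for a malformed line and skipping it mirrors a common defensive habit in mappers, where a single bad record should not fail the whole task.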
7. Finally, don't forget to check the Cloudera website for the latest updates, study guides
and sample questions for the specific certification you are targeting.
Note that you can optionally buy a practice test from the Cloudera website. If you are well
prepared and want to check your exam readiness, you may want to try it (disclaimer: I
did).
I also recommend reading the following article from Mark Gschwind's BI Blog. It gives
solid direction for jump-starting both your exam preparation and your Hadoop
learning.
All the best in your journey to learn Hadoop and get certified! Please share your experience
and comments.