
Hadoop Administrator Training Lab Hand Book

List of Lab Exercises


Lab VM Details
Lab 1 : Hadoop configuration
Lab 2 : HDFS Lab
Lab 3 : Map Reduce Word Count
Lab 4 : Map Reduce Lab Total Transactions By Each Product
Lab 5 : HDFS Monitoring
Lab 6 : Job Tracker Monitoring
Lab 7 : Sqoop Lab : Export and Import of data
Lab 8 : Hive configuration
Lab 9 : Hive Programming
Lab 10: Pig Configuration
Lab 11 : Pig Programming
Lab VM Details
A. Linux version: Ubuntu 12.04
B. User name: user   Password: password
C. Super user password: password
D. Create these useful directories in the VM

The following sub-directories should be created under /home/user


Directory Name    Description

Downloads         Contains all installables (Hadoop, Pig, Hive, Sqoop, HBase) which we will download
Lab               For all lab activities
Lab/hdfs          For hdfs-related configuration contents
Lab/mapred        For mapred-related configuration contents
Lab/software      Folder for installing Hadoop, Hive, Pig and Sqoop
Lab/data          Input files for lab exercises
Lab/programs      For all Map Reduce programs

Lab 1 : Hadoop Configuration


All directory paths below are under the home directory /home/user.
A. Check whether ssh is already configured by typing ssh localhost. If it fails, install
the SSH server by typing the following in the Terminal window:
sudo apt-get install openssh-server
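
The Hadoop start scripts log in to localhost over ssh, so ssh localhost should also work
without a password prompt. If it still asks for a password, a key pair set up along these
lines (a standard OpenSSH recipe, not specific to this VM) usually fixes it:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost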

B. We need to download the following tarball installations from the relevant project sites.
a) hadoop
b) pig
c) hive
d) sqoop
e) hbase
C. Untar the Hadoop tarball
o Go to lab/software in the Terminal window of the VM.

o Untar the Hadoop files into the software folder:

tar -xvf ../../Downloads/hadoop-1.0.3.tar.gz [note the space after tar]

o Browse through the directories and check which subdirectory contains which files

D. Create a new file called .bash_profile [yes, the name starts with a dot] in the
/home/user directory.

E. Install OpenJDK in Ubuntu by entering the below command in the Terminal window
sudo apt-get install openjdk-6-jdk

F. To edit files in a graphical text editor, download WinSCP [an SFTP tool for editing and
moving files on the VM]. Type ifconfig in the Terminal window to get the IP address of
the VM, enter it in the WinSCP host name field along with the username and password,
and connect.
Enter the following settings in .bash_profile:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
export HADOOP_INSTALL=/home/user/lab/software/hadoop-1.0.3
export PATH=$PATH:$HADOOP_INSTALL/bin

Save and exit .bash_profile.

Run the following command to load the settings into the current shell:

. .bash_profile

Verify whether the variables are defined by typing export (or env) at the
command prompt.

Check the installed versions:

java -version
hadoop version

(hadoop version should report 1.0.3.)

b. Create the directories


Create the following directories under lab/hdfs

mkdir namenodep
mkdir datan1
mkdir checkp

Change permissions for the data directory under lab/hdfs

chmod 755 datan1

Create the following directories under lab/mapred

mkdir local1
mkdir system
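
As an optional shortcut, the same directory layout can be created in one command from
/home/user (assuming the default bash shell, which supports brace expansion):

mkdir -p lab/hdfs/{namenodep,datan1,checkp} lab/mapred/{local1,system}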

c. Configuring pseudo-distributed mode


Go to the conf directory under HADOOP_HOME (/home/user/lab/software/hadoop-1.0.3)

Modify core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
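<!-- fs.default.name: the URI clients use to reach the HDFS namenode -->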
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
<final>true</final>
</property>
</configuration>

Modify hdfs-site.xml
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
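<!-- dfs.name.dir holds namenode metadata, dfs.data.dir holds datanode
     blocks, dfs.checkpoint.dir holds secondary-namenode checkpoints;
     dfs.replication is 1 because this is a single-node cluster -->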
<property>
<name>dfs.name.dir</name>
<value>/home/user/lab/hdfs/namenodep</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/user/lab/hdfs/datan1</value>
<final>true</final>
</property>
<property>
<name>dfs.checkpoint.dir</name>
<value>/home/user/lab/hdfs/checkp</value>
<final>true</final>
</property>
</configuration>

Modify mapred-site.xml

<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
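<!-- mapred.job.tracker: jobtracker address; the local/system dirs hold
     intermediate map output and shared job files; the *.maximum
     properties cap concurrent map and reduce tasks per tasktracker -->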
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/user/lab/mapred/local1</value>
<final>true</final>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/user/lab/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>3</value>
<final>true</final>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>3</value>
<final>true</final>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx400m</value>
<!-- Not marked as final so jobs can include JVM debugging options -->
</property>
</configuration>

d. Format the namenode

Enter the following command at the prompt:

hadoop namenode -format [Note: if prompted to confirm a re-format, answer with an
uppercase Y, not a lowercase y]

Go to the namenodep directory and check which folders and files have been created.

[Not all folders/files exist at this point; some are created only once the cluster
is started up, not at the beginning]
e. Start HDFS services
Go to the conf directory under HADOOP_HOME

Edit hadoop-env.sh and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386

[Use the same JDK path as in .bash_profile; on a 64-bit VM this is java-6-openjdk-amd64]

Go to the bin directory under HADOOP_HOME and type the following command:

./start-dfs.sh

Run jps and verify that the NameNode, DataNode and SecondaryNameNode processes are running.

f. Start Map Reduce services

Go to the bin directory under HADOOP_HOME and type the following command:

./start-mapred.sh

Run jps and verify that the JobTracker and TaskTracker processes are now also running.

If all five processes (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker)
are running, then Hadoop is up and running.


Lab 2 : HDFS Lab
A. Create an input and output directory under hdfs for all input and output files

hadoop fs -mkdir /input


hadoop fs -mkdir /output

B. Check directories

hadoop fs -ls /

C. Copy files from the local system to hdfs and check that the files are copied

hadoop fs -copyFromLocal /home/user/lab/data/txns /input


hadoop fs -copyFromLocal /home/user/lab/data/custs /input
hadoop fs -ls /input

D. Go to datan1 and check how the files are split and stored as multiple blocks [Hint: check
the sizes of the files in the folder called current, where the blk files are stored]
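
For example (the path assumes the directory layout created in Lab 1):

ls -lh /home/user/lab/hdfs/datan1/current

The blk_* files are the raw block data; with the default 64 MB block size, a file larger
than 64 MB will show up as several of them.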

Lab 3 : Map Reduce Word Count


A. Switch the workspace to a known folder.
B. Open Eclipse and create a new Java Project called MRLab

C. Hint : File -> New -> Other... -> Java Project


D. Create a package com.evenkat under the src folder of project MRLab

E. Add the Hadoop jar files to the project


Hint : Right click on MRLab -> Properties -> Java Build Path -> Add External JARs
Add all jar files under d:\software\hadoop-1.0.3 and d:\software\hadoop-1.0.3\lib
[the folder on the host machine where the Hadoop tarball was extracted]
F. Create a class called WordCountTesting
G. The classes to be imported are:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.StringTokenizer;

H. Map Reduce Program code

Note: This would be inside the class WordCountTesting.

public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);

        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, new IntWritable(1));
        }
    }
} // end of MyMapper class

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // Write the total once per key, after all values have been summed
        context.write(key, new IntWritable(sum));
    }
} // end of MyReducer class
I. Driver Code [Note: the driver code below also goes inside the class WordCountTesting]

public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

    Job job = new Job(conf, "Word Counter");

    job.setJarByClass(WordCountTesting.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
J. Create the jar file
Right click on MRLab -> Export -> Java -> JAR file (give the name WordCount.jar). DO NOT
CREATE A MAIN ENTRY CLASS NOW.
K. Transfer the jar file to the VM under /home/user/lab/programs
Hint : Use the WinSCP software to copy the jar file to the Linux VM

L. Create a words file under the /home/user/lab/data directory and write a few lines of text
in the file (Hint: copy some paragraphs from your favourite website)
M. Copy the words file to hdfs under the input folder, using the command shown below
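
This follows the same copyFromLocal pattern as Lab 2 (assuming the file was saved with
the name words):

hadoop fs -copyFromLocal /home/user/lab/data/words /input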
N. Go to /home/user/lab/programs and run the job
O. hadoop jar WordCount.jar com.evenkat.WordCountTesting /input/words
/output/wcount

P. Check output
hadoop fs -cat /output/wcount/part-r-00000 [one part-r-* file is produced per reducer]
Lab 4 : Map Reduce Lab Total Transactions by Each Product
Input File : txns
Txnid, date, custid, amount, product category, sub category, city, state, credit or cash
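
The handbook gives only the run command for this lab, so the following is a minimal sketch
of what the SortingDriver class might contain, written against the same Hadoop 1.0.3 API
used in Lab 3. The package and class name are taken from the run command below; the mapper
and reducer names, the comma-separated parsing of txns, and the choice of counting by the
product category field are assumptions.

package com.evenkat;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SortingDriver {

    // Mapper: emit (product category, 1) for every transaction line.
    // Assumption: txns is comma-separated; field 5 (index 4) is the product category.
    public static class TxnMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text product = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 4) {
                product.set(fields[4]);
                context.write(product, one);
            }
        }
    }

    // Reducer: sum the counts to get the total transactions per product category
    public static class TxnReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Transactions By Product");
        job.setJarByClass(SortingDriver.class);
        job.setMapperClass(TxnMapper.class);
        job.setReducerClass(TxnReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Build and export this as TxnSorting.jar exactly as in Lab 3 (steps J and K) and copy it to
/home/user/lab/programs.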
Run the map reduce program as follows:

hadoop jar TxnSorting.jar com.evenkat.SortingDriver /input/txns /output/pbyamt

Check the output files:

hadoop fs -ls /output/pbyamt
hadoop fs -cat /output/pbyamt/part-r-00000

Lab 5 : HDFS Monitoring


A. HDFS filesystem statistics

hadoop dfsadmin -report

Gives you a detailed report of the hdfs system, including:

Total capacity allocated, used and available
Number of files and blocks
Total number of under-replicated or missing blocks

B. Checking the health of files in HDFS

hadoop fsck /
hadoop fsck /input/txns -files -blocks

The first command gives you a detailed health report of all files in hdfs; the second
reports on the specified file only (-files lists each file checked, -blocks prints its
block details). The report includes:

Total number of blocks and their size
Under-replicated or missing blocks, if any

C. HDFS Web UI

Open your browser and enter the following url:

http://<ip address of the VM>:50070/

Lab 6 : Job Tracker Monitoring

Open your browser and enter the following url:

http://<ip address of the VM>:50030/

The Job Tracker page shows the cluster's map and reduce slot capacity and the lists of
running, completed, retired and failed jobs; click a job id to drill into its map and
reduce tasks.
