Downloads: contains all the installables for Hadoop, Hive and Pig that we will download.
B. Download the following tarball installation packages from their respective sites:
a) hadoop
b) pig
c) hive
d) sqoop
e) hbase
C. Untar the Hadoop tarball
o Go to lab/software in the Terminal window of the VM.
o Browse through the directories and check which subdirectory contains which files.
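The untar step can be sketched as below. Since the real hadoop-1.0.3.tar.gz may not be at hand, the sketch first fabricates a stand-in archive so the commands can be tried anywhere; the directory name matches the HADOOP_INSTALL path used later in this lab.

```shell
# Stand-in setup only: fabricate a small hadoop-1.0.3.tar.gz in lab/software
# so the extraction commands below can be tried without the real download.
mkdir -p lab/software/hadoop-1.0.3/bin
echo demo > lab/software/hadoop-1.0.3/bin/hadoop
tar -czf lab/software/hadoop-1.0.3.tar.gz -C lab/software hadoop-1.0.3
rm -r lab/software/hadoop-1.0.3

# The actual lab step: extract the archive in place and browse the tree.
cd lab/software
tar -xzf hadoop-1.0.3.tar.gz   # -x extract, -z gunzip, -f archive file
ls hadoop-1.0.3                # check which subdirectory contains which files
```

The same two commands (cd, then tar -xzf) apply to the pig, hive, sqoop and hbase tarballs.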
D. Create a new file called .bash_profile [yes, with a . before the word] in the /home/user
directory.
E. Install OpenJDK on Ubuntu by entering the command below in the Terminal window:
sudo apt-get install openjdk-6-jdk
F. To edit files with a text editor, download WinSCP [an FTP/SFTP tool for editing and moving
files on the VM]. Type ifconfig in the terminal window to get the IP address of the VM,
enter it in the host name field along with the username and password, and connect.
Enter the following settings in .bash_profile:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
export HADOOP_INSTALL=/home/user/lab/software/hadoop-1.0.3
export PATH=$PATH:$HADOOP_INSTALL/bin
Verify whether the variables are defined by typing export or env at the command prompt.
Then confirm the tools are reachable on the PATH:
java -version
hadoop version
mkdir namenodep
mkdir datan1
mkdir checkp
mkdir local1
mkdir system
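The five directories correspond to the dfs.name.dir, dfs.data.dir, dfs.checkpoint.dir, mapred.local.dir and mapred.system.dir values set in the configuration files below. A sketch that creates the whole tree in one shot, using $PWD/lab as a stand-in for the lab's /home/user/lab:

```shell
# Sketch: create the HDFS and MapReduce working directories in one go.
# BASE stands in for /home/user/lab; adjust it to your own layout.
BASE="$PWD/lab"
mkdir -p "$BASE"/hdfs/namenodep "$BASE"/hdfs/datan1 "$BASE"/hdfs/checkp
mkdir -p "$BASE"/mapred/local1 "$BASE"/mapred/system
ls -R "$BASE"   # confirm the tree matches the paths used in the XML configs
```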
Modify core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
    <final>true</final>
  </property>
</configuration>
Modify hdfs-site.xml
<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/user/lab/hdfs/namenodep</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/lab/hdfs/datan1</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.checkpoint.dir</name>
    <value>/home/user/lab/hdfs/checkp</value>
    <final>true</final>
  </property>
</configuration>
Modify mapred-site.xml
<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:8021</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/user/lab/mapred/local1</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/user/lab/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
    <!-- Not marked as final so jobs can include JVM debugging options -->
  </property>
</configuration>
hadoop namenode -format [Note: answer with an uppercase Y, not a lowercase y]
Go to the namenodep directory and check whether any folders have been created; verify
which files were created.
[The exact folders/files need not match the screenshot; they are created once the
cluster is started up, not at the beginning.]
e. Start HDFS services
Go to the conf directory under HADOOP_HOME, then start HDFS with start-dfs.sh (already
on the PATH via $HADOOP_INSTALL/bin); jps should then list the NameNode, DataNode and
SecondaryNameNode processes.
B. Check directories
hadoop fs -ls /
C. Copy files from the local system to HDFS (hadoop fs -put <localfile> /input) and check that
the file was copied.
D. Go to datan1 and check how the file is split and stored as multiple blocks [Hint: check
the size of the files in the folder called current, where the blk_ files are stored].
import java.io.IOException;
import java.util.StringTokenizer;

// Mapper excerpt (inside MyMapper.map): emit (word, 1) for every token in the line
while (tokenizer.hasMoreTokens()) {
    word.set(tokenizer.nextToken());
    context.write(word, new IntWritable(1));
}
}
} // end of MyMapper class

// Reducer excerpt (inside MyReducer.reduce): sum the counts for each word
// (values is the Iterable<IntWritable> parameter passed to reduce)
int sum = 0;
for (IntWritable val : values) {
    sum += val.get();
}

// Driver excerpt: wire the mapper, reducer and key/value types into the job
job.setJarByClass(WordCountTesting.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
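Outside Hadoop, the map-shuffle-reduce flow these fragments implement can be mimicked with a shell pipeline, purely as an illustration of the data flow (not part of the lab itself):

```shell
# WordCount data flow in miniature:
#   map:     tr splits each line into one word per line (the mapper's tokenizer loop)
#   shuffle: sort brings identical words together (Hadoop's sort/shuffle phase)
#   reduce:  uniq -c counts the occurrences of each word (the reducer's sum loop)
echo "big data big cluster data big" | tr ' ' '\n' | sort | uniq -c
# prints (left-padded by uniq): 3 big / 1 cluster / 2 data
```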
L. Create a words file under the /home/user/lab/data directory and write a few lines of text in the
file (Hint: copy a paragraph from your favourite website).
M. Copy the words file to HDFS under the input folder.
N. Go to /home/user/lab/programs and run the job:
O. hadoop jar WordCount.jar com.evenkat.WordCountTesting /input/words
/output/wcount
P. Check the output:
hadoop fs -cat /output/wcount/part-r-00000
IV. Lab 4 : MapReduce Lab : Total Transactions by Each Product
Input file: txns
Fields: txnid, date, custid, amount, product category, sub category, city, state, credit or cash
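What the Lab 4 job computes can be previewed on a few fabricated rows, with awk standing in for the MapReduce job; the rows follow the txns field layout above, and all values are made up for illustration:

```shell
# Sketch: total transactions per product category (field 5 of the txns layout).
# The three rows below are fabricated sample data, not the real txns file.
printf '%s\n' \
  '1,2011-01-01,4000001,98.50,Exercise,Gym Mats,Austin,Texas,credit' \
  '2,2011-01-02,4000002,12.00,Games,Board Games,Dallas,Texas,cash' \
  '3,2011-01-03,4000003,55.25,Exercise,Weights,Houston,Texas,credit' |
  awk -F, '{ count[$5]++ } END { for (c in count) print c, count[c] }'
# prints one line per category, e.g.: Exercise 2 and Games 1
```

The awk associative array plays the role of the reducer's per-key sum.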
Run the MapReduce program, then inspect how the input file is stored in HDFS:
hadoop fsck /
hadoop fsck /user/user/input/txns -files -blocks
E. HDFS Web UI [in Hadoop 1.x the NameNode web UI is served at http://localhost:50070]