
Ex.No: 1.a)
Data structures in Java - Stack

OBJECTIVE: To implement Stack in java

DESCRIPTION:

Stack is a subclass of Vector that implements a standard last-in, first-out (LIFO) stack.
Stack defines only the default constructor, which creates an empty stack. Stack includes
all the methods defined by Vector and adds several of its own. Java provides this
predefined Stack class in the java.util package with methods such as push(), pop(),
peek(), search() and empty().

PROGRAM:

import java.util.*;
public class stackpro {
public static void main(String[] args) {
Stack<Integer> s=new Stack<Integer>();
Scanner sc=new Scanner(System.in);
int i;
do{
System.out.println("1:push");
System.out.println("2:pop");
System.out.println("3:peek");
System.out.println("4:search");
System.out.println("5:isEmpty");
System.out.println("Enter the choice");
i=sc.nextInt();
switch(i)
{
case 1:
System.out.println("Enter the element:");
int x=sc.nextInt();
s.push(x);
System.out.println("stack is "+s);
break;
case 2:
int y=s.pop();
System.out.println("The value popped is "+y);
break;
case 3:
int z=s.peek();
System.out.println("The peek element is "+z);
break;
case 4:
System.out.println("Enter the element to be searched");
int b=sc.nextInt();
int a=s.search(b);//returns the 1-based position from the top, or -1 if not found
if(a==-1)
System.out.println("Element is not available");
else
System.out.println("Element is available at position "+a);
break;
case 5:
System.out.println("The stack is empty: "+s.empty());
break;
case 6:
System.exit(0);
}
}while(i<=6);
}
}

OUTPUT:

1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
1
Enter the element:
10
stack is [10]
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
1
Enter the element:
20
stack is [10, 20]
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
1
Enter the element:
30
stack is [10, 20, 30]
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
3
The peek element is 30
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
2
The value popped is 30
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
4
Enter the element to be searched
20
Element is available at position 1
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
5
The stack is empty: false
1:push
2:pop
3:peek
4:search
5:isEmpty
Enter the choice
6

VIVA QUESTIONS:

1. Stack class is available in which package?


Ans: The java.util package.

2. What is the purpose of peek () method?


Ans: The peek() method returns the top value of the stack without removing it.

3. What is the purpose of pop () method?


Ans: The pop() method removes the top value from the stack and returns it.

Ex.No: 1.b)
LinkedList

OBJECTIVE: To implement the LinkedList data structure.

DESCRIPTION:

The LinkedList class extends AbstractSequentialList and implements the List
interface. It provides a linked-list data structure.
PROGRAM:

import java.util.*;
public class LinkedListDemo {

public static void main(String args[]) {


// create a linked list
LinkedList ll = new LinkedList();

// add elements to the linked list


ll.add("F");
ll.add("B");
ll.add("D");
ll.add("E");
ll.add("C");
ll.addLast("Z");
ll.addFirst("A");
ll.add(1, "A2");
System.out.println("Original contents of ll: " + ll);

// remove elements from the linked list


ll.remove("F");
ll.remove(2);
System.out.println("Contents of ll after deletion: " + ll);

// remove first and last elements


ll.removeFirst();
ll.removeLast();
System.out.println("ll after deleting first and last: " + ll);

// get and set a value


Object val = ll.get(2);
ll.set(2, (String) val + " Changed");
System.out.println("ll after change: " + ll);
}
}
OUTPUT:

Original contents of ll: [A, A2, F, B, D, E, C, Z]


Contents of ll after deletion: [A, A2, D, E, C, Z]
ll after deleting first and last: [A2, D, E, C]
ll after change: [A2, D, E Changed, C]

VIVA QUESTIONS:

1. How can you insert elements into a LinkedList?


Ans: By using add()
2. How can you insert an element into a LinkedList as the first element?
Ans: addFirst()
3. How can you add all elements of one collection to a List?
Ans: addAll(), as sketched below.
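A minimal sketch of addAll() (the extras list here is a hypothetical second collection, not part of the program above):

import java.util.*;

public class AddAllDemo {
    public static void main(String[] args) {
        LinkedList<String> ll = new LinkedList<>(Arrays.asList("A", "B"));
        List<String> extras = Arrays.asList("C", "D"); // hypothetical second collection
        ll.addAll(extras);                             // appends every element of extras
        System.out.println(ll);                        // [A, B, C, D]
    }
}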

Ex.No: 1.c)

SET

OBJECTIVE: To implement the Set data structure.

DESCRIPTION:

A Set is a Collection that cannot contain duplicate elements. It models the


mathematical set abstraction. The Set interface contains only methods inherited from
Collection and adds the restriction that duplicate elements are prohibited.

PROGRAM:

import java.util.*;

public class SetDemo {

public static void main(String[] args) {
LinkedHashSet<String> lset=new LinkedHashSet<String>();
lset.add("pratyusha");
lset.add("pratyusha");//set does not allow duplicate values.
lset.add("bindu");
lset.add("aruna");
for(String s:lset)//enhanced for loop used to iterate over the set.
{
System.out.println(s);
}
System.out.println(lset);
TreeSet<String> tset=new TreeSet<String>();//sorted order
tset.add("praneeth");
tset.add("anuradha");
tset.add("pratyusha");
System.out.println(tset);
TreeSet<Integer> set=new TreeSet<Integer>();
set.add(10);
set.add(100);
set.add(90);
set.add(18);
System.out.println(set);
HashSet<String> hset=new HashSet<String>();//random order
hset.add("pratyusha");
hset.add("anuradha");
hset.add("srinivas");
hset.add("bindu");
hset.add("vineela");
hset.add("jyothsna");
System.out.println(hset);
LinkedHashSet<Integer> a=new LinkedHashSet<Integer>();
a.add(14);
a.add(18);
a.add(28);
a.add(35);
System.out.println(a.contains(14));//contains returns a boolean value
int sum=0;
for(Integer i:a)
{
sum=sum+i;
}
System.out.println(sum);
}
}

OUTPUT:
[pratyusha, bindu, aruna]
[anuradha, praneeth, pratyusha]
[10, 18, 90, 100]
[jyothsna, vineela, anuradha, srinivas, bindu, pratyusha]
true
95

VIVA QUESTIONS:

1. What interfaces are implemented by the HashSet class? Which is the


superclass of the HashSet class?
Ans: HashSet implements three interfaces: Serializable, Cloneable and Set.
AbstractSet is the superclass of the HashSet class.
2. Difference between HashSet and TreeSet?
Ordering: HashSet stores objects in no particular order; there is no guarantee
that the element inserted first will be printed first in the output. In a TreeSet,
elements are sorted according to their natural ordering. If the objects have no
natural ordering, a Comparator can be supplied (or the class can implement
Comparable) to define how the elements of the TreeSet are sorted, as sketched below.
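A minimal sketch of supplying a Comparator to a TreeSet (ordering strings by length is only an illustration, not part of the program above):

import java.util.*;

public class TreeSetComparatorDemo {
    public static void main(String[] args) {
        // Order strings by length instead of their natural alphabetical order
        TreeSet<String> byLength = new TreeSet<>(Comparator.comparingInt(String::length));
        byLength.add("pratyusha");
        byLength.add("bindu");
        byLength.add("anuradha");
        System.out.println(byLength); // [bindu, anuradha, pratyusha]
    }
}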
3. When to prefer TreeSet over HashSet?
1. When sorted unique elements are required instead of just unique elements. The sorted
view given by a TreeSet is always in ascending order.
2. TreeSet has greater locality than HashSet. If two entries are nearby in the
ordering, TreeSet places them near each other in the data structure and hence in
memory, while HashSet spreads the entries all over memory regardless of the keys
they are associated with.

Ex.No: 1.d)
Map

OBJECTIVE: To implement the Map data structure

DESCRIPTION:

A Map stores values on the basis of keys, i.e. key and value pairs. Each key and value
pair is known as an entry. A Map contains only unique keys. A Map is useful when you have
to search, update or delete elements on the basis of a key.

PROGRAM:

import java.util.*;
public class map {
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);

TreeMap<String,Double>tmap=new TreeMap<String,Double>();//sorted order


tmap.put("13a91a0514",80.6);
tmap.put("13a91a0528",82.6);
tmap.put("13a91a0518",81.6);
tmap.put("13a91a0535",83.6);
tmap.put("13a91a0535",83.6);//values do not repeat
System.out.println(tmap);

HashMap<String,Double>hmap=new HashMap<String,Double>();//random order


hmap.put("13a91a0514",80.6);
hmap.put("13a91a0518",81.6);
hmap.put("13a91a0535",83.6);
hmap.put("13a91a0528",82.6);
System.out.println(hmap);
LinkedHashMap<String,Double> lmap=new LinkedHashMap<String,Double>();//insertion order preserved
lmap.put("13a91a0514",80.6);
lmap.put("13a91a0518",81.6);
lmap.put("13a91a0535",83.6);
lmap.put("13a91a0528",82.6);
System.out.println(lmap);

//taking input from the user


System.out.println("How many elements are there");
int no=sc.nextInt();
System.out.println("Enter"+no+" keys and values");
TreeMap<Integer,String> t=new TreeMap<Integer,String>();
for(int i=0;i<no;i++)
{
int key=sc.nextInt();
String value=sc.next();
t.put(key, value);
}
System.out.println(t);
//advanced for loop
for(Map.Entry<Integer,String> e:t.entrySet())
{
System.out.println(e.getKey());
System.out.println(e.getValue());
}
}
}
OUTPUT:

{13a91a0514=80.6, 13a91a0518=81.6, 13a91a0528=82.6, 13a91a0535=83.6}
{13a91a0514=80.6, 13a91a0518=81.6, 13a91a0528=82.6, 13a91a0535=83.6}
{13a91a0514=80.6, 13a91a0518=81.6, 13a91a0535=83.6, 13a91a0528=82.6}
How many elements are there
4
Enter4 keys and values
1
14
2
18
3
35
4
28
{1=14, 2=18, 3=35, 4=28}
1
14
2
18
3
35
4
28

VIVA QUESTIONS:

1. Sort a Map on the keys?


Ans: The simplest way is to copy the map into a TreeMap, which keeps its entries sorted
by key; alternatively, put the Map.Entry objects into a list and sort it with a
comparator on the keys, as sketched below.
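A minimal sketch of key-sorting by copying an unsorted map into a TreeMap (the HashMap here mirrors the one in the program above):

import java.util.*;

public class SortByKeyDemo {
    public static void main(String[] args) {
        Map<String, Double> hmap = new HashMap<>();
        hmap.put("13a91a0535", 83.6);
        hmap.put("13a91a0514", 80.6);
        hmap.put("13a91a0528", 82.6);
        // The TreeMap copy keeps its entries sorted by key
        Map<String, Double> sorted = new TreeMap<>(hmap);
        System.out.println(sorted);
    }
}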
2. What are the commons ways for implementing Map?
Ans: The 4 commonly used implementations of Map in Java - HashMap, TreeMap,
Hashtable and LinkedHashMap.

 HashMap is implemented as a hash table, and there is no ordering on keys or


values.
 TreeMap is implemented based on red-black tree structure, and it is ordered by
the key.
 LinkedHashMap preserves the insertion order
 Hashtable is synchronized, in contrast to HashMap. It has an overhead for
synchronization.

3. Difference between HashMap, TreeMap, and Hashtable


Ans: There are three main implementations of Map interface in
Java: HashMap, TreeMap, and Hashtable.

The most important differences include:

1. The order of iteration. HashMap and Hashtable make no guarantees as to the


order of the map; in particular, they do not guarantee that the order will remain
constant over time. TreeMap, however, iterates over the entries according to the
"natural ordering" of the keys or according to a comparator.
2. Key-value permissions. HashMap allows one null key and any number of null values
(only one null key, since keys must be unique). Hashtable does not
allow null keys or null values. TreeMap throws an exception for a null key if it uses
natural ordering or if its comparator does not allow null keys.

3. Synchronized. Only Hashtable is synchronized, others are not. Therefore, "if a
thread-safe implementation is not needed, it is recommended to use HashMap in
place of Hashtable."

Ex.No: 1.e)
GENERIC PROGRAMMING

OBJECTIVE: To implement Generic Concepts

DESCRIPTION:
Java Generic methods and generic classes enable programmers to specify, with a
single method declaration, a set of related methods, or with a single class declaration,
a set of related types, respectively. Generics also provide compile-time type safety
that allows programmers to catch invalid types at compile time.

Example.

PROGRAM:

class A<T>
{
T x;
void add(T x)
{
this.x=x;
}
T get()
{
return x;
}
}
public class gen
{
public static void main(String[] args) {
A<Integer> o=new A<Integer>();
o.add(2);
System.out.println(o.get());

A<String> o1=new A<String>();
o1.add("neelima");
System.out.println(o1.get());
}
}

OUTPUT:

2
neelima

VIVA QUESTIONS:

1. What are Generics?

Ans:Generics are used to create Generic Classes and Generic methods which
can work with different Types(Classes).

2. How do you declare a Generic Class?


Ans: A generic class is declared with a type parameter, for example:

class MyListGeneric<T>

Instead of T, any valid identifier can be used.

3. How can we restrict Generics to a subclass of particular class?

Ans: In MyListGeneric, the type T is defined as part of the class declaration, so any

Java type can be used for it. If we want to restrict the types allowed for a generic
type, we can use a bounded type parameter, as sketched below.
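A minimal sketch of such a restriction using a bounded type parameter (MyListRestricted is a hypothetical class name):

import java.util.*;

// T may only be Number or one of its subclasses (Integer, Double, ...)
class MyListRestricted<T extends Number> {
    private final List<T> values = new ArrayList<>();

    void add(T value) { values.add(value); }

    // The bound lets us call Number methods such as doubleValue()
    double sum() {
        double s = 0;
        for (T v : values) s += v.doubleValue();
        return s;
    }
}

public class BoundedTypeDemo {
    public static void main(String[] args) {
        MyListRestricted<Integer> nums = new MyListRestricted<>();
        nums.add(2);
        nums.add(3);
        System.out.println(nums.sum()); // 5.0
        // MyListRestricted<String> would not compile: String is not a Number
    }
}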

Ex.No: 1.f)
Serialization

OBJECTIVE: To implement serialization

DESCRIPTION:

Java provides a mechanism, called object serialization where an object can be


represented as a sequence of bytes that includes the object's data as well as
information about the object's type and the types of data stored in the object.

After a serialized object has been written into a file, it can be read from the file and
deserialized; that is, the type information and the bytes that represent the object and its
data can be used to recreate the object in memory.

PROGRAM:

Student.java:

import java.io.*;
public class Student implements Serializable
{
int no;
String name;
}

SeriEx.java:

import java.io.*;
public class SeriEx
{
public static void main(String args[]) throws Exception
{
Student S1=new Student();
S1.no=12;
S1.name="CSEA";
ObjectOutputStream out=new ObjectOutputStream(new
FileOutputStream("D:/serex.ser"));
out.writeObject(S1);
out.close();
}
}

Deser.java:

import java.io.*;
public class Deser
{
public static void main(String[] args) throws Exception
{
Student S1=null;
FileInputStream fileIn=new FileInputStream("D:/serex.ser");
ObjectInputStream in=new ObjectInputStream(fileIn);
S1=(Student)in.readObject();
in.close();
System.out.println("Deserialized student...");
System.out.println("Name:"+S1.name);
System.out.println("Number:"+S1.no);
}}

OUTPUT:

Deserialized student...
Name:CSEA
Number:12

VIVA QUESTIONS:

1. How to make a Java class Serializable?


Making a class Serializable in Java is very easy: your Java class just needs to
implement the java.io.Serializable interface and the JVM will take care of
serializing objects in the default format.
2. How many methods Serializable has? If no method then what is the purpose
of Serializable interface?

The Serializable interface exists in the java.io package and forms the core of the Java


serialization mechanism. It doesn't have any methods and is therefore called a marker
interface in Java. When your class implements the java.io.Serializable interface it
becomes serializable and gives the compiler an indication that the Java
serialization mechanism should be used to serialize objects of this class.
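A common refinement, not shown in the listing above, is to declare an explicit serialVersionUID in the Serializable class so that version checks during deserialization are predictable; a minimal sketch based on the Student class:

import java.io.Serializable;

public class Student implements Serializable {
    // Explicit version identifier used by the serialization runtime
    private static final long serialVersionUID = 1L;
    int no;
    String name;
}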

Ex.No: 1.g)
Queue

OBJECTIVE: To implement Queue

DESCRIPTION:

A Queue is a collection for holding elements prior to processing. Besides


basic Collection operations, queues provide additional insertion, removal, and
inspection operations.

The Queue interface follows.

public interface Queue<E> extends Collection<E> {


E element();
boolean offer(E e);
E peek();
E poll();
E remove();
}
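A minimal sketch of the non-throwing methods offer(), peek() and poll(), using ArrayDeque as the Queue implementation (the values are arbitrary):

import java.util.ArrayDeque;
import java.util.Queue;

public class QueueMethodsDemo {
    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>();
        q.offer(10);                  // insert; returns false instead of throwing when capacity-restricted
        q.offer(20);
        System.out.println(q.peek()); // 10 - inspect the head without removing it
        System.out.println(q.poll()); // 10 - remove and return the head
        System.out.println(q.poll()); // 20
        System.out.println(q.poll()); // null - queue is now empty
    }
}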

PROGRAM:

import java.util.*;
class TestCollection12{
public static void main(String args[]){
PriorityQueue<String> queue=new PriorityQueue<String>();
queue.add("Amit");
queue.add("Vijay");
queue.add("Karan");
queue.add("Jai");
queue.add("Rahul");
System.out.println("head:"+queue.element());
System.out.println("head:"+queue.peek());
System.out.println("iterating the queue elements:");
Iterator<String> itr=queue.iterator();
while(itr.hasNext()){
System.out.println(itr.next());
}
queue.remove();
queue.poll();
System.out.println("after removing two elements:");
Iterator<String> itr2=queue.iterator();
while(itr2.hasNext()){
System.out.println(itr2.next());
}
}
}

OUTPUT:

head:Amit
head:Amit
iterating the queue elements:
Amit
Jai
Karan
Vijay
Rahul
after removing two elements:
Karan
Rahul
Vijay

VIVA QUESTIONS:

1. Queue is available in which package?


Ans: The java.util package.
2. Queue retrieves elements in which order?

Ans: First in First out Order

Ex.No: 1.h)
Wrapper Classes

OBJECTIVE: To implement Wrapper Classes

DESCRIPTION:

Wrapper class in java provides the mechanism to convert primitive into object and
object into primitive.

Since J2SE 5.0, the autoboxing and unboxing feature converts primitives into objects and
objects into primitives automatically. The automatic conversion of a primitive into an object
is known as autoboxing, and the reverse is known as unboxing.
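Unboxing works in the opposite direction; a minimal sketch:

public class UnboxingExample {
    public static void main(String[] args) {
        Integer i = Integer.valueOf(20);
        int a = i.intValue(); // explicit unboxing: converting Integer into int
        int b = i;            // auto-unboxing: the compiler inserts i.intValue() internally
        System.out.println(a + " " + b);
    }
}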


PROGRAM:

public class WrapperExample1{
public static void main(String args[])
{
//Converting int into Integer
int a=20;
Integer i=Integer.valueOf(a);//converting int into Integer explicitly
Integer j=a;//autoboxing, now compiler will write Integer.valueOf(a) internally
System.out.println(a+" "+i+" "+j);
}
}

OUTPUT:

20 20 20

VIVA QUESTIONS:

1. What is wrapper class?


Ans: A wrapper class converts a primitive data type into an object (and back).
2. What is the wrapper class for boolean data type?

Ans: Boolean is the predefined wrapper class for the boolean data type.

Ex.No: 2
Perform setting up and Installing Hadoop

OBJECTIVE: Installing Hadoop

DESCRIPTION:

Hadoop can be run in 3 different modes. Different modes of Hadoop are

Standalone Mode

 Default mode of Hadoop


 HDFS is not utilized in this mode. The local file system is used for input and
output
 Used for debugging purposes
 No custom configuration is required in the three Hadoop configuration files
(mapred-site.xml, core-site.xml, hdfs-site.xml)
 Standalone mode is much faster than pseudo-distributed mode

Pseudo Distributed Mode(Single Node Cluster)

 Configuration is required in given 3 files for this mode


 Replication factor is one for HDFS.
 Here one node will be used as Master Node / Data Node / Job Tracker / Task
Tracker
 Used for Real Code to test in HDFS.
 Pseudo distributed cluster is a cluster where all daemons are
running on one node itself.

Fully distributed mode (or multiple node cluster)

 This is a Production Phase


 Data are used and distributed across many nodes.

 Different Nodes will be used as Master Node / Data Node / Job Tracker / Task
Tracker

PROGRAM:

Installation of Hadoop

Step 1: Verifying JAVA Installation

Java must be installed on your system before installing Hadoop. Let us verify the Java
installation using the following command:

$ java –version

If Java is already installed on your system, you get to see the following response:

java version "1.7.0_71"

Java(TM) SE Runtime Environment (build 1.7.0_71-b13)

Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If java is not installed in your system, then follow the steps given below for installing
java.
Installing Java

Step I:

Download java (JDK <latest version> - X64.tar.gz) by visiting the following link:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.
Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.

Step II:

Generally you will find the downloaded java file in the Downloads folder. Verify it
and extract the jdk-7u71-linux-x64.gz file using the following commands.

$ cd Downloads/

$ ls

jdk-7u71-linux-x64.gz

$ tar zxf jdk-7u71-linux-x64.gz

$ ls

jdk1.7.0_71 jdk-7u71-linux-x64.gz
Step III:

To make java available to all the users, you have to move it to the location
“/usr/local/”. Open root, and type the following commands.

$ su

password:
# mv jdk1.7.0_71 /usr/local/
# exit

Step IV:

For setting up PATH and JAVA_HOME variables, add the following commands to
~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin

Now apply all the changes into the current running system.

$ source ~/.bashrc

Step V:

Use the following commands to configure java alternatives:

# alternatives --install /usr/bin/java java usr/local/java/bin/java 2


# alternatives --install /usr/bin/javac javac usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar usr/local/java/bin/jar 2
# alternatives --set java usr/local/java/bin/java
# alternatives --set javac usr/local/java/bin/javac
# alternatives --set jar usr/local/java/bin/jar

Now verify the installation using the command java -version from the terminal as
explained above.

Step 2: Verifying Hadoop Installation

Let us check whether Hadoop is already installed on your system using the following
command:

$ hadoop version

If Hadoop is already installed, you will see a response similar to the following:

Hadoop 2.4.1

Compiled by hortonmu on 2013-10-07T06:28Z

Compiled with protoc 2.5.0

From source with checksum 79e53ce7994d1628b240f09af91e1af4
If Hadoop is not installed on your system, then proceed with the following steps:
Downloading Hadoop
Download and extract Hadoop 2.4.1 from Apache Software Foundation using the
following commands.

$ su

password:

# cd /usr/local

# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz

# tar xzf hadoop-2.4.1.tar.gz

# mv hadoop-2.4.1/* hadoop/

# exit

Installing Hadoop in Pseudo Distributed Mode

The following steps are used to install Hadoop 2.4.1 in pseudo distributed mode.

Step I: Setting up Hadoop

You can set Hadoop environment variables by appending the following commands to
~/.bashrc file.

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply all the changes into the current running system.

$ source ~/.bashrc
Step II: Hadoop Configuration

You can find all the Hadoop configuration files in the location
“$HADOOP_HOME/etc/hadoop”. You need to make suitable changes in those
configuration files according to your Hadoop infrastructure.

$ cd $HADOOP_HOME/etc/hadoop

In order to develop Hadoop programs using java, you have to reset the java
environment variables in hadoop-env.sh file by replacing JAVA_HOME value with
the location of java in your system.

export JAVA_HOME=/usr/local/jdk1.7.0_71

Given below are the list of files that you have to edit to configure Hadoop.
core-site.xml

The core-site.xml file contains information such as the port number used for Hadoop
instance, memory allocated for the file system, memory limit for storing the data, and
the size of Read/Write buffers.
Open the core-site.xml and add the following properties in between the
<configuration> and </configuration> tags.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml

The hdfs-site.xml file contains information such as the value of replication data, the
namenode path, and the datanode path of your local file systems. It means the place
where you want to store the Hadoop infra.

Let us assume the following data.


dfs.replication (data replication value) = 1
(In the following path /hadoop/ is the user name.
hadoopinfra/hdfs/namenode is the directory created by hdfs file system.)
namenode path = //home/hadoop/hadoopinfra/hdfs/namenode
(hadoopinfra/hdfs/datanode is the directory created by hdfs file system.)
datanode path = //home/hadoop/hadoopinfra/hdfs/datanode

Open this file and add the following properties in between the <configuration>,
</configuration> tags in this file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value >
</property>
</configuration>

Note: In the above file, all the property values are user-defined and you can make
changes according to your Hadoop infrastructure.

yarn-site.xml
This file is used to configure yarn into Hadoop. Open the yarn-site.xml file and add
the following properties in between the <configuration>, </configuration> tags in this
file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
mapred-site.xml

This file is used to specify which MapReduce framework we are using. By default,
Hadoop contains a template named mapred-site.xml.template. First of all, you need to copy
the file from mapred-site.xml.template to mapred-site.xml using the following command.
$ cp mapred-site.xml.template mapred-site.xml
Open mapred-site.xml file and add the following properties in between the
<configuration>, </configuration> tags in this file.

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Verifying Hadoop Installation


The following steps are used to verify the Hadoop installation.
Step I: Name Node Setup
Set up the namenode using the command “hdfs namenode -format” as follows.
$ cd ~
$ hdfs namenode -format
The expected result is as follows.
10/24/14 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/192.168.1.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.1
...
...
10/24/14 21:30:56 INFO common.Storage: Storage directory
/home/hadoop/hadoopinfra/hdfs/namenode has been successfully formatted.
10/24/14 21:30:56 INFO namenode.NNStorageRetentionManager: Going to
retain 1 images with txid >= 0
10/24/14 21:30:56 INFO util.ExitUtil: Exiting with status 0
10/24/14 21:30:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/192.168.1.11
************************************************************/

Step II: Verifying Hadoop dfs


The following command is used to start dfs. Executing this command will start your
Hadoop file system.
$ start-dfs.sh
The expected output is as follows:
10/24/14 21:37:56
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-
hadoop-namenode-localhost.out
localhost: starting datanode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-
hadoop-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
Step III: Accessing Hadoop on Browser
The default port number to access Hadoop is 50070. Use the following URL to get
Hadoop services on your browser.
http://localhost:50070/
Step IV: Verify all applications for the cluster
The default port number to access all applications of the cluster is 8088. Use the
following URL to visit this service.

http://localhost:8088/

VIVA QUESTIONS

1. What are the three installation modes of Hadoop installation?


 Standalone mode
 Pseudo-distributed mode
 Fully distributed mode

2. Why are SSH setup and key generation required?

SSH setup is required to perform different operations on a cluster such as starting,


stopping, and distributed daemon shell operations. To authenticate different users of
Hadoop, it is required to provide a public/private key pair for a Hadoop user and share
it with the different users.

3. What are the features of hdfs?

 It is suitable for the distributed storage and processing.


 Hadoop provides a command interface to interact with HDFS.
 The built-in servers of namenode and datanode help users to easily check the
status of cluster.
 Streaming access to file system data.
 HDFS provides file permissions and authentication.

Ex.No: 3)
File Management Tasks in Hadoop

OBJECTIVE: To implement adding, retrieving and deleting files and directories in HDFS

DESCRIPTION:

The File System (FS) shell includes various shell-like commands that directly interact
with the Hadoop Distributed File System (HDFS) as well as other file systems that
Hadoop supports, such as Local FS, HFTP FS, S3 FS, and others. The following
commands are used for interacting with HDFS.

cat

Usage: hdfs dfs -cat URI [URI ...]

Copies source paths to stdout.

Example:

 hdfs dfs -cat hdfs://nn1.example.com/file1


hdfs://nn2.example.com/file2
 hdfs dfs -cat file:///file3 /user/hadoop/file4

Exit Code:

Returns 0 on success and -1 on error.

chgrp

Usage: hdfs dfs -chgrp [-R] GROUP URI [URI ...]

Change group association of files. The user must be the owner of files, or else a super-
user. Additional information is in the Permissions Guide.

Options

 The -R option will make the change recursively through the directory structure.

chmod

Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]

Change the permissions of files. With -R, make the change recursively through the
directory structure. The user must be the owner of the file, or else a super-user.
Additional information is in the Permissions Guide.

Options

 The -R option will make the change recursively through the directory structure.

chown

Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]

Change the owner of files. The user must be a super-user. Additional information is in
the Permissions Guide.

Options

 The -R option will make the change recursively through the directory structure.

copyFromLocal

Usage: hdfs dfs -copyFromLocal <localsrc> URI

Similar to put command, except that the source is restricted to a local file reference.

Options:

 The -f option will overwrite the destination if it already exists.

copyToLocal

Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to get command, except that the destination is restricted to a local file reference.

count

Usage: hdfs dfs -count [-q] <paths>

Count the number of directories, files and bytes under the paths that match the specified
file pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT,
CONTENT_SIZE FILE_NAME

The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA,


REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

Example:

 hdfs dfs -count hdfs://nn1.example.com/file1


hdfs://nn2.example.com/file2
 hdfs dfs -count -q hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.

cp

Usage: hdfs dfs -cp [-f] URI [URI ...] <dest>

Copy files from source to destination. This command allows multiple sources as well in
which case the destination must be a directory.

Options:

 The -f option will overwrite the destination if it already exists.

Example:

 hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2


 hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.
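The same file-management tasks can also be performed programmatically through Hadoop's Java FileSystem API; a minimal sketch (all paths are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml settings
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/user/ponny/demo"));                    // like: hadoop fs -mkdir
        fs.copyFromLocalFile(new Path("/home/ponny/csea"),          // like: -copyFromLocal
                new Path("/user/ponny/demo/csea"));
        fs.copyToLocalFile(new Path("/user/ponny/demo/csea"),       // like: -copyToLocal
                new Path("/home/ponny/csea_copy"));
        fs.delete(new Path("/user/ponny/demo"), true);              // like: -rmr (recursive delete)
        fs.close();
    }
}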

PROGRAM:

Interacting with Local File System Commands:


ponny@ubuntu:~$ cat >cseaa
hello how are you

ponny@ubuntu:~$ cat cseaa


hello how are you

ponny@ubuntu:~$ ls
AirPassengers.csv Pictures
big.txt protobuf-2.4.1
classes protobuf-2.4.1.tar.gz
core protobuf-2.5.0
cseaa protobuf-2.5.0.tar.gz
cseblearners Public

data10.txt PVP College
Data1.txt R
ponny@ubuntu:~$ clear

ponny@ubuntu:~$ cat >cseaa


hi how are you

ponny@ubuntu:~$ cat cseaa


hi how are you

ponny@ubuntu:~$ mkdir Dps


ponny@ubuntu:~$ cd Dps

ponny@ubuntu:~/Dps$ cd\

ponny@ubuntu:~$ cd Dps

ponny@ubuntu:~/Dps$ mkdir train

ponny@ubuntu:~/Dps$ cd train

ponny@ubuntu:~/Dps/train$ cd\

ponny@ubuntu:~$ ls

AirPassengers.csv pa.txt~
big.txt Pictures
classes protobuf-2.4.1
core protobuf-2.4.1.tar.gz
cseaa protobuf-2.5.0
cseblearners protobuf-2.5.0.tar.gz
data10.txt Public
Data1.txt PVP College
ponny@ubuntu:~$ clear

ponny@ubuntu:~$ cd Dps

ponny@ubuntu:~/Dps$ ls

train
ponny@ubuntu:~/Dps$ cd\

ponny@ubuntu:~$ jps

4520
4662 FsShell
3660 TaskTracker
2832 NameNode
4698 Jps
3328 SecondaryNameNode
3412 JobTracker
3079 DataNode

Interacting with Hadoop File System Commands

ponny@ubuntu:~$ hadoop fs -ls

Found 2 items
-rw-r--r-- 1 ponny supergroup 15 2016-08-19 10:32
/user/ponny/hadooplab
drwxr-xr-x - ponny supergroup 0 2016-08-18 15:38
/user/ponny/training

ponny@ubuntu:~$ hadoop fs -mkdir hadoop


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -copyFromLocal csea hadoop


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -cat hadoop\csea


Warning: $HADOOP_HOME is deprecated.

cat: File does not exist: /user/ponny/hadoopcsea


ponny@ubuntu:~$ hadoop fs -cat hadoop/csea

Warning: $HADOOP_HOME is deprecated.

hi this is hadoop lab

ponny@ubuntu:~$ hadoop fs -copyToLocal hadoop/csea cseloc

Warning: $HADOOP_HOME is deprecated.


ponny@ubuntu:~$ cat cseloc

hi this is hadoop lab

ponny@ubuntu:~$

1.create "training" file in local system and copy that file to hdfs directory using
"put" cmd
ponny@ubuntu:~$ cat >training
Hello Welcome to the world of Bigdata

ponny@ubuntu:~$ hadoop fs -put training hadoop


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -ls hadoop


Warning: $HADOOP_HOME is deprecated.
Found 2 items
-rw-r--r-- 1 ponny supergroup 22 2016-08-19 10:55
/user/ponny/hadoop/csea
-rw-r--r-- 1 ponny supergroup 38 2016-08-19 11:04
/user/ponny/hadoop/training

ponny@ubuntu:~$ hadoop fs -cat hadoop/training


Warning: $HADOOP_HOME is deprecated.
Hello Welcome to the world of Bigdata

2.create It directory in Hdfs and copy "csea" file to hdfs It directory

ponny@ubuntu:~$ hadoop fs -mkdir IT


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ cat csea


hi this is hadoop lab

ponny@ubuntu:~$ hadoop fs -copyFromLocal csea IT


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -cat IT/csea


Warning: $HADOOP_HOME is deprecated.
hi this is hadoop lab

3.create ECE directory in HDFS and copy the csea file from the hadoop HDFS directory
to the ECE HDFS directory
ponny@ubuntu:~$ hadoop fs -mkdir ECE
Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -cp hadoop/csea ECE


Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -cat ECE/csea


Warning: $HADOOP_HOME is deprecated.
hi this is hadoop lab

mv command:

ponny@ubuntu:~$ cat >mvnew


hi this is about mv command

ponny@ubuntu:~$ cat mvnew


hi this is about mv command

ponny@ubuntu:~$ hadoop fs -copyFromLocal mvnew hadoop

Warning: $HADOOP_HOME is deprecated.


ponny@ubuntu:~$ hadoop fs -mv hadoop/mvnew ECE

Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -ls hadoop

Warning: $HADOOP_HOME is deprecated.


Found 3 items
-rw-r--r-- 1 ponny supergroup 22 2016-08-19 10:55
/user/ponny/hadoop/csea
-rw-r--r-- 1 ponny supergroup 38 2016-08-19 11:04
/user/ponny/hadoop/training
-rw-r--r-- 1 ponny supergroup 44 2016-08-19 11:04
/user/ponny/hadoop/mvnew
ponny@ubuntu:~$ hadoop fs -ls hadoop

Warning: $HADOOP_HOME is deprecated


.
Found 2 items
-rw-r--r-- 1 ponny supergroup 22 2016-08-19 10:55
/user/ponny/hadoop/csea
-rw-r--r-- 1 ponny supergroup 38 2016-08-19 11:04
/user/ponny/hadoop/training

4.Copy the training bigdata file from HDFS to Desktop (local file system)

ponny@ubuntu:~$ hadoop fs -copyToLocal hadoop/training Desktop

Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -copyFromLocal


/home/ponny/Desktop/Cancerpatientchi.R ECE

Warning: $HADOOP_HOME is deprecated.

ponny@ubuntu:~$ hadoop fs -ls ECE

Warning: $HADOOP_HOME is deprecated.


Found 3 items
-rw-r--r-- 1 ponny supergroup 460 2016-08-19 12:19
/user/ponny/ECE/Cancerpatientchi.R
-rw-r--r-- 1 ponny supergroup 22 2016-08-19 11:30
/user/ponny/ECE/csea
-rw-r--r-- 1 ponny supergroup 27 2016-08-19 11:46
/user/ponny/ECE/mvnew
ponny@ubuntu:~$ hadoop fs -rmr IT

Warning: $HADOOP_HOME is deprecated.

Deleted hdfs://localhost:54310/user/ponny/IT

VIVA QUESTIONS:

1. What is the purpose of copyFromLocal?

Ans: To copy data from local file system to HDFS

2. What is the purpose of copyToLocal?

Ans: To copy data from HDFS to Local

3. What is the purpose of hadoop fs -mkdir?

Ans: To create directory in Hadoop file system.

Ex.No: 4)
Word Count Map Reduce program

OBJECTIVE: To implement the Word Count program using MapReduce

DESCRIPTION:

MapReduce is a processing technique and a program model for distributed computing


based on java. The MapReduce algorithm contains two important tasks, namely Map
and Reduce. Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). Secondly, reduce
task, which takes the output from a map as an input and combines those data tuples
into a smaller set of tuples. As the sequence of the name MapReduce implies, the
reduce task is always performed after the map job. WordCount is a simple
application that counts the number of occurrences of each word in a given input
set. This works with a local standalone, pseudo-distributed or fully-distributed
Hadoop installation.

PROGRAM

Driver code:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
public static void main(String[] args) throws Exception {

String input = "test1.txt";
String output = "out";

// Create a new job
Job job = new Job();

// Set job name to locate it in the distributed environment
job.setJarByClass(WordCountDriver.class);
job.setJobName("Word Count");

// Set input and output Path, note that we use the default input format
// which is TextInputFormat (each record is a line of input)
FileInputFormat.addInputPath(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));

// Set Mapper and Reducer class
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);

// Set Output key and value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Mapper Class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
extends Mapper<LongWritable, Text, Text, IntWritable>{
private static final IntWritable one = new IntWritable(1);
private Text word = new Text();
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException
{
String line = value.toString();
String[] words = line.split(" ");
for (String w : words) {
word.set(w);
context.write(word, one);
}
}
}

Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
extends Reducer<Text, IntWritable, Text, IntWritable>{
protected void reduce(Text key, Iterable<IntWritable> values,
Context context)
throws IOException, InterruptedException {
int sum = 0;
for(IntWritable value:values)
{
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}
OUTPUT:

Input File:
Welcome every1.
Welcome to Hadoop lab.
Today we are going to work on Hadoop MapReduce concept.

Output File:
MapReduce 1
Today 1
Welcome 2
are 1
concept. 1
every1. 1
going 1
Hadoop 2
lab. 1
on 1
to 2
we 1
work 1

VIVA QUESTIONS:

1. Which method is used for writing the mapper logic?


Ans: protected void map()

2. Which method is used for writing the reducer logic?


Ans: protected void reduce()

3. What are the various Hadoop data types?


Ans: LongWritable, IntWritable, Text, etc.

Ex.No: 5)
Matrix Multiplication using Map Reduce Approach

OBJECTIVE: To implement Matrix Multiplication using MapReduce

DESCRIPTION:

In the map function each input from the dataset is organized to produce a key value
pair such that reducer can do the entire computation of the corresponding output cell.

PROGRAM

Driver code:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Matrix {
public static void main(String[] args) throws Exception {

String input = "test1.txt";
String output = "out";

// Create a new job
Job job = new Job();

// Set job name to locate it in the distributed environment
job.setJarByClass(Matrix.class);
job.setJobName("Matrix Multiplication");

// Set input and output Path, note that we use the default input format
// which is TextInputFormat (each record is a line of input)
FileInputFormat.addInputPath(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));

// Set Mapper and Reducer class
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);

// The mapper emits Text values, which differ from the final IntWritable output values
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

// Set Output key and value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Mapper Class:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MatrixMapper extends
Mapper<LongWritable, Text, Text, Text>
{
// Matrix dimensions, assumed here to be 5 to match the 5x5 result shown in the output
private static final int iMax = 5;
private static final int lMax = 5;

@Override
protected void map
(LongWritable key, Text value, Context context)
throws IOException, InterruptedException
{
// input format is ["a", 0, 0, 63]
String[] csv = value.toString().split(",");
String matrix = csv[0].trim();
int row = Integer.parseInt(csv[1].trim());
int col = Integer.parseInt(csv[2].trim());
if(matrix.contains("a"))
{
for (int i=0; i < lMax; i++)
{
String akey = Integer.toString(row) + "," + Integer.toString(i);
context.write(new Text(akey), value);
}
}
if(matrix.contains("b"))
{
for (int i=0; i < iMax; i++)
{
String akey = Integer.toString(i) + "," + Integer.toString(col);
context.write(new Text(akey), value);
}
}
}
}

Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, IntWritable> {

@Override
protected void reduce
(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {

int[] a = new int[5];
int[] b = new int[5];
// b, 2, 0, 30
for (Text value : values) {
System.out.println(value);
String cell[] = value.toString().split(",");
if (cell[0].contains("a")) // take rows here
{
int col = Integer.parseInt(cell[2].trim());
a[col] = Integer.parseInt(cell[3].trim());
}
else if (cell[0].contains("b")) // take col here
{
int row = Integer.parseInt(cell[1].trim());
b[row] = Integer.parseInt(cell[3].trim());
}
}
int total = 0;
for (int i = 0; i < 5; i++) {
int val = a[i] * b[i];
total += val;
}
context.write(key, new IntWritable(total));
}
}
OUTPUT:

0,0 11878
0,1 14044
0,2 16031
0,3 5964
0,4 15874
1,0 4081
1,1 6914
1,2 8282
1,3 7479
1,4 9647
2,0 6844
2,1 9880
2,2 10636
2,3 6973
2,4 8873
3,0 10512
3,1 12037
3,2 10587
3,3 2934
3,4 5274
4,0 11182
4,1 14591
4,2 10954
4,3 1660
4,4 9981

VIVA QUESTIONS:

1. Which method is used to split each input record?


Ans: split()

2. Which method is used for writing data to the output?


Ans: context.write()

Ex.No: 6)
Mining Weather Data using MapReduce

OBJECTIVE: To mine weather data using MapReduce

DESCRIPTION:

Sensors sense weather data in a big text format containing station ID, year, date, time,
temperature, quality etc., and each reading is stored in a single line. Suppose there are
thousands of such sensors; then we have thousands of records in no particular order. We
require only the year and the maximum temperature of a particular quality in that year.

For example:

Input string from sensor:

0029029070999991902010720004+64333+023450FM-12+

000599999V0202501N027819999999N0000001N9-00331+
99999098351ADDGF102991999999999999999999

Here: 1902 is year

0033 is temperature

1 is measurement quality (Range between 0 or 1 or 4 or 5 or 9)

Here each mapper takes as input key the "byte offset of the line" and as value "one weather
sensor reading, i.e. one line". It parses the line and produces the "year" as the intermediate
key and the "temperature of the accepted measurement qualities" as the intermediate value for
that year.

PROGRAM

Driver code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Weather {

public static void main(String[] args) {

Configuration conf=new Configuration();
Job job;
try {
job = new Job(conf,"WeatherDataExtraction");
job.setJobName("WeatherDataExtraction");
job.setJarByClass(Weather.class);
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job,new
Path("E:\\Nitin\\Programming\\DATA\\01001.dat\\01001.dat"));
FileOutputFormat.setOutputPath(job,new
Path("E:\\Nitin\\output20.txt"));
try {
job.waitForCompletion(true);
} catch (ClassNotFoundException | IOException | InterruptedException e) {
e.printStackTrace();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Mapper Class:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {

@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {

String line = value.toString();

//In the input string, the year occupies character positions 15 to 19 and this is
//fixed for every input record, so take a substring to get the year from the line.
String year = line.substring(15, 19);

int airTemperature;

//The temperature (including the sign character) occupies positions 87 to 92.
//For comparison we do not need a leading "+" sign (+11C equals 11C),
//but a "-" sign must be kept, so strip the "+" before parsing.
if (line.charAt(87) == '+') {
// parseInt doesn't like leading plus signs
airTemperature = Integer.parseInt(line.substring(88, 92));
} else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}

//The measurement quality is the single character after the temperature field;
//we accept only readings with quality 0, 1, 4, 5 or 9.
String quality = line.substring(92, 93);

//If it matches, write the year as key and the temperature as value to the context output.
if (quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}

Reducer:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{

public void reduce(Text key,Iterable<IntWritable> values,Context context) throws


IOException, InterruptedException
{
Integer max=new Integer(0);
for(IntWritable val:values) {
if (val.get()>max.intValue()) { max=val.get();}
}
context.write(key,new IntWritable(max.intValue()));
}
}
OUTPUT:
1949 111
1955 22

VIVA QUESTIONS:

1. Which method can be used for getting the maximum value?


Ans: Math.max()

2. Which method can be used for getting the minimum value?

Ans: Math.min()

Ex.No: 7

Pig Latin scripts

OBJECTIVE: To install and run Pig, then write Pig Latin scripts to sort,
group, join, project, and filter your data.
DESCRIPTION:

Apache Pig is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs, coupled with infrastructure for
evaluating these programs. The salient property of Pig programs is that their
structure is amenable to substantial parallelization, which in turn enables them to
handle very large data sets. At the present time, Pig's infrastructure layer consists of
a compiler that produces sequences of Map-Reduce programs, for which large-scale
parallel implementations already exist (e.g., the Hadoop subproject). Pig's language
layer currently consists of a textual language called Pig Latin, which has the following
key properties:

 Ease of programming. It is trivial to achieve parallel execution of


simple, "embarrassingly parallel" data analysis tasks. Complex tasks
comprised of multiple interrelated data transformations are explicitly
encoded as data flow sequences, making them easy to write, understand,
and maintain.
 Optimization opportunities. The way in which tasks are encoded
permits the system to optimize their execution automatically, allowing
the user to focus on semantics rather than efficiency.
 Extensibility. Users can create their own functions to do special-purpose
processing.

Install Apache Pig

After downloading the Apache Pig software, install it in your Linux


environment by following the steps given below.

Step 1
Create a directory with the name Pig in the same directory where the
installation directories of Hadoop, Java, and other software were
installed. (In our tutorial, we have created the Pig directory in the user
named Hadoop).
$ mkdir Pig

Step 2
Extract the downloaded tar files as shown below.
$ cd Downloads/
$ tar zxvf pig-0.15.0-src.tar.gz
$ tar zxvf pig-0.15.0.tar.gz

Step 3
Move the content of the extracted pig-0.15.0-src directory to the Pig directory
created earlier as shown below.
$ mv pig-0.15.0-src/* /home/Hadoop/Pig/
Configure Apache Pig
After installing Apache Pig, we have to configure it. To configure, we
need to edit two files − bashrc and pig.properties.
.bashrc file

PigLatin Script.

A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);


B = FOREACH A GENERATE name;

DUMP B;
(John)
(Mary)
(Bill)
(Joe)

customers.txt

1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00

orders.txt

102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060

Self – join

Self-join is used to join a table with itself as if the table were two relations,
temporarily renaming at least one relation.

grunt> customers1 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING


PigStorage(',')
as (id:int, name:chararray, age:int, address:chararray, salary:int);

grunt> customers2 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING


PigStorage(',')
as (id:int, name:chararray, age:int, address:chararray, salary:int);

Let us perform a self-join operation on the relation customers by joining the two relations


customers1 and customers2 as shown below.

grunt> customers3 = JOIN customers1 BY id, customers2 BY id;

Verify the relation customers3 using the DUMP operator as shown below.

grunt> Dump customers3;


Output
It will produce the following output, displaying the contents of the relation
customers3.

(1,Ramesh,32,Ahmedabad,2000,1,Ramesh,32,Ahmedabad,2000)
(2,Khilan,25,Delhi,1500,2,Khilan,25,Delhi,1500)
(3,kaushik,23,Kota,2000,3,kaushik,23,Kota,2000)
(4,Chaitali,25,Mumbai,6500,4,Chaitali,25,Mumbai,6500)
(5,Hardik,27,Bhopal,8500,5,Hardik,27,Bhopal,8500)
(6,Komal,22,MP,4500,6,Komal,22,MP,4500)
(7,Muffy,24,Indore,10000,7,Muffy,24,Indore,10000)

Inner Join
Inner Join is used quite frequently; it is also referred to as equijoin. An inner join
returns rows when there is a match in both tables.

Let us perform an inner join operation on the two relations customers and orders as shown below.

grunt> coustomer_orders = JOIN customers BY id, orders BY customer_id;


Verification
Verify the relation coustomer_orders using the DUMP operator as shown below.

grunt> Dump coustomer_orders;


Output
We will get the following output, displaying the contents of the relation named
coustomer_orders.

(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Outer Join: Unlike inner join, an outer join returns all the rows from at least one of the
relations. An outer join operation is carried out in three ways −

Left outer join


Right outer join
Full outer join
Left Outer Join
Let us perform left outer join operation on the two relations customers and orders as
shown below.

grunt> outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;


Verification
Verify the relation outer_left using the DUMP operator as shown below.

grunt> Dump outer_left;


Output
It will produce the following output, displaying the contents of the relation outer_left.

(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
Right Outer Join
The right outer join operation returns all rows from the right table, even if there are
no matches in the left table.

Let us perform right outer join operation on the two relations customers and orders as
shown below.

grunt> outer_right = JOIN customers BY id RIGHT, orders BY customer_id;


Verification
Verify the relation outer_right using the DUMP operator as shown below.

grunt> Dump outer_right


Output
It will produce the following output, displaying the contents of the relation
outer_right.

(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
Full Outer Join
The full outer join operation returns rows when there is a match in one of the
relations.

Example
Let us perform full outer join operation on the two relations customers and orders as
shown below.

grunt> outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;


Verification
Verify the relation outer_full using the DUMP operator as shown below.

grunt> Dump outer_full;


Output
It will produce the following output, displaying the contents of the relation outer_full.

(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)

VIVA QUESTIONS
1) What is Pig in Hadoop?

Ans: Pig is an Apache open-source project which runs on Hadoop and provides an engine for
parallel data flow. It includes a language called Pig Latin for expressing these data
flows, with operations such as join, sort and filter, and the ability to write User
Defined Functions (UDFs) for processing, reading and writing data. Pig uses both HDFS
and MapReduce, i.e. storing and processing.

2) What is the difference between Pig and SQL?


Ans: Pig Latin is a procedural counterpart of SQL. Pig has some similarities with SQL but
also many differences. SQL is a declarative query language: the user asks a question in
query form, and SQL produces the answer without saying how it is computed. If a user
wants to perform multiple operations on tables, SQL requires multiple queries and
temporary tables for intermediate results; subqueries are supported, but many SQL users
find them confusing and difficult to form properly, since they create an inside-out
design where the first step in the data pipeline is the innermost query. Pig is designed
with a long series of data operations in mind, so there is no need to write the data
pipeline as an inverted set of subqueries or to worry about storing data in temporary
tables.

3) How does Pig differ from MapReduce?


Ans: In MapReduce, the group-by operation is performed on the reducer side, while filter
and projection can be implemented in the map phase. Pig Latin provides standard
operations similar to MapReduce, such as order by, filter and group by. A Pig script can
be analyzed to understand the data flow, and errors can be detected early. Pig Latin is
also much lower cost to write and maintain than Java code for MapReduce.

Ex.No: 8
Use Hive to Manage Databases, Tables and Views

OBJECTIVE: To install and run Hive, then use Hive to create, alter, and drop databases, tables,
views, functions, and indexes.

DESCRIPTION:
Hive is a data warehouse infrastructure tool to process structured data in
Hadoop. It resides on top of Hadoop to summarize Big Data, and makes
querying and analyzing easy.

Data Base Creation:


hive> create database CSE;
OK
hive> show databases;
OK
cs
cse
hive> show databases like 'c*';
OK
cs
cse
cseb
hive> show tables;
OK
cseb
customer
order
hive> create table customer(cid BIGINT,cname STRING,cage INT)
> row format delimited
> fields terminated by ','
> stored as textfile;
OK
hive> use default;
OK
hive> drop database cse cascade;
OK
Time taken: 6.524 seconds
hive> show tables;
OK
customer
hive> LOAD DATA LOCAL
> INPATH '/home/lalitha2/Desktop/deepu.txt'
> OVERWRITE INTO TABLE customer;
OK
hive> select * from customer;
OK
1 A 20
2 B 30
3 C 35
4 D 40
hive> create table Order(oid BIGINT,oname STRING,cid INT)
> row format delimited
> fields terminated by ','
> stored as textfile;
OK
hive> LOAD DATA LOCAL
> INPATH '/home/lalitha2/Desktop/orderdet.txt'
> OVERWRITE INTO TABLE Order;
hive> select * from Order;
OK
101 pendrive 1
102 mouse 2
103 laptop 3
104 laptop 4
105 mouse 2
1.Write a query to display cid,oid who are having an order item pendrive
select cid,oid from Order WHERE oname="pendrive";
OK
1 101
2.write a query to display oid,oname which is having cid=2
hive> select oid,oname from Order WHERE cid=2;
OK
102 mouse
105 mouse
3.write a query to display oid of laptop
hive> select oid from Order WHERE oname="laptop";
OK
103
104
4.write a query to display oid of laptop or mouse
hive> select oid from Order WHERE Oname="laptop" OR Oname="mouse";
OK
102
103
104
105
5.write a query to display oid and cid
hive> select oid,cid from Order;
OK
101 1
102 2
103 3
104 4
105 2
1.write a query to display customer name of cid=2 from customer
hive> select cname from customer where cid=2;
OK
B
2.write a query to display customer names who are having customer id < 4
hive> select cname from customer where cid<4;
OK
A
B
C
JOINS
hive> select c.cid,c.cname,o.oid,o.oname from customer c join Order o on(c.cid=o.cid);
OK
1 A 101 laptop
2 B 102 cd
3 C 103 pendrive
4 D 104 dd
hive> select c.cid,c.cname,o.oid,o.oname from customer c left outer join Order o
on(c.cid=o.oid);
OK
1 A NULL NULL
2 B NULL NULL
3 C NULL NULL
4 D NULL NULL

hive> select c.cid,c.cname,o.oid,o.oname from customer c right outer join Order o


on(c.cid=o.oid);
OK
NULL NULL 101 pendrive
NULL NULL 102 mouse
NULL NULL 103 laptop
NULL NULL 104 laptop
NULL NULL 105 mouse
hive> select c.cid,c.cname,o.oid,o.oname from customer c full outer join Order o
on(o.oid=c.cid);
OK
NULL NULL 101 pendrive
NULL NULL 102 mouse
NULL NULL 103 laptop
NULL NULL 104 laptop
NULL NULL 105 mouse
Views:
hive> create view customer_view as
> select c.cid,c.cname,o.oid,o.oname
> from customer c full outer join Order o
> on (c.cid=o.cid);
OK
hive> select * from customer_view;

OK
1 A 101 laptop
2 B 102 cd
3 C 103 pendrive
4 D 104 dd
NULL NULL 105 ddd

VIVA QUESTIONS:

1. What is Hive?

Ans: Hive is a data warehouse tool for analyzing structured data in Hadoop.

2. What are the different types of Joins?


Ans: Inner Join, Left Outer Join, Right Outer Join, Full Outer Join
3. How can we copy data from local file system to Hive?

Ans: By using the "LOAD DATA" command.

