
BigDataTraining.IN
Online Session Notes
Apache Pig Intro & Installation


Hadoop Eco-System Tools
Depend on Hadoop
Hadoop understands only MapReduce
We either write MapReduce ourselves at a low level, or ask another tool to write it on our behalf. Such a tool is an ecosystem tool.
Ecosystem tools simplify our data operations.
Apache Pig Usecase

1/ User writes the logic in Pig Latin
2/ User runs the Pig script (Pig Latin)
3/ Pig generates the MapReduce jar(s)
4/ The jar(s) get executed on Hadoop
5/ Input is read from, and output written to, HDFS
6/ Everything gets executed as an MR job on Hadoop


Pig Latin:
Pig Latin is used to express our logic; Pig turns it into MapReduce jar(s), which Hadoop then accepts and runs.
DataFlow Language

o Input
o Operations / Transformations involved
o Output
Accepts input from HDFS
Stores Output into HDFS
Pig Latin involves relation names & field names
o Relation name ~ variable name
o A relation holds datasets
o While a variable holds values
Pig Latin can be submitted via
o Grunt shell (interactive shell for Pig Latin)
o Pig script
Install Notes:
Head to,
o http://pig.apache.org/
o http://pig.apache.org/releases.html#Download
o Choose any of the mirrors,
o Get into pig-0.10.1,
o Find the tar.gz file, copy the link address (or shortcut)
o $ cd /data
o $ wget http://www.bizdirusa.com/mirrors/apache/pig/pig-0.10.1/pig-0.10.1.tar.gz
o $ tar zxvf pig-0.10.1.tar.gz
o Should now see a new dir, pig-0.10.1
o Now we should set the environment variables,
HADOOP_HOME
PIG_HOME
o Sample Pig Latin script
A = LOAD '/file.txt';
DUMP A;
o Here, A is the relation name; the contents of A are the contents of the loaded file.
o DUMP prints a relation to the console, similar to printf / echo
o Note: Pig expects <TAB> as the default delimiter while loading data (files)
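
The install notes say to set HADOOP_HOME and PIG_HOME but do not show how. A minimal sketch follows. The PIG_HOME path matches the directory unpacked under /data in the steps above; the HADOOP_HOME path and the PATH change are assumptions for illustration, so adjust them to wherever Hadoop actually lives on your machine.

```shell
# Assumption: Hadoop is unpacked at /data/hadoop (not stated in the notes).
export HADOOP_HOME=/data/hadoop

# Directory created by the tar step in the install notes.
export PIG_HOME=/data/pig-0.10.1

# Assumption: add Pig's bin directory to PATH so `pig` can be run from anywhere.
export PATH="$PATH:$PIG_HOME/bin"

# Show what was set.
echo "HADOOP_HOME=$HADOOP_HOME"
echo "PIG_HOME=$PIG_HOME"
```

To make the settings survive a new login, the same export lines could be appended to the shell profile of the user that runs Pig (for example ~/.bashrc).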

CLI:

[hadoop@ip-10-139-12-235 data]$ pig
Warning: $HADOOP_HOME is deprecated.

2013-10-30 02:34:15,747 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.1 (r1426677) compiled Dec 28 2012, 16:46:13
2013-10-30 02:34:15,754 [main] INFO org.apache.pig.Main - Logging error messages to: /data/pig_1383100455723.log
2013-10-30 02:34:16,858 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2013-10-30 02:34:18,654 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
grunt> fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2012-12-04 11:38 /data
grunt> fs -copyFromLocal hadoop/README.txt /
grunt> fs -ls /
Found 2 items
-rw-r--r-- 1 hadoop supergroup 1366 2013-10-30 02:35 /README.txt
drwxr-xr-x - hadoop supergroup 0 2012-12-04 11:38 /data
grunt> A = LOAD '/README.txt';
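
The same LOAD / DUMP flow can also be submitted as a Pig script instead of typing it into the Grunt shell. The sketch below is illustrative only: the working directory, the file names sample.txt and sample.pig, and the two sample rows are hypothetical. It writes a small TAB-delimited file, names the delimiter explicitly with PigStorage('\t') (the same default the notes mention), and runs the script in Pig's local mode (-x local) so that no HDFS is required. If pig is not on the PATH, it only writes the files.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two rows, two fields each, separated by a TAB (Pig's default delimiter).
printf '1\thadoop\n2\tpig\n' > sample.txt

# Pig Latin script: load the file into relation A, then print A.
cat > sample.pig <<'EOF'
A = LOAD 'sample.txt' USING PigStorage('\t');
DUMP A;
EOF

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local sample.pig
else
    echo "pig not found on PATH; script written to $workdir/sample.pig"
fi
```

Against a running cluster, the script would be submitted with plain `pig sample.pig`, and the path in LOAD would then refer to a file in HDFS, as in the Grunt session above.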
