1. HIVE
Hive-Introduction
An SQL-like interface to Hadoop
Hive is a data warehouse infrastructure tool for processing structured data in
Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes
querying and analysis easy.
Hive was initially developed by Facebook; later, the Apache Software
Foundation took it up and developed it further as an open source project under
the name Apache Hive.
Hive is not
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
It provides an SQL-like language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
1. Hive Services
The Hive shell is only one of several services that you can run using the hive
command.
% hive --service serviceName
cli: The command line interface to Hive (the shell). This is the default service.
hiveserver: Runs Hive as a server exposing a Thrift service, enabling access
from a range of clients written in different languages.
hwi: The Hive Web Interface.
jar: The Hive equivalent of hadoop jar, a convenient way to run Java
applications that include Hadoop and Hive classes on the classpath.
metastore: By default, the metastore is run in the same process as the Hive
service.
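For instance, a minimal sketch of listing the available services and then starting the Thrift server, assuming a standard installation with hive on the PATH (the % prompt denotes the operating-system shell):
% hive --service help
% hive --service hiveserver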
Hive architecture
Thrift Client
The Hive Thrift Client makes it easy to run Hive commands from a wide range
of programming languages.
Thrift bindings for Hive are available for C++, Java, PHP, Python, and Ruby.
JDBC Driver
Hive provides a JDBC driver, defined in the class
org.apache.hadoop.hive.jdbc.HiveDriver.
When configured with a JDBC URI of the form jdbc:hive://host:port/dbname, a
Java application will connect to a Hive server running in a separate process at
the given host and port.
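As an illustration, a minimal Java client sketch using this driver; it assumes a Hive server already listening on localhost:10000 and an employee table such as the one created later in this unit (the class name is hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver class named above
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // Connect to a Hive server running in a separate process
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT eid, name FROM employee");
        while (rs.next()) {
            System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}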
ODBC Driver
The Hive ODBC Driver allows applications that support the ODBC protocol to
connect to Hive.
The Metastore
The metastore is the central repository of Hive metadata.
The metastore is divided into two pieces: a service and the backing store for the
data.
Working of Hive
2. HiveQL
• Hive’s SQL dialect is called HiveQL. Queries are translated to MapReduce jobs
to exploit the scalability of MapReduce.
• HiveQL supports
– Basic SQL: Select, From, Join, Group-By
– Equi-Join, Multi-Table Insert, Multi-Group-By
– Batch queries
1. Hive Data Types
• Hive primitive data types
• Hive complex data types.
Primitive data types
Numeric Types
• TINYINT (1-byte signed integer)
• SMALLINT (2-byte signed integer)
• INT (4-byte signed integer)
• BIGINT (8-byte signed integer)
• FLOAT (4-byte single precision floating point number)
• DOUBLE (8-byte double precision floating point number)
Date/Time Types
TIMESTAMP, DATE
String Types
STRING, VARCHAR, CHAR
Misc Types
BOOLEAN, BINARY
Hive Complex Data Types
• arrays: ARRAY<data_type>
• maps: MAP<primitive_type, data_type>
• structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>
• union: UNIONTYPE<data_type, data_type, ...>
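As a quick illustration, a hedged sketch of a table using these complex types (the table and column names are hypothetical):
hive> CREATE TABLE complex_demo (
          arr ARRAY<STRING>,
          m MAP<STRING, INT>,
          s STRUCT<name:STRING, age:INT>
      );
hive> SELECT arr[0], m['key'], s.name FROM complex_demo;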
2. Hive - Built-in Operators
• Relational Operators
• Arithmetic Operators
• Logical Operators
• Complex Operators
Relational Operators
Arithmetic Operators
Logical Operators
Complex Operators
3. Hive - Built-in Functions
Date Functions
String Functions
Collection Functions
Type Conversion Function
Eg: cast('1' AS BIGINT)
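A few illustrative calls to common built-in functions in these categories (the literal values are arbitrary):
hive> SELECT concat('Big', 'Data');     -- string function
hive> SELECT year('2016-01-22');        -- date function
hive> SELECT size(array(1, 2, 3));      -- collection function
hive> SELECT round(2.6);                -- mathematical function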
1. Creating Databases
Syntax
CREATE DATABASE [IF NOT EXISTS] database_name;
Example
hive> CREATE DATABASE [IF NOT EXISTS] userdb;
Verify the list of databases
hive> SHOW DATABASES;
hive> SHOW DATABASES LIKE 'u.*';
Drop Database Statement
DROP DATABASE [IF EXISTS] database_name;
hive> DROP DATABASE IF EXISTS userdb;
2. Managing Tables
Syntax
CREATE TABLE [IF NOT EXISTS] [db_name.] table_name [(col_name data_type
[COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
Example
hive> CREATE TABLE IF NOT EXISTS employee (eid INT, name STRING, designation STRING,
salary INT)
COMMENT 'Employee detail'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Displaying tables
hive> show tables;
Displaying the structure of table ‘employee’
hive> DESC employee;
eid int None
ename string None
Syntax
DROP TABLE [IF EXISTS] table_name;
Example
hive> DROP TABLE emp;
hive> DROP VIEW emp1;
Altering Tables
Syntax
1. ALTER TABLE name RENAME TO new_name
2. ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
3. ALTER TABLE name DROP [COLUMN] column_name
4. ALTER TABLE name CHANGE column_name new_name new_type
5. ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Example
Renaming the table from employee1 to employee
hive> ALTER TABLE employee1 RENAME TO employee;
Renaming a column
hive> ALTER TABLE employee CHANGE name ename STRING;
Adding a column named dept to the employee table
hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');
Replacing the columns of the table (all existing columns are replaced by the ones
specified, here empid and ename)
hive> ALTER TABLE employee REPLACE COLUMNS (empid INT, ename STRING);
3. Importing Data
Insert statement
Syntax
INSERT OVERWRITE TABLE target SELECT col1, col2 FROM source;
Example
hive> INSERT OVERWRITE TABLE student SELECT * FROM student1;
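HiveQL also supports the multi-table insert mentioned earlier, which scans the source table once while writing to several targets; a minimal sketch with hypothetical table names:
hive> FROM source
      INSERT OVERWRITE TABLE target1 SELECT col1
      INSERT OVERWRITE TABLE target2 SELECT col2;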
Load statement
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename;
Example
hive> LOAD DATA LOCAL INPATH '/home/mca/sample.txt' OVERWRITE INTO
TABLE employee;
4. Select-Where Clause
The SELECT statement is used to retrieve data from a table. The WHERE clause
filters the rows with a condition.
Syntax
SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE
where_condition] [GROUP BY col_list] [HAVING having_condition] [ORDER BY
col_list] [LIMIT number];
Example
hive> SELECT * FROM employee WHERE salary > 30000;
hive> SELECT * FROM employee WHERE salary > 12000 AND dept = 'ADMIN';
hive> SELECT * FROM employee WHERE salary > 30000 LIMIT 2;
5. Sorting and Aggregating
Sorting data in Hive can be achieved with a standard ORDER BY clause.
Example
hive> SELECT * FROM employee ORDER BY salary;
The GROUP BY clause is used to group all the records in a result set by one or
more columns.
Example
hive> SELECT dept, count(*) FROM employee GROUP BY dept;
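The HAVING clause shown in the SELECT syntax above can then filter the groups; an illustrative sketch:
hive> SELECT dept, count(*) FROM employee GROUP BY dept HAVING count(*) > 5;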
6. Joins
JOIN is a clause that is used for combining specific fields from two tables by
using values common to each.
JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
JOIN
JOIN clause is used to combine and retrieve the records from multiple tables.
Example
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
LEFT OUTER JOIN
The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if
there are no matches in the right table.
Example
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c LEFT OUTER
JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
RIGHT OUTER JOIN
The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even
if there are no matches in the left table.
Example
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER
JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
FULL OUTER JOIN
The HiveQL FULL OUTER JOIN combines the records of both the left and the
right tables that fulfil the JOIN condition.
Example
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c FULL OUTER
JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
7. Subqueries
A subquery is a SELECT statement that is embedded in another SQL statement.
Hive’s support for subqueries is limited: originally only in the FROM clause of a
SELECT statement; later versions also permit IN/EXISTS subqueries in the
WHERE clause, as in the example below.
Example
SELECT emp_name,salary FROM emp WHERE emp.Joining_year IN (SELECT
dept_start_year FROM department);
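A minimal sketch of the FROM-clause form, computing the average salary per department (the columns reuse the employee examples above):
hive> SELECT t.dept, t.avg_sal
      FROM (SELECT dept, AVG(salary) AS avg_sal FROM employee GROUP BY dept) t;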
8. Views
A view is a sort of “virtual table” that is defined by a SELECT statement.
Views can be used to present data to users in a way that differs from how it is
actually stored on disk.
Example
hive> CREATE VIEW emp_30000 AS SELECT * FROM employee WHERE
salary>30000;
2. PIG LATIN
1. Introduction
A high-level scripting language (Pig Latin)
Pig Latin, the programming language for Pig, provides common data
manipulation operations, such as grouping, joining, and filtering.
Pig generates Hadoop MapReduce jobs to perform the data flows.
This high-level language for ad-hoc analysis allows developers to inspect data
stored in HDFS without needing to learn the complexities of the MapReduce
framework, thus simplifying access to the data.
The Pig Latin scripting language is not only a higher-level data flow language;
it also has operators similar to SQL (e.g., FILTER and JOIN) that are translated into a
series of map and reduce functions.
Pig Latin, in essence, is designed to fill the gap between the declarative style of
SQL and the low-level procedural style of MapReduce.
History of Pig
In 2006, Apache Pig was developed as a research project at Yahoo!, to make it
easy to create and execute MapReduce jobs on large datasets.
In 2007, Apache Pig was open sourced via Apache incubator.
In 2008, the first release of Apache Pig came out.
In 2010, Apache Pig graduated as an Apache top-level project
When Pig & Hive
• Hive is a good choice
– When you want to query the data
– When you need an answer for specific questions
– If you are familiar with SQL
• Pig is a good choice
– For ETL (Extract-Transform-Load)
– For preparing data for easier analysis
– When you have long series of steps to perform
Pig Vs Hive
Pig uses a procedural data-flow language (Pig Latin) and is mainly used for
programming and ETL; Hive uses a declarative SQL-like language (HiveQL) and
is mainly used for reporting and ad-hoc queries.
Pig Latin Data Types (type: example literal)
int: 8
long: 8L
float: 5.5F
double: 5.5
chararray: 'avcce'
datetime: 2016-01-22T00:00:00
1. Loading and Storing Data
Loading Data
The LOAD operator is used to load data from the file system into a relation.
Syntax
grunt> Relation_name = LOAD 'input_file_path' [USING function] [AS schema];
Example
grunt> record = LOAD 'sample.txt' AS (year:chararray, temperature:int, quality:int);
grunt> DUMP record;
(1950,0,1)
(1950,21,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)
Storing Data
The STORE operator is used to store a relation to the file system.
Syntax
STORE Relation_name INTO ' required_directory_path ' [USING function];
Example
grunt> STORE record INTO 'out' USING PigStorage(':');
grunt> cat out;
1950:0:1
1950:21:1
1950:-11:1
1949:111:1
1949:78:1
2. Filtering Data
The FILTER operator is used to select the required tuples from a relation based
on a condition.
Syntax
grunt> Relation2_name = FILTER Relation1_name BY (condition);
Example
grunt> filter_data = FILTER student_details BY city == 'Chennai';
FOREACH...GENERATE
The FOREACH operator is used to generate specified data transformations based on
the column data.
Syntax
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required
data);
Example
grunt> foreach_data = FOREACH student_details GENERATE id,age,city;
3. Grouping and Joining Data
JOIN Operator
The JOIN operator is used to combine records from two or more relations
Syntax
grunt> Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key ;
Example
grunt> DUMP A;
(2,Tie)
(4,Coat)
(3,Hat)
grunt> DUMP B;
(Joe,2)
(Hank,4)
(Eve,3)
grunt> C = JOIN A BY $0, B BY $1;
grunt> DUMP C;
(2,Tie,Joe,2)
(3,Hat,Eve,3)
(4,Coat,Hank,4)
GROUP Operator
The GROUP operator is used to group the data in one or more relations. It
collects the data having the same key.
Syntax
grunt> Group_data = GROUP Relation_name BY column_name;
Example
grunt> DUMP record;
(1950,0,1)
(1950,21,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)
grunt> grouped_records = GROUP record BY year;
grunt> DUMP grouped_records;
(1949,{(1949,111,1),(1949,78,1)})
(1950,{(1950,0,1),(1950,21,1),(1950,-11,1)})
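Grouping is typically followed by a per-group aggregation; a minimal sketch computing the maximum temperature per year from grouped_records (MAX is a Pig built-in):
grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(record.temperature);
grunt> DUMP max_temp;
(1949,111)
(1950,21)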
CROSS Operator
The CROSS operator computes the cross-product of two or more relations
Syntax
grunt> Relation3_name = CROSS Relation1_name, Relation2_name;
Example
grunt> DUMP A;
(2,Tie)
(4,Coat)
grunt> DUMP B;
(Joe,2)
(Hank,4)
grunt> I = CROSS A, B;
grunt> DUMP I;
(2,Tie,Joe,2)
(2,Tie,Hank,4)
(4,Coat,Joe,2)
(4,Coat,Hank,4)
4. Sorting Data
The ORDER BY operator is used to display the contents of a relation in a sorted
order based on one or more fields.
Syntax
grunt> Relation_name2 = ORDER Relation_name1 BY column_name (ASC|DESC);
Example
grunt> order_by_data = ORDER student_details BY age ASC;
3. ZOOKEEPER
Apache ZooKeeper is an effort to develop and maintain an open-source server
which enables highly reliable distributed coordination.
A high-performance coordination service for distributed applications
(naming, configuration management, synchronization, and group services)
Runs in Java and has bindings for both Java and C
Developed at Yahoo! Research
Started as a sub-project of Hadoop, now a top-level Apache project
Features of ZooKeeper
Shared hierarchical namespace: consists of znodes (data registers) kept in
memory
High performance: can be used in large, distributed systems
Reliability: keeps it from being a single point of failure
Strictly ordered access: sophisticated synchronization primitives can be
implemented at the client
Replication: replicates itself over a set of hosts called an ensemble
Architecture of ZooKeeper
5. Multi-update
There is another ZooKeeper operation, called multi, which batches together
multiple primitive operations into a single unit that either succeeds or fails in
its entirety.
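A minimal sketch of multi using the Java client API, assuming an already-connected ZooKeeper handle and illustrative paths:

import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MultiExample {
    // Creates two znodes as one atomic unit: afterwards either both exist or neither does.
    static List<OpResult> createPair(ZooKeeper zk) throws Exception {
        return zk.multi(Arrays.asList(
            Op.create("/app", "a".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
            Op.create("/app/config", "b".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
    }
}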
6. ACLs
A znode is created with a list of ACLs, which determines who can perform
certain operations on it.
ACLs depend on authentication, the process by which the client identifies itself
to ZooKeeper.
There are a few authentication schemes that ZooKeeper provides:
digest
The client is authenticated by a username and password.
sasl
The client is authenticated using Kerberos.
ip
The client is authenticated by its IP address.
7. Implementation
The ZooKeeper service can run in two modes.
In standalone mode, there is a single ZooKeeper server, which is useful for
testing due to its simplicity but provides no guarantees of high availability or
resilience.
In replicated mode, ZooKeeper runs on a cluster of machines called an
ensemble. ZooKeeper achieves high availability through replication, and can
provide a service as long as a majority of the machines in the ensemble are up.
Phase 1: Leader election
The machines in an ensemble go through a process of electing a distinguished
member, called the leader. The other machines are termed followers.
Phase 2: Atomic broadcast
All write requests are forwarded to the leader, which broadcasts the update to
the followers. When a majority have persisted the change, the leader commits the
update, and the client gets a response saying the update succeeded.
8. Consistency
Understanding the basis of ZooKeeper’s implementation helps in
understanding the consistency guarantees that the service makes.
Sequential consistency-Updates from any particular client are applied in the order
that they are sent.
Atomicity-Updates either succeed or fail. This means that if an update fails, no client
will ever see it.
Single system image-A client will see the same view of the system regardless of the
server it connects to. This means that if a client connects to a new server during the
same session, it will not see an older state of the system than the one it saw with the
previous server.
Durability -Once an update has succeeded, it will persist and will not be undone.
This means updates will survive server failures.
Timeliness -The lag in any client’s view of the system is bounded, so it will not be
out of date by more than some multiple of tens of seconds.
9. Sessions
A ZooKeeper client connects to the service through a session, which has an
associated timeout; the client keeps the session alive with heartbeats, and if its
connection to one server is lost it may transparently fail over to another server
in the ensemble within the same session.
Examples
The ZooKeeper Command Line Interface (CLI) is used to interact with the
ZooKeeper ensemble for development purposes.
It can be used to perform the following operations:
Create znodes
Get data
Watch znode for changes
Set data
Create children of a znode
List children of a znode
Check Status
Remove / Delete a znode
Create Znodes
Syntax
create /path data
Example
create /FirstZnode "myfirstznode-data"
Get Data
It returns the associated data of the znode and metadata of the specified znode
Syntax
get /path
Example
get /FirstZnode
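Illustrative sketches of the remaining CLI operations listed above (paths and values are arbitrary):
set /FirstZnode "new-data"               (set data)
create /FirstZnode/Child1 "childdata"    (create a child of a znode)
ls /FirstZnode                           (list children of a znode)
stat /FirstZnode                         (check status)
delete /FirstZnode/Child1                (remove a znode; it must have no children)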
4. HBASE
1. Introduction
HDFS vs HBase
HDFS is a distributed file system suitable for storing large files; HBase is a
database built on top of HDFS.
HDFS does not support fast individual record lookups; HBase provides fast
lookups for larger tables.
HDFS provides only sequential access to data; HBase internally uses hash
tables, provides random access, and stores the data in indexed HDFS files for
faster lookups.
HBase vs RDBMS
HBase is schema-less: it does not have the concept of a fixed-column schema
and defines only column families; an RDBMS is governed by its schema, which
describes the whole structure of its tables.
HBase is built for wide tables and is horizontally scalable; an RDBMS is thin,
built for small tables, and hard to scale.
Applications of HBase
HBase is used whenever fast, random read/write access to big data is needed;
companies such as Facebook, Twitter, Yahoo!, and Adobe use HBase internally.
HBase Architecture
In HBase, tables are split into regions and are served by the region servers.
Regions are vertically divided by column families into “Stores”. Stores are
saved as files in HDFS.
HBase has three major components: the client library, a master server, and
region servers. Region servers can be added or removed as per requirement
Master Server
Assigns regions to the region servers, taking the help of Apache ZooKeeper
for this task.
Handles load balancing of the regions across region servers: it unloads the
busy servers and shifts the regions to less occupied servers.
Regions
Regions are nothing but tables that are split up and spread across the region
servers.
Region Servers
Communicate with the clients and handle data-related operations.
Handle read and write requests for all the regions under them.
ZooKeeper
ZooKeeper is an open-source project that provides services like maintaining
configuration information, naming, providing distributed synchronization, etc.
2. Starting HBase
A sample hbase-site.xml for standalone mode:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:/home/mca/HBase/HFiles</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/mca/zookeeper</value>
</property>
</configuration>
Start HBase and open the HBase shell:
$cd /usr/local/HBase/bin
$./start-hbase.sh
$./hbase shell
hbase>
The general commands in HBase are status, version, table_help, and whoami
status: This command returns the status of the system including the details of the
servers running on the system.
hbase> status
version: This command returns the version of HBase used in your system.
hbase> version
3. Listing a Table
The list command is used to list all the tables in HBase.
hbase> list
TABLE
emp
2 row(s) in 0.0340 seconds
4. Create Data
Using the put command, you can insert data into a cell of an HBase table.
Syntax
put '<table name>', '<row>', '<column family:column name>', '<value>'
Example (these are the cells shown by the scan below)
hbase> put 'emp', '1', 'personal data:name', 'ramu'
hbase> put 'emp', '1', 'personal data:city', 'hyderabad'
hbase> put 'emp', '1', 'professional data:designation', 'manager'
hbase> put 'emp', '1', 'professional data:salary', '50000'
5. Displaying Data
The scan command is used to view the data in an HBase table.
Syntax
scan '<table name>'
Example
hbase> scan 'emp'
ROW COLUMN+CELL
1 column=personal data:city, timestamp=1417524216501, value=hyderabad
1 column=personal data:name, timestamp=1417524185058, value=ramu
1 column=professional data:designation, timestamp=1417524232601, value=manager
1 column=professional data:salary, timestamp=1417524244109, value=50000
6. Delete Data
Using the delete command, you can delete a specific cell in a table.
Syntax
delete '<table name>', '<row>', '<column name>', '<time stamp>'
Example
hbase> delete 'emp', '1', 'personal data:city'
7. Reading Data
Reading all columns of a row
Syntax
get '<table name>', '<row>'
Example
hbase> get 'emp', '1'
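To read a specific column rather than the whole row, an illustrative sketch:
hbase> get 'emp', '1', {COLUMN => 'personal data:name'}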
8. Alter Command
Changing the maximum number of cells (versions) of a column family
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
Example
hbase(main):003:0> alter 'emp', NAME => 'personal data', VERSIONS => 5
Adding a column family to a table (individual columns need no schema change;
they are created on put)
hbase(main):003:0> alter 'emp', NAME => 'pincode'
Deleting a Column Family
Syntax
hbase> alter '<table name>', 'delete' => '<column family>'
Example
hbase(main):007:0> alter 'employee','delete'=>'professional'
9. Drop Command
The drop command is used to delete a table; the table must be disabled before
it can be dropped.
Syntax
drop '<table name>'
Example
hbase> disable 'emp'
hbase> drop 'emp'
drop_all command
Syntax
hbase> drop_all '<reg_exp>'
Example
hbase> drop_all 't.*'
What is Hadoop?
Apache Hadoop: a free, open source framework for data-intensive
applications
Inspired by Google technologies (MapReduce, GFS)
Originally built to address scalability problems of Web search and analytics
Extensively used by Yahoo!
Enables applications to work with thousands of nodes and petabytes of
data in a highly parallel, cost effective manner
CPU + disks of a commodity box = Hadoop node
Boxes can be combined into clusters
New nodes can be added without changing data formats
• Tooling
– Installation/Configuration
– Development via Eclipse, Jaql, AQL, DML, etc.
– Discover/Analyze
– Browser-based administration, performance analysis, etc.
• Flex Scheduler
– Provided in addition to Hadoop’s FIFO and FAIR schedulers
– Optimizes for response time, granular control, SLAs
• Adaptive M/R
– Balances workload across mappers
– Minimizes startup and scheduling cost
• LZO splittable compression
• Security enhancements
– Secure access through the console, other open ports closed
– Authentication with LDAP
– Authorization through roles
• Extensive RDBMS and data warehouse integration
• Integration with R & SPSS
• BigSheets visualization
– Rapid analysis without M/R coding
Utility Operators
Easy to use
Streams comes with an Eclipse-based visual toolset, called InfoSphere Streams
Studio, which allows you to create, edit, test, debug, run, and even visualize a
stream graph model and SPL applications.
Integration
Coordinating the traditional and new-age big data processes takes a vendor
that understands both sides of the equation.
7. IBM BigSheets - Visualization
BigSheets is a spreadsheet-style tool for business analysts provided with IBM
InfoSphere BigInsights, a platform based on the open source Apache Hadoop
project.
BigSheets enables non-programmers to iteratively explore, manipulate, and
visualize data stored in your distributed file system.
It includes built-in functions to extract names, addresses, organizations, email
addresses, locations, and phone numbers.
BigSheets has a good deal in common with a typical spreadsheet application,
such as Microsoft® Excel.
The benefit is the ease with which you can open, browse, and ultimately create
graphical views in the form of graphs and tag clouds on your data. As an
entirely web-based interface to the data viewing and analysis process, it is very
easy to use, but not without its complexities.
BigSheets either takes the data you provide and builds a visualized version of
that information in the form of a graph, or it processes raw information to
provide a summarized view of the data. This enables BigSheets to support
some basic processing alongside its core visualization role.
• Three steps are involved in using BigSheets to perform big data analysis:
• Collect data: you can collect data from multiple sources, including crawling
the web, local files, files on your network, and multiple protocols and formats
(HTTP, HDFS, Amazon S3 Native File System).
• Extract and analyze data: once you have collected your information, you can
see a sample of it in the spreadsheet interface.
• Explore and visualize data: you can apply visualizations to help you make
sense of your data.
BigSheets provides the following visualization tools:
• Tag cloud: shows word frequencies; the bigger the word, the more frequently it
exists in the sheet
• Pie chart: shows proportional relationships, where the relative size of the slice
represents its proportion of the data
• Map: shows data values overlaid onto a map of the world or of a region
• Heat map: adds the dimension of showing the relative intensity of the
values overlaid on the map
• Bar chart: shows the frequency of values for a specified column
Supported data formats (choose the reader that suits the data format):
• Basic crawler data— Information extracted from web pages using the
BigInsights web crawler.
• Character delimited data— Tab, tilde, or other field-based information
separated by characters.
• Character-delimited data with text qualifier— Delimited data where the
fields are enclosed by quotation marks or other characters to contain the data.
• Comma-separated value (CSV) data— Standard CSV format, including
qualified fields (quotation marks or escape characters), with or without header
rows.
• Hive read— Data read from an existing Hive data table.
• JSON array— An array of JSON data.
• JSON object read— Multiple lines of JSON object data.
• Line reader— Each line is taken as an entry of discrete data. This format is
useful when a separate job has output a list of unique words.
• Sheets data— Data generated by a previous sheet's processing job.
• Tab-separated value (TSV) data— Field-based information separated by
tabs, with or without a header row.
Visualization tools
Polymaps
Flot
Tangle
D3.js
FF Chartwell
CartoDB
The R Project
Case Study: BigSheets Twitter Analysis – Top tweeting users chart
o header10 = user.description
o header11 = user.url
o header12 = user.followers_count
o header13 = user.friends_count
o header14 = retweet_count
o header15 = favorite_count
o header16 = lang
o header17 = text
• WordCount-IBM+BigData with 2 columns corresponding to:
o header1 = word
o header2 = number of occurrences
1. From the BigSheets page of the web console, open the Tweets-IBM+BigData
ALL master workbook.
2. As mentioned before, master workbooks cannot be modified, so we need to
create a new workbook based on this master workbook. Within the master
workbook, click Build new workbook.
3. The new workbook will open automatically. In the lower-left corner, click Add
sheets and choose Pivot as the type of sheet.
5. Go to the Calculate tab and create a new column (for example, "total") showing
the results of the COUNT function applied to groups based on column header16.
6. Apply the settings. Then, in the chart settings, set Value to header16 and Count
to total, and apply the settings.
Now you can run the computation of your chart. Note that a status bar to the right
of the Run button enables you to monitor the progress of the job. Behind the
scenes, BigSheets executes Pig scripts that initiate MapReduce jobs. At the end you
should see the pie chart representing the coverage by languages. Note that every time
the data source changes, you can just re-run your charts to get fresh results.