Ben Knorr, Solutions Architect
March 11, 2015
© 2014 MapR Technologies

Topics

• Motivation for Apache Drill
• Feature Walkthrough
• Architecture Overview
• Demo
• Q&A

Motivation


Data is doubling in size every two years


Total Data Stored
Unstructured data will account for more than 80% of the data collected by organizations.

[Chart: total data stored from 1980 to 2020, split into structured and semi-structured data]

Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data

Data Increasingly Stored in Non-Relational Datastores

RELATIONAL DATABASES
• Volume: GBs-TBs
• Structure: Structured
• Database: Fixed schema; DBA controls structure
• Development: Planned (release cycle = months-years)

NON-RELATIONAL DATASTORES
• Volume: TBs-PBs
• Structure: Structured, semi-structured and unstructured
• Database: Dynamic / flexible schema; application controls structure
• Development: Iterative (release cycle = days-weeks)

[Figure: timeline, 1980 to 2020]

SQL in a Non-Relational World

• Create and maintain schemas on:
  – HDFS (Parquet, JSON, etc.)
  – HBase
  – …
• Transform or copy data

WANT
• SQL
• BI (Tableau, MicroStrategy, etc.)
• Low latency
• Scalability

Big data makes schema management HARD

X New data models/data types don’t map to relational paradigms
X Centralized schemas do not always work

APACHE DRILL
Schema-free data exploration for Hadoop and NoSQL
• Point-and-query vs. schema-first
• Low latency SQL queries at scale
• Extreme ease of use
• Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
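Drill's RESTful interface can be called from any HTTP client. The Python sketch below only builds the request. It assumes the `POST /query.json` endpoint with a JSON body of `queryType` and `query`, as described in the REST API documentation for later Drill releases (it may not apply to the 0.7 release used in this deck), and a drillbit web server on `localhost` at Drill's default web port 8047. The function names are illustrative, not part of any Drill client library.

```python
import json
import urllib.request

# Assumption: a drillbit web server on localhost:8047 exposing POST /query.json,
# as documented for later Drill releases.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that submits `sql` to Drill."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_query(sql: str) -> dict:
    """Send the query and decode the JSON response (needs a running drillbit)."""
    with urllib.request.urlopen(build_query_request(sql)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_query_request("SELECT * FROM dfs.yelp.`business.json` LIMIT 1")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect without a cluster; `submit_query` is the part that actually needs a live drillbit.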

Agility By Reducing Distance To Data

Traditional approaches
Hadoop data → Data movement → Transformation → Data modeling → Users
Time to insight: weeks to months

Apache Drill
Hadoop data → Users
Time to insight: minutes

Changing inputs: source data evolution, new business questions

Evolution Towards Self-Service Data Exploration

+-------------------------------+----------------------------------+--------------------+
|                               | Data Modeling and Transformation | Data Visualization |
+-------------------------------+----------------------------------+--------------------+
| Traditional BI w/ RDBMS       | IT-driven                        | IT-driven          |
| Self-Service BI w/ RDBMS      | IT-driven                        | Self-service       |
| SQL-on-Hadoop                 | IT-driven                        | Self-service       |
| Self-Service Data Exploration | Optional                         | Self-service       |
+-------------------------------+----------------------------------+--------------------+

Self-Service Data Exploration: zero-day analytics


How Drill achieves data agility


Drill’s Data Model is Flexible

[Chart: data formats by flexibility, from fixed schema to schema-less and from flat to complex: CSV, TSV, HBase, Parquet, JSON, Avro, BSON]

RDBMS/SQL-on-Hadoop table

+----------+--------+-----+
| Name     | Gender | Age |
+----------+--------+-----+
| Michael  | M      | 6   |
| Jennifer | F      | 3   |
+----------+--------+-----+

Apache Drill table

{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
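To see what querying such records involves, here is a small Python sketch under simplifying assumptions: the two records are plain dicts, and `get_path` and `select` are hypothetical helpers that project dotted paths such as `name.first`. A field absent from a record simply comes back as `None`, roughly the way Drill reports a missing field as NULL instead of rejecting the record.

```python
from typing import Any

# The two records from the slide as Python dicts. Both have `name` and `hobbies`,
# but only the first has `district` and only the second has `preschool`.
RECORDS = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def get_path(record: dict, path: str) -> Any:
    """Follow a dotted path such as 'name.first'; return None if any step is missing."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # a missing field reads as NULL instead of raising an error
        value = value[part]
    return value


def select(records: list, *paths: str) -> list:
    """Project the given dotted paths out of every record, like a SELECT list."""
    return [tuple(get_path(r, p) for p in paths) for r in records]


if __name__ == "__main__":
    for row in select(RECORDS, "name.first", "district", "preschool"):
        print(row)
```

No table definition is needed up front: the shape of each record is read as it is encountered, which is the point the slide makes about the Drill side of the comparison.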

Drill Supports Schema Discovery On-The-Fly

Schema Declared In Advance
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
SCHEMA ON WRITE / SCHEMA BEFORE READ

Schema Discovered On-The-Fly
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
SCHEMA ON THE FLY
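Discovering a schema from self-describing data can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not Drill's implementation: it reads newline-delimited JSON, walks each record, and accumulates the set of types observed for every dotted field path. An evolving schema (a new field, or a field whose type changes) therefore shows up as the union of what has been seen so far. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Set


def type_name(value) -> str:
    """Map a decoded JSON value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "map"


def discover_schema(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Infer a field-path -> observed-types mapping from newline-delimited JSON.

    Nothing is declared up front: each record describes itself, and the result
    is the union of every leaf path (and its types) seen across all records.
    """
    schema: Dict[str, Set[str]] = {}

    def walk(value, prefix: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}.{key}" if prefix else key)
        else:
            schema.setdefault(prefix, set()).add(type_name(value))

    for line in lines:
        if line.strip():
            walk(json.loads(line), "")
    return schema
```

A path with more than one observed type, such as an `age` that is a number in one record and a string in another, is exactly the kind of schema change an engine working "on the fly" has to tolerate.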

Drill’s Role in the Enterprise Data Architecture

Raw data (JSON, CSV, …) and “optimized” data (Parquet, …)
• Apache Drill: exploration (known and unknown questions)

Centrally-structured data (schemas in Hive Metastore)
• Hive, Impala, Spark SQL

Relational data (highly-structured data)
• Oracle, Teradata

Use cases
• With Apache Drill: enterprise users run ad-hoc query/BI reporting, data exploration / raw data analysis, and instant / “Day 0” queries
• Traditional path: data entering Hadoop is processed using Hive/Pig/MapReduce, then moved to traditional DW platforms (e.g., Oracle, Teradata) for enterprise users’ ad-hoc query/BI reporting

Feature Walkthrough


Business dataset

{
  "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
  "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
  "hours": {
    "Monday": {"close": "23:00", "open": "07:00"},
    "Tuesday": {"close": "23:00", "open": "07:00"},
    "Friday": {"close": "00:00", "open": "07:00"},
    "Wednesday": {"close": "23:00", "open": "07:00"},
    "Thursday": {"close": "23:00", "open": "07:00"},
    "Sunday": {"close": "23:00", "open": "07:00"},
    "Saturday": {"close": "00:00", "open": "07:00"}
  },
  "open": true,
  "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
  "city": "Las Vegas",
  "review_count": 4084,
  "name": "Mon Ami Gabi",
  "neighborhoods": ["The Strip"],
  "longitude": -115.172588519464,
  "state": "NV",
  "stars": 4.0,
  "attributes": {
    "Alcohol": "full_bar",
    "Noise Level": "average",
    "Has TV": false,
    "Attire": "casual",
    "Ambience": {
      "romantic": true,
      "intimate": false,
      "touristy": false,
      "hipster": false,
      "classy": true,
      "trendy": false,
      "casual": false
    },
    "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": true, "breakfast": false, "brunch": false}
  }
}

Reviews dataset

{
  "votes": {"funny": 0, "useful": 2, "cool": 1},
  "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
  "review_id": "15SdjuK7DmYqUAj6rjGowg",
  "stars": 5,
  "date": "2007-05-17",
  "text": "dr. goldberg offers everything …",
  "type": "review",
  "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}

Zero to Results in 2 minutes

Install
$ tar -xvzf apache-drill-0.7.0.tar.gz

Launch shell (embedded mode)
$ bin/sqlline -u jdbc:drill:zk=local

Query files and directories
> SELECT state, city, count(*) AS businesses
  FROM dfs.yelp.`business.json`
  GROUP BY state, city
  ORDER BY businesses DESC LIMIT 10;

Results
+------------+------------+-------------+
| state      | city       | businesses  |
+------------+------------+-------------+
| NV         | Las Vegas  | 12021       |
| AZ         | Phoenix    | 7499        |
| AZ         | Scottsdale | 3605        |
| EDH        | Edinburgh  | 2804        |
| AZ         | Mesa       | 2041        |
| AZ         | Tempe      | 2025        |
| NV         | Henderson  | 1914        |
| AZ         | Chandler   | 1637        |
| WI         | Madison    | 1630        |
| AZ         | Glendale   | 1196        |
+------------+------------+-------------+
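For readers less used to GROUP BY, the query above computes the same result as this Python sketch, assuming `business.json` holds one JSON object per line (the layout of the Yelp academic dataset files). It only illustrates what the query returns; the function name is hypothetical, and Drill does not load the file into a single process like this.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def businesses_per_city(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, str, int]]:
    """Python analogue of:

    SELECT state, city, count(*) AS businesses FROM dfs.yelp.`business.json`
    GROUP BY state, city ORDER BY businesses DESC LIMIT 10;
    """
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            record = json.loads(line)
            counts[(record.get("state"), record.get("city"))] += 1
    # most_common sorts by count, descending; ties keep first-seen order
    return [(state, city, n) for (state, city), n in counts.most_common(limit)]
```

Usage, with a hypothetical local copy of the file: `with open("business.json") as f: print(businesses_per_city(f))`.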

Drill enables ‘SQL on Everything’

SELECT * FROM dfs.yelp.`business.json`

• dfs: a storage plugin instance
  – DFS (Text, Parquet, JSON)
  – HBase/MapRDB
  – Hive Metastore/HCatalog
  – Easy API to go beyond Hadoop
• yelp: a workspace
  – Sub-directory
  – HBase namespace
  – Hive database
• `business.json`: a table
  – Pathnames
  – Hive table
  – HBase table
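The three-part name can be illustrated with a small Python sketch that splits a `dfs` reference into storage plugin, workspace, and table, then maps the workspace to a directory. The workspace-to-directory mapping, the paths, and the `resolve` helper are all hypothetical. In Drill the mapping comes from the storage plugin configuration, and non-file plugins such as HBase interpret the workspace and table parts differently.

```python
import posixpath
import re
from typing import Dict

# Hypothetical workspace map (workspace name -> directory) for the `dfs` plugin.
WORKSPACES: Dict[str, str] = {"yelp": "/data/yelp", "tmp": "/tmp"}

# plugin.workspace.`table`, with the table part in back quotes
TABLE_REF = re.compile(r"^(\w+)\.(\w+)\.`([^`]+)`$")


def resolve(table_ref: str, workspaces: Dict[str, str] = WORKSPACES) -> str:
    """Split a dfs table reference into its parts and map it to a file path."""
    match = TABLE_REF.match(table_ref)
    if match is None:
        raise ValueError(f"not of the form plugin.workspace.`table`: {table_ref}")
    plugin, workspace, table = match.groups()
    if plugin != "dfs":
        raise ValueError(f"this sketch only resolves dfs tables, got {plugin!r}")
    return posixpath.join(workspaces[workspace], table)
```

With the made-up mapping above, `resolve("dfs.yelp.`business.json`")` yields `/data/yelp/business.json`; a table name can equally be a directory, in which case every file under it is read.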

Intuitive SQL access to complex data

// It’s Friday 10 pm in Vegas and you’re looking for hummus
> SELECT name, stars, b.hours.Friday friday, categories
  FROM dfs.yelp.`business.json` b
  WHERE b.hours.Friday.`open` < '22:00' AND
        b.hours.Friday.`close` > '22:00' AND
        REPEATED_CONTAINS(categories, 'Mediterranean') AND
        city = 'Las Vegas'
  ORDER BY stars DESC
  LIMIT 2;

+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+
| name                          | stars | friday                           | categories                                                  |
+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+
| Olives                        | 4.0   | {"close":"22:30","open":"11:00"} | ["Mediterranean","Restaurants"]                             |
| Marrakech Moroccan Restaurant | 4.0   | {"close":"23:00","open":"17:30"} | ["Mediterranean","Middle Eastern","Moroccan","Restaurants"] |
+-------------------------------+-------+----------------------------------+-------------------------------------------------------------+

Query data with any level of nesting
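The WHERE clause above mixes nested-field access with an array test. The Python sketch below mirrors it over in-memory dicts shaped like the business record shown earlier. The function name is hypothetical, and REPEATED_CONTAINS is approximated here as exact list membership, which is a simplifying assumption.

```python
from typing import Dict, List


def open_at_10pm_friday(businesses: List[Dict], category: str, city: str,
                        limit: int = 2) -> List[Dict]:
    """Mirror the WHERE / ORDER BY / LIMIT clauses of the Friday-night query."""
    matches = []
    for b in businesses:
        friday = b.get("hours", {}).get("Friday")
        if friday is None:
            continue  # no Friday hours: the nested comparison is NULL, so the row drops out
        open_, close = friday.get("open"), friday.get("close")
        if open_ is None or close is None:
            continue
        # 'HH:MM' strings compare correctly as text because they are zero-padded.
        if (open_ < "22:00" and close > "22:00"
                and category in b.get("categories", [])  # REPEATED_CONTAINS, approximated
                and b.get("city") == city):
            matches.append({"name": b["name"], "stars": b["stars"],
                            "friday": friday, "categories": b["categories"]})
    matches.sort(key=lambda row: row["stars"], reverse=True)
    return matches[:limit]
```

Comparing times as text works for zero-padded `HH:MM` values, but a place that closes after midnight (Mon Ami Gabi's Friday close is `"00:00"` in the sample record) fails `close > '22:00'` even though it is open at 10 pm. The SQL query on the slide has the same limitation.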

ANSI SQL compatibility

// Get top cool-rated businesses
> SELECT b.name
  FROM dfs.yelp.`business.json` b
  WHERE b.business_id IN
    (SELECT r.business_id
     FROM dfs.yelp.`review.json` r
     GROUP BY r.business_id
     HAVING SUM(r.votes.cool) > 2000
     ORDER BY SUM(r.votes.cool) DESC);

+-------------------------------+
| name                          |
+-------------------------------+
| Earl of Sandwich              |
| XS Nightclub                  |
| The Cosmopolitan of Las Vegas |
| Wicked Spoon                  |
+-------------------------------+

Use familiar SQL functionality (joins, aggregations, sorting, sub-queries, SQL data types)
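The nested query combines aggregation, a HAVING filter, and an IN predicate. The Python sketch below spells out those steps over in-memory lists of dicts; the function name is hypothetical and the field names follow the Yelp records shown earlier. It sorts the final names explicitly, because in SQL an ORDER BY inside an IN subquery does not determine the order of the outer result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def top_cool_businesses(businesses: Iterable[Dict], reviews: Iterable[Dict],
                        threshold: int = 2000) -> List[str]:
    """Names of businesses whose reviews collected more than `threshold` cool votes."""
    cool_totals: Dict[str, int] = defaultdict(int)
    for r in reviews:  # GROUP BY r.business_id ... SUM(r.votes.cool)
        cool_totals[r["business_id"]] += r.get("votes", {}).get("cool", 0)
    # HAVING SUM(r.votes.cool) > threshold
    qualifying = {bid: total for bid, total in cool_totals.items() if total > threshold}
    # WHERE b.business_id IN (...): keep businesses whose id qualifies
    hits = [(qualifying[b["business_id"]], b["name"])
            for b in businesses if b["business_id"] in qualifying]
    # Explicit ordering by total cool votes, highest first
    return [name for _, name in sorted(hits, key=lambda h: h[0], reverse=True)]
```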

Logical views

// Create a view combining business and reviews datasets
> CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
  SELECT b.name, b.stars, r.votes.funny, r.votes.useful, r.votes.cool, r.`date`
  FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
  WHERE r.business_id = b.business_id;

+------+-----------------------------------------------------------------+
| ok   | summary                                                         |
+------+-----------------------------------------------------------------+
| true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema |
+------+-----------------------------------------------------------------+

> SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;

+---------+
| Total   |
+---------+
| 1125458 |
+---------+

Lightweight, file-system-based views for granular and decentralized data management
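Conceptually, a view stores the query rather than its results. In the Python sketch below, a closure stands in for the view definition (which Drill keeps as a file in the workspace); the names are hypothetical. The point it shows: each evaluation re-runs the join, so rows added to the underlying data appear without recreating the view.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def make_business_reviews_view(businesses: List[Row],
                               reviews: List[Row]) -> Callable[[], Iterator[Row]]:
    """Return a 'view': a stored query that re-reads its inputs every time it runs."""
    def view() -> Iterator[Row]:
        by_id = {b["business_id"]: b for b in businesses}  # business_id is unique
        for r in reviews:  # inner join on business_id
            b = by_id.get(r["business_id"])
            if b is not None:
                yield {"name": b["name"], "stars": b["stars"],
                       "cool": r["votes"]["cool"], "date": r["date"]}
    return view
```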

Materialized Views AKA Tables

> ALTER SESSION SET `store.format` = 'parquet';
> CREATE TABLE dfs.yelp.BusinessReviewsTbl AS
  SELECT b.name, b.stars, r.votes.funny funny, r.votes.useful useful,
         r.votes.cool cool, r.`date`
  FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
  WHERE r.business_id = b.business_id;

+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 176448                    |
| 1_1      | 192439                    |
| 1_2      | 198625                    |
| 1_3      | 200863                    |
| 1_4      | 181420                    |
| 1_5      | 175663                    |
+----------+---------------------------+

Save analysis results as tables using familiar CTAS syntax
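The CTAS output is reported per fragment: the statement ran as several parallel fragments (the labels read as major fragment 1, minor fragments 0 to 5), and each wrote its share of the rows. A quick arithmetic check in Python confirms that the per-fragment counts add up to the 1,125,458 rows that `SELECT COUNT(*)` returned for the BusinessReviews view, as expected since the table and the view are built from the same join.

```python
# Records written by each fragment of the CTAS statement, copied from the output above.
RECORDS_WRITTEN = {
    "1_0": 176448,
    "1_1": 192439,
    "1_2": 198625,
    "1_3": 200863,
    "1_4": 181420,
    "1_5": 175663,
}

total = sum(RECORDS_WRITTEN.values())
print(f"{len(RECORDS_WRITTEN)} fragments wrote {total} records")
# prints: 6 fragments wrote 1125458 records
```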


Working with repeated values


Extensions to ANSI SQL to work with repeated values

// Flatten repeated categories
> SELECT name, categories
  FROM dfs.yelp.`business.json` LIMIT 3;

+----------------------------+------------------------------------------+
| name                       | categories                               |
+----------------------------+------------------------------------------+
| Eric Goldberg, MD          | ["Doctors","Health & Medical"]           |
| Pine Cone Restaurant       | ["Restaurants"]                          |
| Deforest Family Restaurant | ["American (Traditional)","Restaurants"] |
+----------------------------+------------------------------------------+

> SELECT name, FLATTEN(categories) AS categories
  FROM dfs.yelp.`business.json` LIMIT 5;

+----------------------------+------------------------+
| name                       | categories             |
+----------------------------+------------------------+
| Eric Goldberg, MD          | Doctors                |
| Eric Goldberg, MD          | Health & Medical       |
| Pine Cone Restaurant       | Restaurants            |
| Deforest Family Restaurant | American (Traditional) |
| Deforest Family Restaurant | Restaurants            |
+----------------------------+------------------------+

Dynamically flatten repeated and nested data elements as part of SQL queries. No ETL necessary.
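The row-multiplying behavior of FLATTEN can be sketched in Python on in-memory dicts. This is an illustration of the semantics shown in the two result tables, not Drill code; the function name is hypothetical, and dropping rows whose array is empty is an assumption of this sketch.

```python
from typing import Dict, Iterator, List


def flatten(rows: List[Dict], column: str) -> Iterator[Dict]:
    """Emit one output row per element of row[column], copying the other columns.

    Rows whose array is empty or missing produce no output in this sketch.
    """
    for row in rows:
        for element in row.get(column) or []:
            out = dict(row)        # copy the non-repeated columns
            out[column] = element  # replace the array with a single element
            yield out
```

Applied to the three businesses in the first table, it produces the five rows of the second table, in the same order.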

Extensions to ANSI SQL to work with repeated values

// Get most common business categories
> SELECT category, count(*) AS categorycount
  FROM (SELECT name, FLATTEN(categories) AS category
        FROM dfs.yelp.`business.json`) c
  GROUP BY category
  ORDER BY categorycount DESC;

+--------------+---------------+
| category     | categorycount |
+--------------+---------------+
| Restaurants  | 14303         |
| Australian   | 1             |
| Boat Dealers | 1             |
| Firewood     | 1             |
+--------------+---------------+
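Grouping over a flattened array amounts to counting every array element across all records. A Python sketch of the same computation, assuming newline-delimited JSON input as in the Yelp files (the function name is hypothetical):

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def category_counts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count businesses per category, most common first.

    Each element of a record's `categories` array contributes one count, which
    is what GROUP BY over FLATTEN(categories) computes.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts.update(json.loads(line).get("categories") or [])
    return counts.most_common()
```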


Working with dynamic columns


Check-ins dataset
{
  "checkin_info": {
    "3-4": 1, "13-5": 1, "6-6": 1, "14-5": 1, "14-6": 1, "14-2": 1,
    "14-3": 1, "19-0": 1, "11-5": 1, "13-2": 1, "11-6": 2, "11-3": 1,
    "12-6": 1, "6-5": 1, "5-5": 1, "9-2": 1, "9-5": 1, "9-6": 1,
    "5-2": 1, "7-6": 1, "7-5": 1, "7-4": 1, "17-5": 1, "8-5": 1,
    "10-2": 1, "10-5": 1, "10-6": 1
  },
  "type": "checkin",
  "business_id": "JwUE5GmEO-sH1FuwJgKBlQ"
}

Makes it easy to work with dynamic/unknown columns

> SELECT KVGEN(checkin_info) checkins FROM dfs.yelp.`checkin.json` LIMIT 1;

+------------+
| checkins   |
+------------+
| [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14-5","value":1},
{"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19-0","value":1},
{"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11-3","value":1},
{"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9-2","value":1},
{"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7-6","value":1},
{"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8-5","value":1},
{"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] |
+------------+

> SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.`checkin.json` LIMIT 6;

+--------------------------+
| checkins                 |
+--------------------------+
| {"key":"3-4","value":1}  |
| {"key":"13-5","value":1} |
| {"key":"6-6","value":1}  |
| {"key":"14-5","value":1} |
| {"key":"14-6","value":1} |
| {"key":"14-2","value":1} |
+--------------------------+

Convert a map with a wide set of dynamic columns into an array of key-value pairs
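The output shapes above can be mimicked in Python to show why KVGEN helps: column names that vary from record to record become ordinary data in a `key` field. The helpers below are hypothetical sketches of the two result shapes, not Drill's implementation.

```python
from typing import Dict, List


def kvgen(mapping: Dict[str, object]) -> List[Dict[str, object]]:
    """Turn a map with arbitrary keys into a list of {"key": ..., "value": ...} entries."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten_kvgen(records: List[Dict], column: str) -> List[Dict[str, object]]:
    """FLATTEN(KVGEN(column)): one row per key-value pair across all records."""
    return [pair for record in records for pair in kvgen(record.get(column) or {})]
```

Once the pairs are rows, a query can filter or aggregate on `key` without knowing in advance which of the many possible column names a given record contains.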

Makes it easy to work with dynamic/unknown columns

// Count total number of check-ins on Sunday midnight
> SELECT SUM(checkintbl.checkins.`value`) AS SundayMidnightCheckins
  FROM (SELECT FLATTEN(KVGEN(checkin_info)) checkins
        FROM dfs.yelp.`checkin.json`) checkintbl
  WHERE checkintbl.checkins.key = '23-0';

+------------------------+
| SundayMidnightCheckins |
+------------------------+
| 8575                   |
+------------------------+
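The filter `key = '23-0'` relies on the meaning of the dynamic column names. The Python sketch below assumes the Yelp academic dataset convention that each key is `hour-day`, with hour 0 to 23 and day 0 to 6 where 0 is Sunday; under that assumption `'23-0'` counts check-ins in the hour before midnight on Sundays. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumption: keys are "hour-day", day 0 = Sunday (Yelp academic dataset convention).
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def decode_key(key: str) -> Tuple[int, str]:
    """Split an 'hour-day' key into (hour, day name)."""
    hour, day = key.split("-")
    return int(hour), DAYS[int(day)]


def checkins_for(records: Iterable[Dict], key: str) -> int:
    """Same result as SUM(value) over FLATTEN(KVGEN(checkin_info)) WHERE key = `key`,
    computed by direct lookup since a key appears at most once per record."""
    return sum(r.get("checkin_info", {}).get(key, 0) for r in records)


def checkins_by_day(records: Iterable[Dict]) -> Dict[str, int]:
    """Total check-ins per weekday, decoding each dynamic column name."""
    totals: Dict[str, int] = defaultdict(int)
    for r in records:
        for key, value in r.get("checkin_info", {}).items():
            totals[decode_key(key)[1]] += value
    return dict(totals)
```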

Leverage Existing SQL Tools and Skills

• Leverage SQL-compatible tools (BI, query builders, etc.) via Drill’s standard ODBC, JDBC and ANSI SQL support
• Enable business analysts, technical analysts and data scientists to explore and analyze large volumes of real-time data

Architecture Overview


High Level Architecture

• Cluster of commodity servers
  – Daemon (drillbit) on each node
• ZooKeeper maintains ephemeral cluster membership information
  – Drillbits use ZooKeeper to find other drillbits in the cluster
  – Clients use ZooKeeper to find drillbits
• Built-in, optimistic query execution engine; doesn’t require a particular storage or execution system (MapReduce, Spark, Tez)
  – Better performance and manageability
• Data processing unit is columnar record batches
  – Enables schema flexibility with negligible performance impact
• Designed for extensibility at all layers
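Why columnar record batches sit comfortably with schema flexibility can be shown with a toy Python model: each batch carries its own schema plus one list per column, and a new batch starts whenever the incoming schema changes. This is only an illustration under that assumption; Drill's actual in-memory representation (its value vectors) is considerably more involved, and the names here are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Batch = Tuple[Tuple[str, ...], Dict[str, List[object]]]


def columnar_batches(records: Iterable[Dict], batch_size: int = 4) -> Iterator[Batch]:
    """Group row-oriented records into columnar batches.

    Each batch has its own schema (sorted field names) and one list per column.
    A batch is closed when it is full or when a record's fields differ from the
    current schema, so a schema change only moves a batch boundary.
    """
    schema: Tuple[str, ...] = ()
    columns: Dict[str, List[object]] = {}
    count = 0
    for record in records:
        record_schema = tuple(sorted(record))
        if count and (record_schema != schema or count == batch_size):
            yield schema, columns
            count = 0
        if count == 0:  # start a fresh batch with the record's schema
            schema = record_schema
            columns = {name: [] for name in schema}
        for name in schema:
            columns[name].append(record[name])
        count += 1
    if count:
        yield schema, columns
```

Operators downstream then process whole columns of one batch at a time and only need to re-plan their work when a batch arrives with a different schema.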

Basic Process

Query
1. Query comes to any Drillbit (JDBC, ODBC, CLI, REST)
2. Drillbit generates execution plan based on query optimization & locality
3. Fragments are farmed out to individual nodes
4. Result is returned to driving node

[Diagram: several Drillbits, each running alongside DFS/HBase/Hive storage, with ZooKeeper coordinating the cluster]
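The four steps follow a scatter-gather pattern. The Python sketch below is a single-process analogy under heavy simplifying assumptions: threads stand in for drillbits, slicing a list stands in for locality-aware planning, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
P = TypeVar("P")
R = TypeVar("R")


def split_into_fragments(data: Sequence[T], n: int) -> List[Sequence[T]]:
    """Planning, crudely: cut the input into at most n contiguous fragments."""
    if not data:
        return []
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def scatter_gather(data: Sequence[T], n_nodes: int,
                   fragment_fn: Callable[[Sequence[T]], P],
                   merge_fn: Callable[[List[P]], R]) -> R:
    """Run fragment_fn on each fragment concurrently, then merge the partials."""
    fragments = split_into_fragments(data, n_nodes)
    # Each fragment runs on its own worker, standing in for a drillbit on another node.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(fragment_fn, fragments))
    # The driving node combines the partial results.
    return merge_fn(partials)
```

For a count, each fragment returns a partial count and the merge is a plain `sum`; other aggregates need a merge step that matches them (an average, for instance, must carry sums and counts separately).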

Core Modules within drillbit

• RPC Endpoint
• SQL Parser
• Logical Plan
• Optimizer
• Physical Plan
• Execution
• Storage Plugins: DFS, Hive, HBase, MongoDB

Demo


Wrap Up


Apache Drill

• Industry’s first schema-free SQL query for Hadoop/NoSQL
• Self-service data exploration
• Single SQL interface for structured and semi-structured data
• Data agility with no IT intervention

Next Steps for Drill

1. Security & Access Control
2. JSON in Any Shape or Form
3. Improved Multi-Tenancy
4. New Data Sources

$50M in Free Training

For More Information

• Learn:
  – http://drill.apache.org
  – https://www.mapr.com/products/apache-drill
• Download MapR Sandbox:
  – https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
• Ask questions:
  – user@drill.apache.org
• Email:
  – bknorr@mapr.com