Sei sulla pagina 1di 3

We Do Hadoop

Contents

Cheat Sheet 1 Additional Resources

2
Hive for SQL Users 3
Query, Metadata
Current SQL Compatibility, Command Line, Hive Shell

If youre already a SQL user then working with Hadoop may be a little easier than you think, thanks to
Apache Hive. Apache Hive is data warehouse infrastructure built on top of Apache Hadoop for providing
data summarization, ad hoc query, and analysis of large datasets. It provides a mechanism to project
structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).

Use this handy cheat sheet (based on this original MySQL cheat sheet) to get going with Hive and Hadoop.

Additional Resources

Learn to become fluent in Apache Hive with the Hive Language Manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Get in the Hortonworks Sandbox and try out Hadoop with interactive tutorials:
http://hortonworks.com/sandbox

Register today for Apache Hadoop Training and Certification at Hortonworks University:
http://hortonworks.com/training

Twitter: twitter.com/hortonworks
Facebook: facebook.com/hortonworks
International: 1.408.916.4121
www.hortonworks.com

We Do Hadoop

Query
Function MySQL HiveQL

Retrieving information SELECTfrom_columnsFROMtableWHEREconditions; SELECTfrom_columnsFROMtableWHEREconditions;

All values SELECT*FROMtable; SELECT*FROMtable;

Some values SELECT*FROMtableWHERErec_name=value; SELECT*FROMtableWHERErec_name="value";



Multiple criteria SELECT*FROMtableWHERErec1=value1AND SELECT*FROMTABLEWHERErec1="value1"AND
rec2=value2; rec2="value2";

Selecting specific columns SELECTcolumn_nameFROMtable; SELECTcolumn_nameFROMtable;

Retrieving unique output records SELECTDISTINCTcolumn_nameFROMtable; SELECTDISTINCTcolumn_nameFROMtable;

Sorting SELECTcol1,col2FROMtableORDERBYcol2; SELECTcol1,col2FROMtableORDERBYcol2;

Sorting backward SELECTcol1,col2FROMtableORDERBYcol2DESC; SELECTcol1,col2FROMtableORDERBYcol2DESC;

Counting rows SELECTCOUNT(*)FROMtable; SELECTCOUNT(*)FROMtable;

Grouping with counting SELECTowner,COUNT(*)FROMtableGROUPBY SELECTowner,COUNT(*)FROMtableGROUPBY
owner; owner;

Maximum value SELECTMAX(col_name)ASlabelFROMtable; SELECTMAX(col_name)ASlabelFROMtable;

Selecting from multiple tables SELECTpet.name,commentFROMpet,eventWHERE SELECTpet.name,commentFROMpetJOINeventON
(Join same table using alias pet.name=event.name; (pet.name=event.name);
w/AS)

Metadata

Function MySQL HiveQL

Selecting a database USEdatabase; USEdatabase;

Listing databases SHOWDATABASES; SHOWDATABASES;

Listing tables in a database SHOWTABLES; SHOWTABLES;

Describing the format of a table DESCRIBEtable; DESCRIBE(FORMATTED|EXTENDED)table;

Creating a database CREATEDATABASEdb_name; CREATEDATABASEdb_name;

Dropping a database DROPDATABASEdb_name; DROPDATABASEdb_name(CASCADE);

Twitter: twitter.com/hortonworks
Facebook: facebook.com/hortonworks
International: 1.408.916.4121
www.hortonworks.com

We Do Hadoop

Current SQL Compatibility

Hive SQL Datatypes Hive SQL Semantics Color Key


INT SELECT,LOADINSERTfromquery Hive0.10

TINYINT/SMALLINT/BIGINT ExpressionsinWHEREandHAVING Hive0.11

BOOLEAN GROUPBY,ORDERBY,SORTBY FUTURE

FLOAT SubqueriesinFROMclause
DOUBLE GROUPBY,ORDERBY

STRING CLUSTERBY,DISTRIBUTEBY

TIMESTAMP ROLLUPandCUBE

BINARY UNION

ARRAY,MAP,STRUCT,UNION LEFT,RIGHTandFULLINNER/OUTERJOIN

DECIMAL CROSSJOIN,LEFTSEMIJOIN

CHAR Windowingfunctions(OVER,RANK,etc)

CARCHAR INTERSECT,EXCEPT,UNION,DISTINCT

SubqueriesinWHERE(IN,NOTIN,EXISTS/
DATE
NOTEXISTS)
SubqueriesinHAVING

Command Line
Function Hive

Run query hivee'selecta.colfromtab1a'

Run query silent mode hiveSe'selecta.colfromtab1a'

Set hive config variables hivee'selecta.colfromtab1a'hiveconfhive.root.logger=DEBUG,console

Use initialization script hiveiinitialize.sql

Run non-interactive script hivefscript.sql

Hive Shell
Function Hive

Run script inside shell sourcefile_name

Run ls (dfs) commands dfsls/user

Run ls (bash command) from shell !ls

Set configuration variables setmapred.reduce.tasks=32

TAB auto completion sethive.<TAB>

Show all variables starting with hive set

Revert all variables reset

Add jar to distributed cache addjarjar_path

Show all jars in distributed cache listjars

Delete jar from distributed cache deletejarjar_name

Twitter: twitter.com/hortonworks
Facebook: facebook.com/hortonworks
International: 1.408.916.4121
www.hortonworks.com

Potrebbero piacerti anche