Kashif Saeed
Complex Data Types in Hive
Complex data types in Hive/Impala
MAP
STRUCT
Hands-On
• Hive Activity 3
HiveQL
Computations in queries
LIMIT Clause
Column Aliases
CASE Statement
LIKE Operator
Source: http://www.sqlexamples.info/PHP/mysql_rlike.htm
RLIKE Operator
RLIKE Examples
Join Optimization – Number of Joins
Indexes in Hive
• WITH DEFERRED REBUILD
• If WITH DEFERRED REBUILD is specified on CREATE INDEX,
the newly created index is initially empty, regardless
of whether the table already contains data
• Use the ALTER INDEX ... REBUILD command to build the
index
• PARTITIONED BY
• If the PARTITIONED BY clause is omitted, the index
spans all partitions of the original table
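The deferred-rebuild behavior described above can be sketched in HiveQL as follows. The table name `employees`, the column `country`, and the index name `employees_country_idx` are hypothetical, chosen only for illustration. Note that index support was removed in Hive 3.0, so this sketch assumes an earlier release.

```sql
-- Hypothetical table, column, and index names (illustration only).
-- Assumes a Hive release earlier than 3.0, where indexes still exist.

-- Create a compact index; with DEFERRED REBUILD it starts out empty.
CREATE INDEX employees_country_idx
ON TABLE employees (country)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Populate the index explicitly once the table holds data.
ALTER INDEX employees_country_idx ON employees REBUILD;

-- List the indexes defined on the table.
SHOW FORMATTED INDEX ON employees;
```

Deferring the build is useful when the table is loaded in bulk afterwards, since the index can then be built once rather than at creation time.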
Index - Examples
• Rebuilding an Index:
ALTER INDEX index_name ON table_name
[PARTITION (...)] REBUILD
• Dropping an Index:
DROP INDEX index_name ON table_name
HiveQL: Partitions in Hive
Partitioned Tables in Hive
Hands-On
• Hive Activity 4
Sampling Data in Hive
Sampling Data in Hive - Examples
File Formats in Hive
Why Are Storage Formats Important?
Storage formats:
• Text (txt, csv, tsv)
• Sequence File
• Avro
• Parquet
• Optimized Row Columnar (ORC)
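As a rough illustration of how these formats are selected in practice, the storage format of a Hive table is normally chosen with the STORED AS clause. The table and column names below (`page_views_*`, `user_id`, `url`) are hypothetical, and the sketch assumes a Hive version that accepts the `STORED AS AVRO` and `STORED AS PARQUET` shorthands.

```sql
-- Hypothetical table and column names (illustration only).

-- Plain text, comma-delimited (csv-style).
CREATE TABLE page_views_text (user_id BIGINT, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The same schema in the other formats listed above.
CREATE TABLE page_views_seq (user_id BIGINT, url STRING)
STORED AS SEQUENCEFILE;

CREATE TABLE page_views_avro (user_id BIGINT, url STRING)
STORED AS AVRO;

CREATE TABLE page_views_parquet (user_id BIGINT, url STRING)
STORED AS PARQUET;

-- Convert existing text data to ORC with CREATE TABLE AS SELECT.
CREATE TABLE page_views_orc
STORED AS ORC
AS SELECT user_id, url FROM page_views_text;
```

Loading data into a text table first and then converting it with a CTAS statement, as in the last statement, is one common way to move data into a columnar format.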
Text Format
Avro
Columnar Formats (Parquet, ORC)
Handling JSON Data in Hive
What is a SerDe?
JSON_TUPLE UDTF
• Hive Activity 6
• Practice: complete the example in the blog post
http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html
• Expect multi-level JSON in the assignment, but not
in the exam
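For practice with the JSON_TUPLE UDTF, a minimal sketch follows. The table `raw_events`, its column `json`, and the JSON keys (`user_name`, `city`, `address.zip`) are hypothetical and are not taken from the course material or the blog post. `json_tuple` extracts top-level keys only, so the nested (multi-level) case is shown here with `get_json_object`.

```sql
-- Hypothetical table: one JSON document per row, stored as a string.
CREATE TABLE raw_events (json STRING);

-- json_tuple is a table-generating function, so it is used with
-- LATERAL VIEW; it extracts several top-level keys in one pass.
SELECT j.user_name, j.city
FROM raw_events e
LATERAL VIEW json_tuple(e.json, 'user_name', 'city') j AS user_name, city;

-- For nested JSON, get_json_object accepts a path expression.
SELECT get_json_object(e.json, '$.address.zip') AS zip
FROM raw_events e;
```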
Appendix
Appendix A – Table Generating Functions
• Table with multiple rows example
Appendix B - Database Indexes
Amazon may risk selling the last item to two customers and then apologizing to one,
rather than interrupting the user experience to show the exact inventory.
ACID vs. BASE for Transactions
• Eric Brewer’s CAP theorem says that if you want
consistency, availability, and partition tolerance, you have
to settle for two out of three.
• An alternative to ACID is BASE:
Basically Available
Soft state
Eventual consistency
• Rather than enforcing consistency after every transaction,
the system is allowed to reach a consistent state eventually
• BASE allows for more scalable and affordable systems