Sei sulla pagina 1di 15

Hash Join Internals

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Functions
• A hash function maps arbitrary key values
to integer “hash values”
• Data is stored in an array based on the hash
values
• Data access is very efficient
• just compute the hash value of the key
and lookup the corresponding array element

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Buckets
• Too many possible hash values
• hash tables would be large and sparse
• Hash values are mapped to hash buckets
using the mod() function
• 100056732 % 100 = 32
• Each hash table element is a “bucket”
• Hash values that map to the same bucket are
stored on a “collision chain”
Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Numeric keys
• Key data itself is the hash value
• Hash bucket is computed using MOD
• binary hashes are cheap to compute
• just a bitwise SHIFT operation
• prime number hashes randomize the
distribution of keys to hash buckets
• this prevents uneven or “skew” distributions

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Non-numeric Keys
• Internal hash function to get hash value
• examples:
• ‘Adams’  3016007180
• ‘Millsap’  1765538108
• DBMS_UTILITY.GET_HASH_VALUE()
• Binary MOD() used to map hash values to
hash buckets

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Key Value Hash Tables
Hash Function Hash Value

Hash Bucket Number

Hash Table
hash
bucket Bucket 1 Bucket 2 Bucket 3 Bucket 4 Bucket 5 Bucket 6 Bucket 7 Bucket 8
headers

Hash Value Key


Hash Value Key
Hash Value Key
collision
Hash Value Key Hash Value Key
chains
Hash Value Key Hash Value Key
Hash Value Key

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Joins
• Concept
• read first row source and build a hash table
• read second row source and join via hash table
• applicable to (in)-equality joins & CBO only
• Approach
• first “partition” both inputs by hash value
• for corresponding pairs of partitions
• build an in-memory hash table from one input
• probe the hash table with rows from the other
Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Table Partitioning
Hash Table Data

hash
bucket
bitmap Hash Bucket Bitmap

hash
table Partition 1 Partition 2 Partition 3 Partition 4
partition
buffers

saved
partition
extents

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Bit Vector Filtering
• When partitioning the first input
• build a bitmap of non-empty hash buckets
• When partitioning the second input
• check the hash bucket bitmap
• keys that map to empty hash buckets cannot be
joined
• for equality joins these rows can be immediately
excluded

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Partition Histogram
• Partition sizes will be uneven if the data
distribution is skew
• Partition histogram records for each partition pair
• number of keys
• bytes of memory required
• Allows dynamic role reversal
• for each partition, the hash table is built from the input
with the smaller memory requirement
• Allows optimum memory use when joining
multiple partitions simultaneously
Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Join Processing
• Phases
• initialization (planning memory use)
• build input partitioning
• probe input partitioning
• may begin to return rows
• joining of saved partitions
• may require sub-partitioning of large partitions
• Hash area memory used differently in each
phase
Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Build Input Partitioning
Hash Area
Join Key Non-Key Columns
Input Buffers

Partition Histogram

Hash Function

Hash Partition Number Partition Buffers

Hash Value Join Key Non-Key Columns

Hash Bucket Number Hash Bucket Bitmap

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Probe Input Partitioning
Hash Area

Input Buffers

Partition Histogram

Hash Table
Hash Function Output Rows

Partition Buffers

Hash Bucket Bitmap

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Joining Saved Partitions
Hash Area

Input Buffers

Partition Histogram

Hash Table
Output Rows

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Demonstration
• Event 10104 to show hash join internals

Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au

Potrebbero piacerti anche