Unix + Oracle =
Steve Adams
Ixora steve.adams@ixora.com.au
Hash Functions
• A hash function maps arbitrary key values
to integer “hash values”
• Data is stored in an array based on the hash
values
• Data access is very efficient
• just compute the hash value of the key
and look up the corresponding array element
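A minimal sketch of the idea, assuming a toy hash function with a deliberately small range so that the hash value can index the array directly. The names `toy_hash`, `store` and `lookup` are hypothetical, and collisions are ignored here.

```python
def toy_hash(key: str) -> int:
    # Hypothetical hash function: sum of the key's bytes, folded into 0..255.
    return sum(key.encode("utf-8")) % 256

# One array element per possible hash value.
slots = [None] * 256

def store(key: str, data) -> None:
    # Store the data in the array element selected by the hash value.
    # A colliding key simply overwrites the element in this sketch.
    slots[toy_hash(key)] = (key, data)

def lookup(key: str):
    # Compute the hash value and read the corresponding array element.
    entry = slots[toy_hash(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

store("Adams", "Ixora")
```

A lookup costs one hash computation and one array access, regardless of how many keys are stored.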
Hash Buckets
• Too many possible hash values
• hash tables would be large and sparse
• Hash values are mapped to hash buckets
using the mod() function
• 100056732 % 100 = 32
• Each hash table element is a “bucket”
• Hash values that map to the same bucket are
stored on a “collision chain”
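The bucket mapping and collision chains can be sketched as follows. This is an illustration only: the bucket count of 100 matches the mod example above, each chain is a plain Python list, and the function names are hypothetical.

```python
NUM_BUCKETS = 100

# Each hash table element is a bucket holding a collision chain.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_of(hash_value: int) -> int:
    # Map a hash value to a bucket number with mod.
    return hash_value % NUM_BUCKETS

def insert(hash_value: int, data) -> None:
    # Append to the chain of the bucket this hash value maps to.
    buckets[bucket_of(hash_value)].append((hash_value, data))

def find(hash_value: int):
    # Walk the collision chain of one bucket only.
    for stored_value, data in buckets[bucket_of(hash_value)]:
        if stored_value == hash_value:
            return data
    return None

insert(100056732, "first")
insert(200000032, "second")  # different hash value, same bucket
```

Both hash values end in 32, so they share bucket 32 and sit on the same chain.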
Numeric keys
• Key data itself is the hash value
• Hash bucket is computed using MOD
• binary hashes are cheap to compute
• just a bitwise AND (mask) operation
• prime number hashes randomize the
distribution of keys to hash buckets
• this prevents uneven or “skewed” distributions
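A small sketch contrasting the two choices of bucket count, assuming numeric keys that all happen to be multiples of 8. With a power-of-two bucket count the remainder can be taken with a bitwise AND mask; with a prime bucket count a true MOD is needed. The function names and key pattern are made up for illustration.

```python
from collections import Counter

def binary_bucket(key: int, bits: int) -> int:
    # Power-of-two bucket count: the remainder is just the low-order bits.
    return key & ((1 << bits) - 1)

def prime_bucket(key: int, prime: int) -> int:
    # Prime bucket count: a real MOD (division) is required.
    return key % prime

# Hypothetical skewed key pattern: every key is a multiple of 8.
keys = list(range(0, 800, 8))

binary_use = Counter(binary_bucket(k, 3) for k in keys)  # 8 buckets
prime_use = Counter(prime_bucket(k, 7) for k in keys)    # 7 buckets
```

In this example all 100 keys fall into a single bucket out of 8 with the binary hash, while the prime hash uses all 7 buckets.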
Non-numeric Keys
• Internal hash function to get hash value
• examples:
• ‘Adams’ 3016007180
• ‘Millsap’ 1765538108
• DBMS_UTILITY.GET_HASH_VALUE()
• Binary MOD() used to map hash values to
hash buckets
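The two steps can be sketched as below. Note the assumption: `zlib.crc32` is only a stand-in for the internal hash function, so it will not reproduce the hash values shown above. The bucket count of 16 is also arbitrary.

```python
import zlib

def hash_value(key: str) -> int:
    # Stand-in hash function (CRC-32); NOT the internal Oracle algorithm.
    return zlib.crc32(key.encode("utf-8"))

def bucket(key: str, bits: int = 4) -> int:
    # Binary MOD: keep the low-order bits of the hash value.
    return hash_value(key) & ((1 << bits) - 1)

for name in ("Adams", "Millsap"):
    print(name, hash_value(name), bucket(name))
```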
Hash Tables
[Diagram labels: Key Value; Hash Function; Hash Value; Hash Table; hash bucket headers (Bucket 1 to Bucket 8)]
Hash Joins
• Concept
• read first row source and build a hash table
• read second row source and join via hash table
• applicable to (in)-equality joins & CBO only
• Approach
• first “partition” both inputs by hash value
• for corresponding pairs of partitions
• build an in-memory hash table from one input
• probe the hash table with rows from the other
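The approach above can be sketched for an equality join as follows. This is a simplified illustration, not the Oracle implementation: everything stays in memory, Python's built-in `hash()` and `dict` stand in for the hash function and hash table, and the names `partition` and `hash_join` are hypothetical.

```python
from collections import defaultdict

def partition(rows, key, num_partitions):
    # Split one input into partitions by the hash value of the join key.
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts

def hash_join(build_rows, probe_rows, key, num_partitions=4):
    build_parts = partition(build_rows, key, num_partitions)
    probe_parts = partition(probe_rows, key, num_partitions)
    result = []
    # Matching keys always land in partitions with the same number,
    # so only corresponding pairs need to be joined.
    for build_part, probe_part in zip(build_parts, probe_parts):
        # Build an in-memory hash table from one input.
        table = defaultdict(list)
        for row in build_part:
            table[row[key]].append(row)
        # Probe the hash table with rows from the other input.
        for row in probe_part:
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result

depts = [{"deptno": 10, "dname": "SALES"}, {"deptno": 20, "dname": "OPS"}]
emps = [{"deptno": 10, "ename": "A"}, {"deptno": 10, "ename": "B"},
        {"deptno": 30, "ename": "C"}]
joined = hash_join(depts, emps, "deptno")
```

Because each pair of partitions is joined independently, only one partition's hash table has to fit in memory at a time.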
Hash Table Partitioning
[Diagram labels: Hash Table Data; Hash Bucket Bitmap; hash table partition buffers (Partition 1 to Partition 4); saved partition extents]
Bit Vector Filtering
• When partitioning the first input
• build a bitmap of non-empty hash buckets
• When partitioning the second input
• check the hash bucket bitmap
• keys that map to empty hash buckets cannot be
joined
• for equality joins these rows can be immediately
excluded
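A minimal sketch of the filter for an equality join, assuming numeric keys, 64 hash buckets, and a Python integer used as the bitmap. The function names are hypothetical.

```python
NUM_BUCKETS = 64

def bucket_of(key: int) -> int:
    return key % NUM_BUCKETS

def build_bitmap(build_keys) -> int:
    # While partitioning the first input, set one bit per non-empty bucket.
    bitmap = 0
    for key in build_keys:
        bitmap |= 1 << bucket_of(key)
    return bitmap

def filter_probe(probe_keys, bitmap: int):
    # While partitioning the second input, drop keys whose bucket is empty.
    return [k for k in probe_keys if bitmap & (1 << bucket_of(k))]

bitmap = build_bitmap([1, 2, 65])            # buckets 1 and 2 are non-empty
kept = filter_probe([1, 3, 66, 129], bitmap)
```

The filter only rules rows out. Keys 66 and 129 survive because they share a bucket with a build key, even though neither has a match, so the join itself must still compare key values.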
Partition Histogram
• Partition sizes will be uneven if the data
distribution is skewed
• Partition histogram records for each partition pair
• number of keys
• bytes of memory required
• Allows dynamic role reversal
• for each partition, the hash table is built from the input
with the smaller memory requirement
• Allows optimum memory use when joining
multiple partitions simultaneously
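Dynamic role reversal can be sketched as below. The assumptions are loud ones: the histogram counts rows rather than distinct keys, memory is estimated as rows times a fixed row size, and all names are hypothetical.

```python
def partition_histogram(parts, row_bytes):
    # One entry per partition: number of keys (rows here) and bytes needed.
    return [{"keys": len(p), "bytes": len(p) * row_bytes} for p in parts]

def choose_build_side(first_hist, second_hist):
    # For each partition pair, build the hash table from whichever
    # input needs less memory.
    sides = []
    for first, second in zip(first_hist, second_hist):
        sides.append("first" if first["bytes"] <= second["bytes"] else "second")
    return sides

first_parts = [[1, 2, 3], [4]]
second_parts = [[1], [4, 4, 4, 4]]
first_hist = partition_histogram(first_parts, row_bytes=100)
second_hist = partition_histogram(second_parts, row_bytes=50)
sides = choose_build_side(first_hist, second_hist)
```

Here the roles are reversed for the first partition pair only, because the second input's partition is the smaller one there.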
Hash Join Processing
• Phases
• initialization (planning memory use)
• build input partitioning
• probe input partitioning
• may begin to return rows
• joining of saved partitions
• may require sub-partitioning of large partitions
• Hash area memory used differently in each
phase
Build Input Partitioning
[Diagram labels: Hash Area; Join Key; Non-Key Columns; Input Buffers; Partition Histogram; Hash Function]
Probe Input Partitioning
[Diagram labels: Hash Area; Input Buffers; Partition Histogram; Hash Table; Hash Function; Output Rows; Partition Buffers]
Joining Saved Partitions
[Diagram labels: Hash Area; Input Buffers; Partition Histogram; Hash Table; Output Rows]
Demonstration
• Event 10104 to show hash join internals