
Hashing

• Hashing is the transformation of a
string of characters into a usually
shorter fixed-length value or key that
represents the original string. Hashing
is used to index and retrieve items in a
database because it is faster to find the
item using the shorter hashed key than
to find it using the original value. It is
also used in many encryption
algorithms.
Hash Table
• Is a data structure that
associates keys with values

A small phone book as a hash table.


Hash Table (1)
• The primary operation it supports
efficiently is a lookup: given a key (a
person's name), find the corresponding
value (that person's telephone number). It
works by transforming the key using a
hash function into a hash, a number that
is used as an index into an array to locate
the slot where the value should be
stored.
Hash Function
• The hashing algorithm is any
well-defined procedure or
mathematical function which converts a
large, possibly variable-sized amount of
data into a small datum, usually a single
integer that may serve as an index into an
array. The values returned by a hash
function are called hash values, hash
codes, hash sums, or simply hashes.
Hash Function
1.Direct Hashing
The key is the address without any
algorithmic manipulation. The data
structure must therefore contain an
element for every possible key.
While the situations where you can use
direct hashing are limited, when it can be
used it is very powerful because it
guarantees that there are no synonyms.
[Figure: direct hashing. The keys 005, 100, and 002 pass through the
hash function unchanged and give the addresses 5, 100, and 2.]
Address  Name
001      Elmer
002      Markh
005      Reymund
007      Hubert
100      Rollyn
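A minimal Python sketch of direct hashing, assuming keys run from 1 to 100 as in the figure. The names `MAX_KEY`, `direct_hash`, and `table` are illustrative and not part of the slides.

```python
# Assumption: every possible key lies in 1..MAX_KEY, as in the figure.
MAX_KEY = 100

def direct_hash(key: int) -> int:
    """Return the key itself as the address; no arithmetic is applied."""
    if not 1 <= key <= MAX_KEY:
        raise ValueError(f"key {key} is outside 1..{MAX_KEY}")
    return key

# One element per possible key. Index 0 is left unused so that
# key k lands exactly at index k.
table = [None] * (MAX_KEY + 1)
records = [(1, "Elmer"), (2, "Markh"), (5, "Reymund"), (7, "Hubert"), (100, "Rollyn")]
for key, name in records:
    table[direct_hash(key)] = name

print(table[5])    # Reymund
print(table[100])  # Rollyn
```

Because each key owns its own slot, two different keys can never land on the same address, which is why no synonyms occur.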
2.Subtraction Method
Sometimes we have keys that are
consecutive but do not start from one.
Example:
A company may have only 100
employees, but the employee numbers
start from 1001 and go to 1100. In this
case, we use a very simple hashing function
that subtracts 1000 from the key to
determine the address.
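The subtraction method can be sketched in a few lines of Python. The function name `subtraction_hash` and the constant `BASE` are illustrative assumptions; only the value 1000 comes from the example.

```python
BASE = 1000  # the amount subtracted in the example above

def subtraction_hash(key: int, base: int = BASE) -> int:
    """Subtract a fixed base so consecutive keys map to consecutive addresses."""
    return key - base

print(subtraction_hash(1001))  # 1
print(subtraction_hash(1100))  # 100
```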
3.Digit Extraction
Selected digits are extracted from the key
and used as the address.
Example:
Using six-digit employee number to
hash to a three-digit address (000-999), we
could select the first, third, and fourth
digits.
379452 → 394
121267 → 112
378845 → 388
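A small Python sketch of digit extraction, assuming six-digit keys and the first, third, and fourth digits as in the example. The function name and its parameters are illustrative.

```python
def digit_extraction_hash(key: int, positions=(1, 3, 4), width: int = 6) -> int:
    """Build an address from selected digits of the key.

    positions are 1-based digit positions counted from the left.
    width pads shorter keys with leading zeros (assumed six digits here).
    """
    digits = str(key).zfill(width)
    return int("".join(digits[p - 1] for p in positions))

print(digit_extraction_hash(379452))  # 394
print(digit_extraction_hash(121267))  # 112
print(digit_extraction_hash(378845))  # 388
```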
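4.Modulo Division
Divides the key by the array size and uses
the remainder + 1 as the address.
address = (key modulo listsize) + 1
[Figure: modulo division with a list size of 307. The keys 121267,
045128, and 379452 pass through the hash function and give the
addresses 3, 307, and 1. The list holds 379452 Elmer, 121267 Markh,
378845 Hubert, 160252 Arno, and 045128 Rollyn.]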
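The modulo division step can be checked with a short Python sketch. It assumes a list size of 307 and 1-based addresses, as in the figure; the names `LIST_SIZE` and `modulo_division_hash` are illustrative.

```python
LIST_SIZE = 307  # list size taken from the figure

def modulo_division_hash(key: int, list_size: int = LIST_SIZE) -> int:
    """Return the remainder of key / list_size, plus 1 (addresses 1..list_size)."""
    return key % list_size + 1

# Key 045128 is written as 45128 because Python integer
# literals cannot carry a leading zero.
print(modulo_division_hash(121267))  # 3
print(modulo_division_hash(45128))   # 307
print(modulo_division_hash(379452))  # 1
```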
5.Midsquare Hashing
The key is squared and the address
selected from the middle of the squared
number.
Example:
9452 * 9452 = 89340304 : address is
3403
As a variation, we can select a portion of
the key and square that portion rather than
the whole key.
379452 : 379 * 379 = 143641 : address
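A Python sketch of midsquare hashing. How the "middle" is chosen when the leftover digits cannot be split evenly is an assumption here (the extra digit is left on the right), and the function name is illustrative.

```python
def midsquare_hash(key: int, address_digits: int) -> int:
    """Square the key and take address_digits digits from the middle.

    Assumption: when the digits outside the middle cannot be split
    evenly, the extra digit is left on the right-hand side.
    """
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])

print(9452 * 9452)              # 89340304
print(midsquare_hash(9452, 4))  # 3403
```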
6.Folding Methods
There are two folding methods:
Fold Shift: the key value is divided into
parts whose size matches the size of the
required address. Then the left and right
parts are shifted and added to the middle
part.
Fold Boundary: the left and right numbers
are folded on a fixed boundary between
them and the center number. This results
in the two outside values being reversed.
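[Figure: folding the key 123456789 into a three-digit address.]
Fold Shift:    123 + 456 + 789 = 1368, leading 1 discarded, address 368
Fold Boundary: 321 + 456 + 987 = 1764, leading 1 discarded, address 764
(In Fold Boundary the digits of the two outside parts are reversed.)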
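Both folding methods can be sketched in Python. The sketch assumes the nine-digit key 123456789 and a three-digit address, and it discards any carry beyond the address size; the function names are illustrative.

```python
def split_key(key: int, size: int):
    """Split the key's digits into consecutive parts of the given size."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]

def fold_shift(key: int, size: int) -> int:
    """Add the parts as they are, then discard digits beyond the address size."""
    total = sum(int(part) for part in split_key(key, size))
    return total % 10 ** size

def fold_boundary(key: int, size: int) -> int:
    """Reverse the digits of the two outside parts before adding."""
    parts = split_key(key, size)
    if len(parts) > 1:
        parts[0] = parts[0][::-1]
        parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size

print(fold_shift(123456789, 3))     # 368  (123 + 456 + 789 = 1368)
print(fold_boundary(123456789, 3))  # 764  (321 + 456 + 987 = 1764)
```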
Load Factor
Is the number of elements in the list
divided by the number of physical elements
allocated for the list, expressed as a
percentage.
a = (k / n) x 100
where k is the number of elements in the list
and n is the number of elements allocated.
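As a quick arithmetic check, the load factor can be computed as below. The function name and the sample figures (75 filled elements out of 100 allocated) are illustrative and do not come from the slides.

```python
def load_factor(filled: int, allocated: int) -> float:
    """Return the filled share of the list as a percentage."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return filled / allocated * 100

print(load_factor(75, 100))  # 75.0
```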
Clustering
The tendency of data to build up unevenly
across a hashed list. It is usually created by
collisions.
Collision
Is the event that occurs when a hashing
algorithm produces an address for an
insertion key and that address is already
occupied.
Home Address
The address produced by the hashing
algorithm.
Prime Area
The memory that contains all of the home
addresses.
Probe
Calculation of an address and test for success.
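[Figure: collision example. Three keys are inserted in order:
1. hash(A), 2. hash(B), 3. hash(C). B collides with A, and C collides
with B. The list shows C, A, and B stored among the slots [1], [5],
[9], and [17].]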
Collision Resolution
• The process of finding an alternate location
• Collision strategy techniques:
– Separate chaining
– Open addressing
– Coalesced hashing
– Perfect hashing
– Dynamic perfect hashing
– Probabilistic hashing
– Robin hood hashing
– Cache-conscious collision resolution
Separate Chaining
• Sometimes called simply
chaining or direct chaining. In
its simplest form each slot in the
array is a linked list, or the
head cell of a linked list, where
the list contains the elements
that hashed to the same
location. Insertion requires
finding the correct slot, then
appending to either end of the
list in that slot.
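A minimal Python sketch of separate chaining. It assumes integer keys, a list size of 307, and 0-based slots, and it uses a Python list per slot in place of a linked list; the class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int = 307):
        self.size = size
        # A Python list stands in for the linked list in each slot.
        self.slots = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size  # 0-based slot index (an assumption)

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: update it
                return
        chain.append((key, value))       # append to the end of the chain

    def lookup(self, key: int):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return None                      # key is not in the table

table = ChainedHashTable()
table.insert(70918, "Redjie")
table.insert(166702, "Reymund")  # hashes to the same slot as 70918
print(table.lookup(166702))      # Reymund
print(len(table.slots[1]))       # 2
```

Both keys leave a remainder of 1 when divided by 307, so they share one chain instead of competing for a single slot.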
Open Addressing
• Open addressing hash tables store the records directly
within the array. This approach is also called closed
hashing. A hash collision is resolved by probing, or
searching through alternate locations in the array
(following a probe sequence) until either the target record
is found, or an unused array slot is found, which indicates
that there is no such key in the table.
Well Known Probe Sequences
Linear Probing
Collision is resolved by adding one (1) to
the current address.
[Figure: linear probing in a list of size 307. The keys 070918 and
166702 hash to the same address. 070918 Redjie is stored at [002];
166702 Reymund moves past [003], which holds 121267 Markh, and is
stored at [004]. The list also holds 379452 Elmer at [001],
378845 Hubert, 160252 Arno at [306], and 045128 Rollyn at [307].]
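Linear probing can be sketched in Python as follows. The sketch assumes a list size of 307, 1-based addresses, the home address (key modulo 307) + 1, and wrap-around from the last slot to the first; all function names are illustrative.

```python
LIST_SIZE = 307

def home_address(key: int) -> int:
    """Home address by modulo division, 1-based."""
    return key % LIST_SIZE + 1

def next_address(address: int) -> int:
    """Add one to the current address, wrapping from LIST_SIZE back to 1."""
    return address % LIST_SIZE + 1

def linear_probe_insert(table, key: int, value) -> int:
    """Store (key, value) in the first free slot and return its address."""
    address = home_address(key)
    for _ in range(LIST_SIZE):
        if table[address] is None or table[address][0] == key:
            table[address] = (key, value)
            return address
        address = next_address(address)
    raise RuntimeError("table is full")

def linear_probe_search(table, key: int):
    """Follow the same probe sequence until the key or an empty slot is found."""
    address = home_address(key)
    for _ in range(LIST_SIZE):
        if table[address] is None:
            return None
        if table[address][0] == key:
            return table[address][1]
        address = next_address(address)
    return None

table = [None] * (LIST_SIZE + 1)  # index 0 is unused
records = [(379452, "Elmer"), (70918, "Redjie"), (121267, "Markh"), (166702, "Reymund")]
placed = {name: linear_probe_insert(table, key, name) for key, name in records}
print(placed)  # {'Elmer': 1, 'Redjie': 2, 'Markh': 3, 'Reymund': 4}
```

Key 166702 has home address 2, finds it occupied, finds 3 occupied as well, and settles at 4.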
Quadratic Probing
The increment is the collision probe number
squared.
Probe    Collision   Probe² and    New
Number   Location    Increment     Address
1        1           1² = 1        2
2        2           2² = 4        6
3        6           3² = 9        15
4        15          4² = 16       31
5        31          5² = 25       56
6        56          6² = 36       92
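The probe sequence can be generated with a short Python sketch. It assumes that each new address is the collision location plus the increment, and that every probe collides again so the new address becomes the next collision location. Wrapping the address around the list size is left out to keep the arithmetic visible. The function name is illustrative.

```python
def quadratic_probe_rows(first_collision: int, probes: int):
    """Return (probe number, collision location, increment, new address) rows."""
    rows = []
    location = first_collision
    for probe in range(1, probes + 1):
        increment = probe ** 2             # probe number squared
        new_address = location + increment
        rows.append((probe, location, increment, new_address))
        location = new_address             # assume the new address collides too
    return rows

for row in quadratic_probe_rows(first_collision=1, probes=6):
    print(row)
# (1, 1, 1, 2)
# (2, 2, 4, 6)
# (3, 6, 9, 15)
# (4, 15, 16, 31)
# (5, 31, 25, 56)
# (6, 56, 36, 92)
```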
Key Offset
Is a double hashing method that produces
different collision paths for different keys.
Formula:
offset = (key / listsize), keeping only the integer quotient
address = ((offset + old address) modulo
listsize) + 1
For example if the key is 166702 and the
listsize is 307, using the modulo division…
offset = (166702 / 307) = 543
address = ((543 + 002) modulo 307) + 1
= 239
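The worked example can be reproduced with a short Python sketch. Integer division (`//`) is assumed for the offset, which matches the value 543 above; the function name is illustrative.

```python
def key_offset_probe(key: int, old_address: int, list_size: int = 307) -> int:
    """Return the next address to try after a collision at old_address."""
    offset = key // list_size  # integer quotient, e.g. 166702 // 307 = 543
    return (offset + old_address) % list_size + 1

print(166702 // 307)                # 543
print(key_offset_probe(166702, 2))  # 239
```

Because the offset depends on the key, two keys that collide at the same address usually continue along different paths.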
[001] 379452 Elmer
[002] 070918 Redjie, 166702 Reymund
[003] 121267 Markh
[004] 572556 Angelus
[005]
[006]
[007] 378845 Hubert
...
[306] 160252 Arno
[307] 045128 Rollyn
Hash collision resolved by linear probing (interval=1).
