Hashing and File Structure

- The best search method introduced so far is binary search, whose search time is proportional to log2(n).
- A hash search is a search in which the key, through an algorithmic function, determines the location of the data.
- A function that transforms a key into a table index is called a hash function.
- If h is a hash function and KEY is the key, then h(KEY) is called the hash of the key.
- If r is a record and h is a hash function, then h(r) is called the hash key of r.
- If we have a table of n employee records, each identified by an employee number key whose value lies between 1 and n, the key value itself locates the employee record.
- A good hash function minimizes collisions.
- A good hash function spreads keys evenly across the array.
- A good hash function is easy to compute.

Hashing methods:
- Direct method
- Subtraction method
- Modulo division method
- Digit extraction method
- Mid square method
- Folding method
- Rotation method
- Pseudo random method
- Direct method: the key is the address, without any algorithmic manipulation.
- Example: a small organization assigns each employee a number from a small consecutive range. If we create an array of employee records covering that range, the employee number can be used directly as the address of the individual record.
- It can be used only for small sets of records.
- Subtraction method: sometimes the keys are consecutive but do not start from the first address.
- For example, a company may have only a limited number of employees whose employee numbers start at a high value. A very simple hash function then subtracts a fixed number from the key to determine the address.
- It can be used only for small sets of records.
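The subtraction method can be sketched in a few lines of Python. The starting employee number (1000), the table size (50) and the 0-based addressing are assumed values chosen for illustration; they are not taken from the original example.

```python
def subtraction_hash(key: int, first_key: int) -> int:
    """Map a key from a consecutive range onto a 0-based array index."""
    return key - first_key


# Assumed values: employee numbers start at 1000 and there are 50 of them.
FIRST_KEY = 1000
records = [None] * 50

# Store a record at the address computed from its key.
records[subtraction_hash(1007, FIRST_KEY)] = "employee 1007"
print(subtraction_hash(1007, FIRST_KEY))  # 7
```

With first_key set to 0 the same function behaves like the direct method, since the key is then used as the address unchanged.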
- Modulo division method: also known as the division remainder method, it divides the key by the array size and uses the remainder, plus a fixed offset, as the address.
- For example: the remainder of the key after division by the list size is computed first, the offset is added to it, and the key is placed at the resulting address.
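A minimal sketch of the division remainder computation follows. The list size of 10 and the offset of 1 are assumptions made for illustration (an offset of 1 yields addresses from 1 to the list size); the key 23 is used only as a sample value.

```python
def modulo_division_hash(key: int, list_size: int, offset: int = 1) -> int:
    """Return the remainder of key / list_size, shifted by a fixed offset."""
    return key % list_size + offset


# Assumed values: list size 10, offset 1.
print(modulo_division_hash(23, 10))  # 23 % 10 = 3, then 3 + 1 = 4
```

A prime list size is often preferred in practice because it tends to spread keys more evenly, but any positive size works with this function.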
- Digit extraction method: selected digits are extracted from the key and used as the address.
- Example: from an employee number we can select the 1st, 2nd and 3rd digits; those three digits, read together, form the address.
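Digit extraction can be sketched as follows. The six-digit key and the chosen digit positions are hypothetical values used only to show the mechanics; positions are counted from 1, starting at the leftmost digit.

```python
from typing import Sequence


def digit_extraction_hash(key: int, positions: Sequence[int]) -> int:
    """Build an address from the digits of key at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


# Assumed key and positions, chosen for illustration.
print(digit_extraction_hash(379452, (1, 3, 4)))  # digits 3, 9, 4 -> 394
```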
- Mid square method: the key is squared and the address is selected from the middle digits of the squared number.
- The main limitation of this method is the size of the key: the square of a 6-digit key can have 12 digits, which exceeds the integer size.
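The mid square idea can be sketched as below. The key 9452 and the four-digit address width are assumed values. Python integers have arbitrary precision, so the overflow concern mentioned above does not arise in this sketch, although it does in languages with fixed-size integers.

```python
def mid_square_hash(key: int, address_digits: int) -> int:
    """Square the key and return address_digits digits from the middle."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


# Assumed values: 9452 squared is 89340304, whose middle four digits are 3403.
print(mid_square_hash(9452, 4))  # 3403
```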
- Folding method (fold shift): the key value is divided into parts whose size matches the size of the required address; the left and right parts are then shifted and added to the middle part.
- For example, consider a 9-digit roll number divided into 3 parts of 3 digits each. The three parts are added together; if the resulting sum has more than 3 digits, the leading digit is discarded.
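A short sketch of fold shift, under stated assumptions: the 9-digit key 123456789 is a hypothetical value, and the leading digit is discarded by taking the sum modulo 10^part_size.

```python
def fold_shift_hash(key: int, part_size: int) -> int:
    """Split the key into fixed-size parts, add them, keep part_size digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    # The modulo drops any leading digit beyond the address size.
    return sum(parts) % 10 ** part_size


# Assumed key: 123 + 456 + 789 = 1368, and discarding the leading 1 gives 368.
print(fold_shift_hash(123456789, 3))  # 368
```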
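- Folding method (fold boundary): while creating the groups, the left and right numbers are reversed about the fixed boundaries between the parts, the center number is kept as it is, and the three numbers are added. If the resulting number exceeds the address size, the leading digit is ignored.
- Example: with the same three 3-digit parts, the left and right parts are reversed digit by digit before being added to the center part; a sum longer than 3 digits loses its leading digit.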
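Fold boundary differs from fold shift only in that the outer parts are reversed before the addition. The sketch below uses the hypothetical key 123456789 and assumes that, for keys with more than three parts, only the first and last parts are reversed.

```python
def fold_boundary_hash(key: int, part_size: int) -> int:
    """Reverse the outer parts of the key, add all parts, keep part_size digits."""
    digits = str(key)
    parts = [digits[i:i + part_size]
             for i in range(0, len(digits), part_size)]
    parts[0] = parts[0][::-1]    # reverse the left part
    parts[-1] = parts[-1][::-1]  # reverse the right part
    return sum(int(p) for p in parts) % 10 ** part_size


# Assumed key: 321 + 456 + 987 = 1764, and discarding the leading 1 gives 764.
print(fold_boundary_hash(123456789, 3))  # 764
```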
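- Rotation method: rotation hashing is generally implemented together with another hashing method. It is most useful when keys are assigned serially. The algorithm rotates the last character of the key to the front.
- Example: the last digit of each serially assigned key is moved to the front, so consecutive keys end up with different leading digits.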
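The rotation step can be sketched as follows. The serial keys 600101 and 600102 are assumed values. The function returns an integer, so a key ending in 0 would lose its leading zero after rotation; a string result would preserve it.

```python
def rotate_key(key: int) -> int:
    """Move the last digit of the key to the front."""
    digits = str(key)
    return int(digits[-1] + digits[:-1])


# Assumed serial keys: neighbours differ only in the last digit before
# rotation and in the first digit afterwards.
print(rotate_key(600101))  # 160010
print(rotate_key(600102))  # 260010
```

The rotated key would normally be passed on to another method, such as modulo division, to obtain the final address.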
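- Pseudo random method: let x be the key. Following y = ax + c, we multiply x by a and then add c. The coefficients a and c are chosen arbitrarily and then kept fixed. The result is then reduced by the modulo division method:
  Address = (a * x + c) modulo listsize, plus the same fixed offset used in the modulo division method.
  The computation proceeds step by step: multiply the key by a, add c, take the remainder, and add the offset.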
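A minimal sketch of the pseudo random computation is given below. The coefficients a = 17 and c = 7, the key 121267, the list size 307 and the offset of 1 are all assumed values chosen for illustration.

```python
def pseudorandom_hash(key: int, list_size: int, a: int = 17, c: int = 7) -> int:
    """Compute (a * key + c) modulo list_size, shifted by an assumed offset of 1."""
    return (a * key + c) % list_size + 1


# Assumed values: 17 * 121267 + 7 = 2061546; 2061546 % 307 = 41; 41 + 1 = 42.
print(pseudorandom_hash(121267, 307))  # 42
```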
- Two records cannot occupy the same position. The situation in which two records hash to the same position is called a hash clash or hash collision.
- Rehashing takes place after a collision.

- Primary clustering: when two keys that hash to different values compete with each other in successive rehashes, it is called primary clustering.
- Secondary clustering: when different keys that hash to the same value follow the same rehash path, it is known as secondary clustering.
Collision resolution methods:
- Rehashing (open addressing)
- Chaining
- Bucket hashing
- Quadratic probe
- Pseudorandom collision resolution
- Rehashing: a secondary hash function is applied to the hash key of the item. The rehashing function is applied repeatedly until an empty position is found where the item can be inserted.
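Rehashing with open addressing can be sketched as below. Both the primary hash (key modulo table size) and the rehash function (step to the next position, wrapping around) are assumptions chosen for simplicity; other rehash functions fit the same loop.

```python
from typing import List, Optional


def insert_open_address(table: List[Optional[int]], key: int) -> int:
    """Insert key into table, rehashing until an empty position is found."""
    size = len(table)
    index = key % size                 # assumed primary hash
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size     # assumed rehash function
    raise RuntimeError("hash table is full")


table: List[Optional[int]] = [None] * 7
print(insert_open_address(table, 10))  # 3
print(insert_open_address(table, 17))  # 4 (position 3 is taken)
print(insert_open_address(table, 24))  # 5 (positions 3 and 4 are taken)
```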
- Chaining: a linked list is built of all items whose keys hash to the same value. During a search, this short linked list is traversed sequentially for the desired key.
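Chaining can be illustrated with the sketch below. The class name, the modulo hash and the use of Python lists in place of hand-built linked lists are all assumptions made to keep the example short.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, size: int) -> None:
        # One chain per table position; a Python list stands in for a linked list.
        self.chains: List[List[Tuple[int, str]]] = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % len(self.chains)  # assumed hash function

    def insert(self, key: int, value: str) -> None:
        chain = self.chains[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # replace an existing entry
                return
        chain.append((key, value))

    def search(self, key: int) -> Optional[str]:
        # Traverse the chain sequentially for the desired key.
        for existing_key, value in self.chains[self._index(key)]:
            if existing_key == key:
                return value
        return None


table = ChainedHashTable(5)
table.insert(12, "record 12")
table.insert(22, "record 22")  # 22 and 12 both hash to position 2
print(table.search(22))  # record 22
```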
- Bucket hashing: each table position is a bucket that can accommodate multiple data occurrences.
- Quadratic probe: the increment is the collision probe number squared. For the 1st collision add 1^2, for the 2nd collision add 2^2, for the 3rd collision add 3^2, and so on.
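A sketch of quadratic probing follows. It assumes that the squared increment is added to the home address (the result of key modulo table size) and that the search gives up after as many probes as there are positions; quadratic probing does not always visit every position.

```python
from typing import List, Optional


def quadratic_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key, probing at home + 1^2, home + 2^2, ... after collisions."""
    size = len(table)
    home = key % size                         # assumed hash function
    for probe in range(size):
        index = (home + probe * probe) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty position found")


table: List[Optional[int]] = [None] * 7
print(quadratic_probe_insert(table, 10))  # 3
print(quadratic_probe_insert(table, 17))  # 4 (3 + 1^2)
print(quadratic_probe_insert(table, 24))  # 0 (3 + 2^2 = 7, wrapped around)
```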
- Pseudorandom collision resolution: the keys are first placed using the modulo division method; when a collision takes place, the pseudorandom method is used to generate new addresses until an empty location is found.
The following terms are used when we discuss files:
1) Field: the basic element of data. For example, a student's first name is a field having some data type.
2) Record: a collection of related fields that can be treated as a unit by some program or application. For example, an employee record contains fields such as name, address, job type, etc.
3) File: a collection of similar records. Files have unique names and may be created or deleted.
4) Database: a collection of related data.
- File organization is the permanent logical structure of the file.
- It tells the computer how to retrieve records from the file.
Desirable properties of a file organization:
- Rapid access
- Ease of update
- Less storage space
- Simple maintenance
- Reliability
Types of file organization:
- Pile
- Sequential
- Indexed sequential
- Indexed
- Hashed
- Pile: the simplest possible organization. The data are simply collected in the file, and records are not required to share the same format.
- Thus each record must be self-describing, including field names as well as values.
- Sequential: this is a common structure for large files; the records are stored in the file in key order.
- The key must uniquely identify a record, so different keys identify different records.
- However, adding and deleting records becomes somewhat difficult.
- Indexed sequential: an index file is used to speed up the search process and to overcome the difficulty of adding and deleting records in a sequential file.
- The single-level indexing structure is the simplest one: each index record contains a key and a pointer.
- This pointer gives the position of the record in the data file.
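A single-level index can be sketched with two parallel lists, one holding the sorted keys and one holding the matching positions in the data file. The keys, positions and record contents below are hypothetical, and an in-memory list stands in for the data file.

```python
import bisect
from typing import Optional

# Assumed data file: records stored in arrival order, not key order.
data_file = ["record 205", "record 310", "record 101"]

# Index entries sorted by key; each key points at a position in data_file.
index_keys = [101, 205, 310]
index_positions = [2, 0, 1]


def lookup(key: int) -> Optional[str]:
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index_positions[i]]
    return None


print(lookup(205))  # record 205
print(lookup(999))  # None
```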
- Indexed: an index file contains records ordered by record key.
- The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records.
- Hashed: in hashed file organization, a hash function applied to the key gives the address of the record.
- Direct files make use of hashing on the key value.
