INTRODUCTION
A file structure is a combination of representations for data in files and of operations for
accessing the data. A file structure application allows us to read, write, and modify data. It might
also support finding the data that matches some search criteria or reading through the data in
some particular order. An improvement in file structure design may make an application
hundreds of times faster. The details of the representation of the data and the implementation of the
operations determine the efficiency of the file structure for a particular application.
The fundamental operations of file systems are: open, create, close, read, write, and seek.
Each of these operations involves the creation or use of a link between a physical file stored on a
secondary device and a logical file that represents a program’s more abstract view of the same
file. When the program describes an operation using the logical file name, the equivalent
physical operation gets performed on the corresponding physical file.
Disks are very slow compared to memory. On the other hand, disks provide enormous
capacity at much less cost than memory. They also keep the information stored on them when
they are turned off. The tension between a disk’s relatively slow access time and its enormous,
nonvolatile capacity is the driving force behind file structure design. Good file structure design
will give us access to all the capacity without making our applications spend a lot of time waiting
for the disk. A tremendous variety in the types of data and in the needs of applications makes file
structure design very important.
The problems that researchers struggle with reflect the same issues that one confronts in
addressing any substantial file design problem. Working through the approaches to major file
design issues teaches one a lot about how to approach new design problems. The goals of research
and development in file structures are:
1. Get the information with a single access to the disk.
2. Build structures that allow us to find the target information with as few accesses as possible.
3. Group information in file structures so that we get everything we need in only one trip
to the disk.
SECTION 1
REQUIREMENTS SPECIFICATIONS
In part 1, we are required to create a student record file. The record consists of the
following fields:
1. University Serial Number
2. Name
3. Address
4. Semester
5. Branch
There should be methods to initialize and assign a record. Also, we should be able to add
a new record, delete a record, and modify a record. The number of fields is fixed, but the lengths
of the fields are variable.
In the second part, we need to develop a hashed index for the student record file
developed in Part 1. The key for the index is the student USN (University Serial Number). We
need to hash the keys and then store the key-reference pairs for further access. Once we develop
a hashed index, this index is used for the retrieval of records.
We need to provide the following functionalities:
1. Add a record.
2. Delete a record.
3. Modify a record.
Also, we need to demonstrate the doubling of the directory size and the space utilization
of the buckets.
Hardware Requirements:
Software Requirements:
External Interfaces
SECTION 2
INTRODUCTION TO FILE STRUCTURES
Indexing:
Indexing is a way of structuring a file so that records can be found by key. It is an
alternative to sorting. Unlike sorting, indexing permits us to perform binary searches for keys in
variable-length record files. If the index can be held in memory, record addition, deletion, and
retrieval can be done much more quickly with an indexed, entry-sequenced file than with a
sorted file. Indexes can do much more than merely improve on access time: they can provide us
with new capabilities that are inconceivable with access methods based on sorted data records.
The most exciting new capability involves the use of multiple secondary indexes.
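The in-memory index described above can be sketched as a sorted array of key-address pairs searched with binary search. The class and member names below are illustrative, not taken from the project code:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Minimal in-memory index: sorted (key, byte-offset) pairs for an
// entry-sequenced data file.
struct IndexEntry {
    std::string key;
    long offset;   // byte address of the record in the data file
};

class SimpleIndex {
    std::vector<IndexEntry> entries;  // kept sorted by key
public:
    void Insert(const std::string& key, long offset) {
        IndexEntry e{key, offset};
        auto pos = std::lower_bound(entries.begin(), entries.end(), e,
            [](const IndexEntry& a, const IndexEntry& b){ return a.key < b.key; });
        entries.insert(pos, e);
    }
    // Binary search; returns the record offset, or -1 if the key is absent.
    long Search(const std::string& key) const {
        auto pos = std::lower_bound(entries.begin(), entries.end(), key,
            [](const IndexEntry& a, const std::string& k){ return a.key < k; });
        if (pos != entries.end() && pos->key == key) return pos->offset;
        return -1;
    }
};
```

Because the index entries are small and uniform, they can be binary-searched even though the underlying records are variable length.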
Sometimes the operation of interest is a matching of items from two lists; sometimes it is a
merging of lists; and other times the operation is a combination of matching and merging. These kinds of
operations on sequential lists are the basis of a great deal of file processing.
AVL trees:
An AVL tree is a self-adjusting binary tree structure. An AVL tree is height-balanced: the allowed
difference between the heights of any two subtrees sharing the same root is one.
The important feature of an AVL tree is that, by setting a maximum allowable difference in the
height of any two subtrees, the worst-case search performance is kept close to that of a
completely balanced tree.
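The height-balance property can be checked with a short recursive sketch; the node layout here is an assumption for illustration, not part of the project:

```cpp
#include <cstdlib>

// A binary tree node (illustrative layout).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Height of a subtree; an empty subtree has height 0.
int Height(const Node* n) {
    if (!n) return 0;
    int hl = Height(n->left), hr = Height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

// True when every node's two subtrees differ in height by at most one,
// which is the AVL balance condition described above.
bool IsAVL(const Node* n) {
    if (!n) return true;
    if (std::abs(Height(n->left) - Height(n->right)) > 1) return false;
    return IsAVL(n->left) && IsAVL(n->right);
}
```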
B-trees:
B-trees are multilevel indexes that solve the problem of the linear cost of insertion and
deletion. This is what makes B-trees so good, and why they are now the standard way to
represent indexes. The solution is twofold. First, don't require that the index records be full.
Second, when a record overflows, split it into two records, each half full. Deletion takes a similar
strategy, merging two records into a single record when necessary.
B+ trees:
The disadvantage of the B-tree is that the file cannot be accessed sequentially with efficiency.
Adding a linked list structure at the bottom level of the B-tree solves this problem. The combination
of a B-tree and a sequential linked list gives rise to the B+ tree.
Hashing:
Hashing is a good way to retrieve records in one access for files that do not change greatly
over time, but it does not work well with volatile, dynamic files. A hash function is like a black
box that produces an address every time a key is dropped in. Hashing is like indexing in that it
involves associating a key with a relative record address.
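One such black-box function can be sketched as a fold-and-add scheme: characters of the key are folded two at a time and summed modulo a prime. The folding factor 100 and the prime 19937 are illustrative choices, not values mandated by the project:

```cpp
#include <cstring>

// Fold-and-add hash: pairs of characters are combined and summed
// modulo a prime, keeping the result in a bounded address space.
int Hash(const char* key) {
    int sum = 0;
    int len = (int)std::strlen(key);
    for (int j = 0; j < len; j += 2) {
        int pair = 100 * key[j];           // fold two characters into one int
        if (j + 1 < len) pair += key[j + 1];
        sum = (sum + pair) % 19937;        // keep the sum bounded by a prime
    }
    return sum;
}
```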
SECTION 3
WHY C++?
Object-oriented toolkit:
Making file structures usable in application development requires turning this conceptual
toolkit into application programming interfaces: collections of data types and operations that can
be used in applications. We have chosen to employ an object-oriented approach in which data types
and operations are presented in a unified fashion as class definitions.
Class Definition
Constructors
Public and private sections
Operator overloading
The above features enhance the programmer's ability to control the behavior of objects.
SECTION 4
PROJECT PART I
Problem Definition:
Design a class called student. Each object of this class represents information about a
single student. Members should be included for student USN (University Serial Number), Name,
Address, Semester, Branch, etc. Methods should be included for initialization, assignment, and
modification of values. Provide methods to write the member values to the output stream suitably
formatted. Add methods to store objects as records in a file and to load the objects from the file
using buffering; design a suitable IOBuffer class hierarchy. Add Pack and Unpack methods to
class Student. For all the mini projects, assume a fixed-field, variable-length record structure
with delimiters for the data file.
Part 1 of the project deals with creating a student record file. The record consists of
the following fields as data members.
1. University Serial Number.---->USN
2. Name ---->name
3. Address ---->addr
4. Branch ---->brch
5. Semester. ---->sem
We have provided the following member functions for the operations on the file.
1. Creating a record ---->insert()
2. Assigning a record. ---->assign()
3. Searching a record ---->search()
4. Deleting a record. ---->delet()
5. Modifying a record. ---->modify()
6. Displaying a record ---->display()
The assign() function is used to assign default values to the data members. Here we
assign the NULL value to all data members as the default.
The search() function is used to search for a record based on the key value (USN).
delet() function is used to delete a student’s record based on the key value.
modify() function is used to modify the record based on key field entered.
• If the key doesn't match, check the next record; repeat until end of file, then display an error
message.
[Diagram: Pack() moves fields from program variables in RAM into the buffer, and write() moves
the buffer to the storage device; read() and Unpack() perform the reverse path.]
The read and write file operations need a buffer, which is developed using a hierarchy of
classes. The highest class in the hierarchy is the class IOBuffer. Since we know the number of
fields and since the lengths of the fields are variable, we use the Delimited Text Buffer class.
Here, we write the length of the record first and then the record itself. The fields are separated
using a delimiter. There are methods that pack the fields into the buffer and there are methods
that unpack the fields from the buffer. The access to the records of the file is sequential. We also
provide for addition of records and deletion of records. The fields of records can be assigned a
specific value and records can also be modified. In general we have the following hierarchy:
• IO BUFFER
• VARIABLE LENGTH BUFFER and FIXED LENGTH BUFFER
• DELIMITED FIELD BUFFER, LENGTH FIELD BUFFER and
FIXED FIELD BUFFER
[Diagram: class IOBuffer contains a character array that serves as the buffer.]
The field packing and unpacking operations, in their various forms, can be encapsulated
into C++ classes. The three field representation strategies (delimited, length-based, and
fixed-length) are implemented in different classes. Class IOBuffer does not include any
implementation of its methods. It is an abstract class, and hence no object of it can be declared. All the
necessary read, write, pack, and unpack operations are provided in classes farther down the hierarchy.
Inheritance allows related classes to share members. We use this powerful mechanism
provided by C++ for buffering. The object-oriented design of the classes guarantees that operations on
objects are performed correctly.
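The hierarchy just described might be sketched as follows. This is a reduced illustration of the idea (an abstract IOBuffer whose Pack and Unpack are implemented in a delimited-text subclass), not the project's actual class definitions:

```cpp
#include <cstring>
#include <string>

// Abstract base of the buffer hierarchy: cannot be instantiated.
class IOBuffer {
public:
    virtual ~IOBuffer() {}
    virtual int Pack(const char* field) = 0;   // move a field into the buffer
    virtual int Unpack(char* field)     = 0;   // extract the next field
};

// Fields packed as text, separated by a delimiter character.
class DelimTextBuffer : public IOBuffer {
    std::string buffer;   // packed record image
    size_t next = 0;      // unpack cursor
    char delim;
public:
    explicit DelimTextBuffer(char d = '|') : delim(d) {}
    int Pack(const char* field) override {     // append field + delimiter
        buffer += field;
        buffer += delim;
        return (int)std::strlen(field);
    }
    int Unpack(char* field) override {         // copy out the next field
        size_t end = buffer.find(delim, next);
        if (end == std::string::npos) return -1;
        size_t len = end - next;
        std::memcpy(field, buffer.data() + next, len);
        field[len] = '\0';
        next = end + 1;
        return (int)len;
    }
};
```

Read and write operations on the data file would move the packed buffer image to and from disk; they are omitted here to keep the sketch short.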
SECTION 5
PROJECT PART II
Problem Definition:
Develop a hashed index of the student record file with the USN as the key. Write a driver
program to create a hashed file from an existing student record file. Demonstrate the recursive
collapse of directory over more than one level.
The second part of the project deals with providing O(1) access to the records of the file.
For this, we need to develop an index to the file. The USN is used as the key. To provide O(1)
access we need to hash the index. There are two approaches to hashing:
1. Static hashing
2. Dynamic hashing.
Static hashing works well for files that do not change frequently. But real files do change
frequently, and the performance of static hashing then deteriorates.
Dynamic hashing copes with this problem. In this approach, we hash the key and use
only a part of the hashed address. This is called the "use more as we need more"
approach.
We also use what are called buckets. A bucket is a container of key-reference pairs; all the
keys in a bucket share the same leading bits of the hashed address. Once a bucket is full, we
split it into two and distribute the keys between the two buckets. To keep track of the buckets,
we develop another structure, a directory. The directory maintains an array of the bucket
locations.
Thus, we hash a key and take a part of the hashed address, the amount depending on the
population of records. We then use this part of the hashed address as an index into the directory
to find the bucket's location. We then seek directly to that location and get the record.
The main design issue here is whether we provide a static hashing that uses a prespecified
size of address space or a dynamic hashing. The dynamic hashing is very useful for files that
change frequently.
We have decided to implement extendible hashing, which uses a part of the hashed
address depending on the size of the file. This is called the use-more-as-you-need-more approach.
We do not hash the data file itself. Instead, we only hash the index. The index consists of key-
record address pairs.
Buckets are used to resolve the collision problem: here one address can hold more than one
record or index entry. We also use directories to keep track of the buckets. A bucket consists
of key-reference pairs, which means that the buffer class to be used is the fixed-length
buffer. We keep the addresses of the buckets in memory using arrays.
Buckets are filled with key-reference pairs as and when the data records are inserted.
When a bucket gets filled, the bucket is split into two and the records are redistributed. This
means that we are using more of the hashed address as the file size increases. Also, we
keep track of deletions. A deletion may trigger the collapse of the directory, as fewer
buckets will be needed. Thus the hashing technique becomes truly dynamic.
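The splitting step described above can be sketched on plain integer addresses. Real buckets hold key-reference pairs, and the bit ordering here (consulting the next bit from the low end) is simplified for illustration:

```cpp
#include <vector>

// A bucket that uses `depth` bits of the hashed address.
struct Bucket {
    int depth;                 // number of address bits this bucket uses
    std::vector<int> keys;     // already-hashed integer addresses
};

// When a bucket overflows, its depth grows by one bit: the old bucket
// keeps keys whose newly consulted bit is 0, and the new "buddy"
// bucket receives those whose bit is 1.
void Split(Bucket& b, Bucket& buddy) {
    b.depth++;
    buddy.depth = b.depth;
    std::vector<int> keep;
    int bit = 1 << (b.depth - 1);      // the newly consulted address bit
    for (int k : b.keys) {
        if (k & bit) buddy.keys.push_back(k);
        else keep.push_back(k);
    }
    b.keys = keep;
}
```

If the buddy bucket needs an address pattern the current directory cannot distinguish, the directory doubles in size first, exactly as described above.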
A hashed key selects a directory entry. The directory consists of the addresses of buckets. The
bucket in turn contains the address of the record in the STUDENT.DAT file.
The below diagram shows what our project does. The general steps are:
• A given key is hashed to a directory address.
[Diagram: a key is passed through the hash function to produce a directory address; the
directory cell points to one of the buckets; the key-reference pair in the bucket leads to the
record in the student file.]
The MakeAddress function extracts a portion of the full hashed address. This function also
reverses the order of the bits in the hashed address, making the lowest-order bit of the
hash address the highest-order bit of the value used in extendible hashing, because the least
significant bits tend to have more variation than the high-order bits.
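A sketch of such a bit-reversing extraction, assuming the hashed value is a plain int and `depth` is the number of directory bits in use:

```cpp
// Extract `depth` bits of the hashed value, reversing their order so
// that the least significant bit of the hash becomes the most
// significant bit of the resulting directory address.
int MakeAddress(int hashVal, int depth) {
    int retval = 0;
    for (int j = 0; j < depth; j++) {
        retval = retval << 1;          // make room at the low end
        retval = retval | (hashVal & 1); // append the current lowest bit
        hashVal = hashVal >> 1;
    }
    return retval;
}
```

When the directory doubles, `depth` grows by one and one more low-order bit of the hash is consulted.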
Hash function: returns an integer hash value for the key, for a 15-bit address space.
Splitting in Buckets:
Method SPLIT of class Bucket divides keys between an existing bucket and a new
bucket. If necessary, it doubles the size of the directory to accommodate the new bucket.
The INSERT method first searches for the key. SEARCH arranges for the CurrentBucket
member to contain the proper bucket for the key. The FIND method determines where the key
would be if it were in the structure.
The Insert method manages record addition. If the key already exists, Insert returns
immediately. If the key does not exist, Insert calls Bucket::Insert for the bucket into which the
key is to be added. If the bucket is full, Bucket::Insert calls Split to handle the task of splitting
the bucket. If the directory needs to be larger, Split calls method Directory::DoubleSize to double
the directory size.
The method works by checking to see whether it is possible for there to be a buddy
bucket. The next test compares the number of bits used by the bucket with the number of bits
used in the directory address space. A pair of buddy buckets is a pair of buckets that are
immediate descendants of the same node in the trie. This method returns a buddy bucket, or
-1 if none is found.
Method Directory::Collapse begins by making sure that we are not at the lower limit of
directory size. By treating the special case of a directory with a single cell here, at the start of the
function, we simplify subsequent processing: with the exception of this case, all directory sizes
are evenly divisible by 2. The test to see whether the directory can be collapsed consists of examining
each pair of directory cells to see if they point to different buckets. As soon as we find such a
pair, we know we cannot collapse the directory, and the method returns 0.
Deletion operations:
We first find the key to be deleted. If we cannot find it, we return failure; if it is found, we call
Bucket::Remove to remove the key from the bucket and return the value reported back from that
method.
Space utilization:
It is defined as the ratio of the actual number of records to the total number of records that
could be stored in the allocated space. The expected average utilization is about 69%. Space utilization
can be calculated using the formula:
Utilization = r / (b * N)
where r is the number of records,
b is the bucket capacity (records per bucket), and
N is the number of buckets.
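As a worked example of the formula, 11 records stored in 4 buckets of capacity 4 each give 11/16 = 68.75%, close to the expected average of about 69%:

```cpp
// Utilization = r / (b * N), expressed as a percentage.
// r = number of records, b = bucket capacity, N = number of buckets.
double Utilization(int r, int b, int N) {
    return (double)r / (b * N) * 100.0;  // cast avoids integer division
}
```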
Source Code
BucketAddr = newBucketAddr;
Depth++;
NumCells = newSize;
return 1;
}
int Bucket::FindBuddy ()
{
if (Dir.Depth == 0) return -1; // a one-cell directory has no buddies
if (Depth < Dir.Depth) return -1;
int sharedAddress = MakeAddress(Keys[0], Depth);
return sharedAddress ^ 1; // flip the low bit to get the buddy's address
}
int Directory::Collapse()
{
if (Depth == 0) return 0;
for (int i = 0; i < NumCells; i += 2)
if (BucketAddr[i] != BucketAddr[i+1])
return 0;
int newSize = NumCells / 2;
int * newAddrs = new int [newSize];
for (int j = 0; j < newSize; j++)
newAddrs[j] = BucketAddr[j*2];
delete [] BucketAddr; // array delete to match new []
BucketAddr = newAddrs;
Depth--;
collapsetrue = 1;
NumCells = newSize;
return 1;
}
int Bucket::TryCombine ()
{
int result;
// (listing truncated in the source; the loop below belongs to the
// space-utilization routine, which counts records by scanning for
// the '#' end-of-record marker)
while (1)
{
file>>ch;
if (file.fail())
break;
else if (ch=='#')
numrecs++;
}
file.close();
int cnt=1;
for(int i=0;i<NumCells-1;i++)//counts number of buckets
{
if(BucketAddr[i+1]==BucketAddr[i])
continue;
cnt++;
}
util=((float)numrecs/(cnt*4))*100;//utilization = r/(b*N); cast avoids integer division
cout<<"\nRECORDS IN THE FILE = "<<numrecs<<"\n";
cout<<"\n\nBUCKETS USED BY THE RECORDS = "<<cnt;
cout<<"\n\n\nDIRECTORY SIZE IS = "<<NumCells;
cout<<"\n\n\nUTILIZATION OF SPACE = "<<util<<"%\n\n";
//for directory
float x;
x=pow(numrecs,1.25);
x=x*0.98;
cout<<"\nUTILIZATION OF SPACE BY THE DIRECTORY = "<<x<<" bytes";
}
void Insert(char *myfile)
{
Student s;
char str[30];
setcolor(BLACK);
settextstyle(2,0,5);
{
outtextxy(400,400,"Enter a Valid Key!!!\a");
getch();
return;
}
NAME: // label targeted by the goto below when the name must be re-entered
outtextxy(230,120,"ENTER NAME :");
strget(420,120,s.Name,20);
strupr(s.Name);
int re = Dir.Search(s.Name);
if(re!=-1)
{
outtextxy(400,220,"Name Duplication..!!!");
getch();
}
if(strlen(s.Name)==0)//strcmp with NULL is undefined; test for an empty string instead
{
outtextxy(400,400,"Enter a Valid NAME!!!\a");
getch();
}
if(!isalpha(s.Name[0]))//isalpha takes a single character, not a char array
{
outtextxy(400,220,"Name Contains other than alpha charector!!");
outtextxy(400,240,"Re-enter NAME");//
getch();
goto NAME;
}
outtextxy(230,140,"ENTER ADDRESS :");
strget(420,140,s.Address,30);
strupr(s.Address);
outtextxy(230,160,"ENTER SEMESTER :");
strget(420,160,s.Semester,2);strupr(s.Semester);
if(atoi(s.Semester)>8)
{
outtextxy(400,400,"Invalid Semester!!!\a");
getch();
return;
}
outtextxy(230,180,"ENTER BRANCH :");
strget(420,180,s.Branch,5);strupr(s.Branch);
int flag=0;
for(int i=0;i<16;i++)
if(strcmp(s.Branch,s.Brlist[i])==0)
{
flag=1;
break;
}
if(flag==0)
{
outtextxy(400,400,"InValid Branch!!!\a");
getch();
return;
}
outtextxy(230,200,"ENTER COLLEGE :");
strget(420,200,s.College,10);
strupr(s.College);
int recaddr=s.Append(myfile);
Dir.Insert(s.Usn,recaddr);
outtextxy(400,400,"Record Successfully Appended.");
getch();
if(doublesizetrue)
{
closegraph();
clrscr();
cprintf("The Directory Has Doubled");
doublesizetrue=0;
Dir.Print(cout);
}
}
strget(200,50,s.Usn,10);
strupr(s.Usn);
int addr=Dir.Search(s.Usn);
if(addr==-1)
{
outtextxy(300,300,"THE RECORD DOES NOT EXIST");
getch();
return;
}
fstream ofile(myfile,ios::in|ios::out);
ofile.seekp(addr,ios::beg);
ofile.write("*",1);
ofile.close();
Dir.Remove(s.Usn);
outtextxy(200,400,"THE RECORD IS DELETED SUCCESSFULLY");
compaction();
getch();
}
void display(char *myfile)
{
Student s;
setcolor(BLACK);
settextstyle(2,0,5);
outtextxy(50,50,"ENTER USN NUMBER : ");
strget(200,50,s.Usn,10);
strupr(s.Usn);
int addr;
if((addr = Dir.Search(s.Usn))==-1)
{
outtextxy(300,300,"Record not found!");
outtextxy(300,320,"Press Any Key..");
getch();
return;
}
DelimFieldBuffer :: SetDefaultDelim('|');
DelimFieldBuffer Buff;
fstream file(myfile,ios::in);
Buff.DRead(file,addr);
s.Unpack(Buff);
char str[100];
sprintf(str,"USN NO : %s",s.Usn);
outtextxy(100,100,str);
sprintf(str,"NAME : %s",s.Name);
outtextxy(100,120,str);
sprintf(str,"ADDRESS : %s",s.Address);
outtextxy(100,140,str);
sprintf(str,"SEMESTER : %s",s.Semester);
outtextxy(100,160,str);
sprintf(str,"BRANCH : %s",s.Branch);
outtextxy(100,180,str);
sprintf(str,"COLLEGE : %s",s.College);
outtextxy(100,200,str);
file.close();
}
SECTION 6
Dept of ISE 29 2007-08
Extendible Hashing Vinayak Hegde Nandikal
GUI DESIGN
SECTION 7
SNAPSHOTS
MAIN MENU
RECORD INSERTION
RECORD MODIFICATION
DISPLAYING A RECORD
SPACE UTILIZATION
DIRECTORY DISPLAY
SECTION 8
CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion:
Hashing is a way of structuring a file so that records can be found by applying a hash
function that transforms a key into an address. This address is then used as the basis for the insertion
and retrieval of records. More than one record can hash to the same address; this
phenomenon is called a collision. Extendible hashing provides O(1) performance since there is
no overflow. These access times are truly independent of the size of the file.
Future Enhancement:
Instead of the given STUDENT class, the project can be made to handle a generic class
that accepts a class name as a parameter and can be used for different applications. Another class,
BUFFERFILE, can be included, given that it contains a handle to the base class of the buffer class
hierarchy (IOBUFFER) and a handle to the file, for simultaneous manipulation of buffer and file
to support a purer form of OBJECT ORIENTATION.
Some of the possible improvements and new features that can be included are:
Improved User Interface with commercial level enhancements.
Support for remote administration of the system.
Support for simultaneous access and modification of the student file from different
systems.
Improved free space management for data files.
Implementation of other addressing techniques in addition to the present hashing
technique to analyze performance issues.