
Lab 4 Report

Author: Jorge Berumen


CS2302 – Data Structures

Introduction
The purpose of this lab was to compare the running time of a hash table with the running time of a binary search tree. The binary search tree chosen for the comparison was the AVL tree, a self-balancing tree, because it guarantees O(log n) search time. Both data structures were timed on the same task: inserting a large set of numbers and then traversing the entire structure in search of the most repeated element.

Proposed solution design and implementation


Because we are keeping track of how frequently a particular element is observed, the data structures from the last lab were modified to reflect this change. The modification consists of adding an extra field, called copies, to both the AVL node and the linked list node. Insertion into both structures still runs as if there were no duplicates; when a duplicate is observed, the counter is incremented by one, and it is decremented when a copy is deleted.
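A minimal sketch of what this modification might look like in the AVL node is shown below; the field and class names here are illustrative and not necessarily the exact ones used in the submitted code.

    // Illustrative sketch of the counter field; the actual names in the
    // submitted code may differ.
    public class AVLNode {
        int key;                 // the stored element
        int copies = 1;          // how many times this element has been inserted
        int height = 1;
        AVLNode left, right;

        AVLNode(int key) {
            this.key = key;
        }
    }

    // Inside the AVL insert, a duplicate only bumps the counter and triggers
    // no rebalancing:
    // if (key == node.key) { node.copies++; return node; }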

The hash table used in this lab relies on the chaining technique, which stores keys that hash to the same index in a linked list at that index. This has its advantages and disadvantages. An advantage of this method is that it keeps the hash table simple and gives the programmer a direct way to handle elements whose keys collide.
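A hedged sketch of chained insertion that also increments the copies counter on duplicates is shown below; it uses java.util.LinkedList for brevity, whereas the lab's own linked list class plays the same role, and the class name is illustrative.

    import java.util.LinkedList;

    // Illustrative chained hash table: each bucket is a linked list of
    // (key, copies) pairs, so colliding keys simply share a bucket.
    public class ChainedHashTable {
        private static class Entry {
            int key;
            int copies = 1;
            Entry(int key) { this.key = key; }
        }

        private final LinkedList<Entry>[] buckets;

        @SuppressWarnings("unchecked")
        public ChainedHashTable(int size) {
            buckets = new LinkedList[size];
            for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
        }

        public void insert(int key) {
            int index = Math.floorMod(key, buckets.length); // simple modular hash
            for (Entry e : buckets[index]) {
                if (e.key == key) { e.copies++; return; }   // duplicate: bump the counter
            }
            buckets[index].add(new Entry(key));             // new key: append to the chain
        }
    }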

The disadvantage, however, is that colliding keys can defeat the purpose of using a hash table in the first place, which is why the programmer must use a good hash function that reduces the number of collisions. For comparison, an AVL tree stores its elements through references, which point to data scattered throughout virtual memory. To retrieve an element, the AVL tree must follow these references all over memory, which consumes time and results in slower access than an array. There is no way of predicting how many elements will collide with each other, so long chains have the potential to ruin the array's fast access time. Nonetheless, the mere fact that hash tables build on the fast access of an array suggests there may be an advantage to using a hash table over an AVL tree.
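As one illustration of why the hash function matters, the raw key can be passed through a bit-mixing step before taking it modulo the table size, so that clustered keys (such as the values of a random walk) spread more evenly across the array. The constant and the mixing scheme below are an assumption for illustration, not the function used in the lab.

    // Illustrative mixing hash; Math.floorMod keeps the index non-negative even
    // for the negative keys a random walk can produce.
    static int index(int key, int tableSize) {
        int h = key * 0x9E3779B1;   // multiply by a large odd constant to scramble the bits
        h ^= (h >>> 16);            // fold the high bits into the low bits
        return Math.floorMod(h, tableSize);
    }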

The entire program is structured in multiple classes, each encapsulating one aspect of the lab: the AVL node, the AVL tree, the linked list, and the hash table. The file Main.java ties this functionality together in a console interface that handles exceptions and invalid input.

The program begins by asking the user for a size n, which is the number of numbers to generate with the Random Walk method provided by Olac Fuentes. After generating these numbers, the program inserts them into both data structures and searches each one for the most frequent element. The time each data structure takes to complete the same process is recorded and averaged for comparison.
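Below is a sketch of how this procedure might be timed, reusing the illustrative ChainedHashTable from above and assuming the random walk adds a random step of plus or minus one to the previous value; the generator actually provided by Olac Fuentes and the class names in the submitted code may differ.

    import java.util.Random;

    public class TimingSketch {
        public static void main(String[] args) {
            int n = 1_000_000;                     // input size entered by the user
            int[] data = randomWalk(n);

            ChainedHashTable table = new ChainedHashTable(2 * n);
            long start = System.currentTimeMillis();
            for (int value : data) {
                table.insert(value);               // build the table
            }
            // ... traverse the table to find the element with the largest copies count ...
            long hashMillis = System.currentTimeMillis() - start;
            System.out.println("Hash table: " + hashMillis + " ms");

            // The same insert-and-search block is repeated for the AVL tree,
            // and the two times are recorded for the tables below.
        }

        // Assumed generator: each value is the previous one plus or minus one.
        static int[] randomWalk(int n) {
            Random rand = new Random();
            int[] walk = new int[n];
            for (int i = 1; i < n; i++) {
                walk[i] = walk[i - 1] + (rand.nextBoolean() ? 1 : -1);
            }
            return walk;
        }
    }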

Final Note: The submitted source code has a few discrepancies. It compiles without errors, and when it is executed the user is prompted with the question "Rerun program (y/n)". There is no problem with rerunning the program, except that the program fails to re-instantiate the avl_tree, so later iterations continue with the older elements still in place. The user must simply:
1) Rerun the entire program, or
2) Move the instantiation of the avl_tree inside the do-while loop, as sketched after this note.
Also, when the results are displayed, the program states that the "Largest number is …". This statement refers to the element with the largest count of repetitions, not to the largest element.
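A hedged sketch of option 2 follows, with everything except the relevant structure omitted; avl_tree is the variable named above, while the AVLTree class name and the surrounding code are abbreviated assumptions.

    import java.util.Scanner;

    // Sketch of option 2: the tree is created inside the loop, so every rerun
    // starts from an empty structure.
    Scanner in = new Scanner(System.in);
    String answer;
    do {
        AVLTree avl_tree = new AVLTree();   // instantiate here, not before the loop
        // ... generate the data, insert, search, and time as before ...
        System.out.print("Rerun program (y/n) ");
        answer = in.nextLine();
    } while (answer.equalsIgnoreCase("y"));
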
Experimental Results
Because the inputs were random values generated by the Random Walk algorithm, six trials were executed in order to obtain a more precise estimate of each running time. The trials followed the same procedure for both the hash table and the AVL tree. The results are shown below:
              Trial 1       Trial 2       Trial 3       Trial 4       Trial 5       Trial 6       Average (ms)
Input size    Hash   AVL    Hash   AVL    Hash   AVL    Hash   AVL    Hash   AVL    Hash   AVL    Hash      AVL
1,000            4     6       6     7       4     5       4     5       6     7       3     5      4.50      5.83
10,000          10    16      12    16      11    15      10    16      17    22      10    16     11.67     16.83
100,000         61    68      51    51      56    78      54    81      67   100      68    51     59.50     71.50
1,000,000      176   597     199   616     174   640     186   580     178   580     187   757    183.33    628.33
1,500,000      754   938     990   705     222   872     514   685     499   711     556   768    589.17    779.83
2,000,000      380  1285     548   906     427  1165     348   962     318   968     596   935    436.17   1036.83
2,500,000      376  1729     431  1511     405  1632    1236  1544     418  1563     460  1239    554.33   1536.33
3,000,000     3986  1998    4469  1724    4182  2299    5447  1781    4234  1813    4478  1474   4466.00   1848.17

Table 4.1: Running times in milliseconds for six trials at each input size.
After averaging the trials, the mean times were graphed to illustrate the duration of each task per input size. The figure below illustrates the advantage of using a hash table over an AVL tree.
Figure 4.1: Hash Table vs AVL Tree running time comparison. Average running time in milliseconds (plotted on a log10 scale) against input size; the plotted values are the averages from Table 4.1 for the hash table and the AVL tree.
Notice how at the last input size, 3 million, the hash table underperforms. There could be several reasons for this irregularity. For one, colliding keys become clustered and crowded, leading to slower access. Another is that the Java Virtual Machine had trouble allocating a remarkably large array for the hash table.
Conclusion
It is evident that the hash table is a good choice of data structure when the data is highly distinguishable and the keys rarely collide. As seen in the results, the AVL tree lags behind the hash table by a considerable amount of time. At first glance these differences may seem minute, perhaps even insignificant, yet in corporations where data sets of more than three million elements are processed many times every day, these milliseconds add up to vast amounts of time that could otherwise have been saved with a different data structure.
