
Binary Search Tree

In computer science, a binary search tree (BST) is a node-based binary tree data structure that has the following properties:

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.
Each node (item in the tree) has a distinct key.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records. The major advantage of binary search trees over other data structures is that the related sorting and search algorithms, such as in-order traversal, can be very efficient. Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13

A binary search tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x are less than or equal to x and the elements stored in the right subtree of x are greater than or equal to x. This is called the binary-search-tree property. Unlike the definition above, this formulation admits duplicate keys.

The basic operations on a binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the same operations take Θ(n) worst-case time.

Binary search tree


First of all, a binary search tree (BST) is a dynamic data structure, which means that its size is limited only by the amount of free memory available and that the number of elements may vary while the program runs. The main advantage of binary search trees is rapid search, while addition is also quite cheap. Let us look at a more formal definition of a BST. A binary search tree is a data structure that meets the following requirements:

it is a binary tree;
each node contains a value;
a total order is defined on these values (every two values can be compared with each other);
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than the node's value.

Notice that the definition above doesn't allow duplicates.

Example of a binary search tree

What are binary search trees used for?


A binary search tree can be used to construct a map data structure. In practice, data is often associated with some unique key. For instance, in a phone book such a key is a telephone number. Storing such data in a binary search tree allows a record to be looked up by key faster than if it were stored in an unordered list. A BST can also be used to construct a set data structure, which stores an unordered collection of unique values and supports operations on such collections.

The performance of a binary search tree depends on its height. To keep the tree balanced and minimize its height, the idea of binary search trees was developed further into balanced search trees (AVL trees, red-black trees, splay trees). Here we discuss the basic ideas that lie at the foundation of binary search trees.

Binary search tree. Internal representation


Like any other dynamic data structure, a BST requires storing some additional auxiliary data in order to maintain its structure. Each node of a binary tree contains the following information:

a value (user's data);
a link to the left child (auxiliary data);
a link to the right child (auxiliary data).

Depending on the size of the user data, the memory overhead may vary, but in general it is quite reasonable. In some implementations a node may also store a link to its parent, depending on the algorithms the programmer wants to apply to the BST. For basic operations such as addition, removal and search, a link to the parent is not necessary. It is useful, for example, for implementing iterators. With a view to internal representation, the sample from the overview changes:

Leaf nodes have links to children, but they have no children. In a programming language this means that the corresponding links are set to NULL.
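The node layout described above might be sketched in Python roughly as follows. The class name BSTNode is borrowed from the text further down; the field names and the use of None in place of NULL are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BSTNode:
    """One node of a binary search tree (field names are illustrative)."""
    value: Any                          # user's data
    left: Optional["BSTNode"] = None    # link to the left child, None if absent
    right: Optional["BSTNode"] = None   # link to the right child, None if absent


# A leaf still carries both link fields; they are simply left unset.
leaf = BSTNode(4)
root = BSTNode(5, left=leaf)
```

An optional parent link could be added as a fourth field if the chosen algorithms need it.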

Binary search tree. Adding a value


Adding a value to BST can be divided into two stages:

search for a place to put a new element;

insert the new element to this place.

Let us see these stages in more detail.

Search for a place

At this stage the algorithm should follow the binary search tree property. If the new value is less than the current node's value, go to the left subtree; otherwise go to the right subtree. Following this simple rule, the algorithm reaches a node that has no subtree in the required direction. By the time a place for insertion is found, we can say for sure that the new value has no duplicate in the tree. Initially, a new node has no children, so it is a leaf. Let us see it in the picture. Gray circles indicate possible places for a new node.

Now, let's go down to the algorithm itself. Here, as in almost every operation on a BST, recursion is used. Starting from the root,
1. check whether the value in the current node and the new value are equal. If so, a duplicate is found. Otherwise,
2. if the new value is less than the node's value:
   o if the current node has no left child, the place for insertion has been found;
   o otherwise, handle the left child with the same algorithm.
3. if the new value is greater than the node's value:
   o if the current node has no right child, the place for insertion has been found;
   o otherwise, handle the right child with the same algorithm.

Before the code snippets, let us look at an example demonstrating insertion into a binary search tree.

Example

Insert 4 into the tree shown above.

Code snippets
The only difference between the algorithm above and the real routine is that we should first check whether a root exists. If not, just create it and don't run the common algorithm for this special case. This can be done in the BinarySearchTree class. The principal algorithm is implemented in the BSTNode class.
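The original snippets are not reproduced here, so the following is only a minimal Python sketch of the insertion algorithm. It keeps the class names BinarySearchTree and BSTNode mentioned above; the method name add and its boolean return value are assumptions.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def add(self, value):
        """Insert value below this node; return False if it is a duplicate."""
        if value == self.value:
            return False                      # step 1: duplicate found
        if value < self.value:                # step 2: continue on the left
            if self.left is None:
                self.left = BSTNode(value)    # place for insertion found
                return True
            return self.left.add(value)
        if self.right is None:                # step 3: continue on the right
            self.right = BSTNode(value)       # place for insertion found
            return True
        return self.right.add(value)


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def add(self, value):
        if self.root is None:                 # special case: the tree is empty
            self.root = BSTNode(value)
            return True
        return self.root.add(value)


tree = BinarySearchTree()
results = [tree.add(v) for v in (5, 2, 18, 4, 4)]
```

In this run the second attempt to add 4 is rejected as a duplicate, and the first 4 ends up as the right child of 2.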

Binary search tree. Lookup operation


Searching for a value in a BST is very similar to the add operation. The search algorithm traverses the tree in depth, choosing the appropriate way to go by following the binary search tree property, and compares the value of each visited node with the one we are looking for. The algorithm stops in two cases:

a node with the required value is found;
the algorithm has no way to go.

Search algorithm in detail

Now, let's see a more detailed description of the search algorithm. Like the add operation, and almost every operation on a BST, the search algorithm uses recursion. Starting from the root,
1. check whether the value in the current node and the searched value are equal. If so, the value is found. Otherwise,
2. if the searched value is less than the node's value:
   o if the current node has no left child, the searched value doesn't exist in the BST;
   o otherwise, handle the left child with the same algorithm.
3. if the searched value is greater than the node's value:
   o if the current node has no right child, the searched value doesn't exist in the BST;
   o otherwise, handle the right child with the same algorithm.

Before the code snippets, let us look at an example demonstrating a search for a value in a binary search tree.

Example

Search for 3 in the tree shown above.
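A minimal Python sketch of the lookup could look like this. Instead of testing for a missing child before descending, it treats an empty subtree (None) as the "no way to go" case, which is equivalent. The function name search and the sample tree are assumptions for illustration; the tree is not the one from the missing figure.

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def search(node, value):
    """Return True if value is stored in the subtree rooted at node."""
    if node is None:
        return False                       # no way to go: value is absent
    if value == node.value:
        return True                        # value found
    if value < node.value:
        return search(node.left, value)    # continue in the left subtree
    return search(node.right, value)       # continue in the right subtree


# Hypothetical sample tree.
root = BSTNode(5, BSTNode(2, BSTNode(-4), BSTNode(3)), BSTNode(18))
found = search(root, 3)
missing = search(root, 7)
```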

Binary search tree. Removing a node


The remove operation on a binary search tree is more complicated than add and search. Basically, it can be divided into two stages:

search for a node to remove;
if the node is found, run the remove algorithm.

Remove algorithm in detail

Now, let's see a more detailed description of the remove algorithm. The first stage is identical to the lookup algorithm, except that we should track the parent of the current node. The second part is trickier. There are three cases, which are described below.

1. Node to be removed has no children.

This case is quite simple. The algorithm sets the corresponding link of the parent to NULL and disposes of the node.

Example. Remove -4 from a BST.

2. Node to be removed has one child.

In this case, the node is cut from the tree and the algorithm links its single child (with its subtree) directly to the parent of the removed node.

Example. Remove 18 from a BST.

3. Node to be removed has two children.

This is the most complex case. To solve it, let us first look at one useful BST property. We are going to use the idea that the same set of values may be represented by different binary search trees. For example, these BSTs:

contain the same values {5, 19, 21, 25}. To transform the first tree into the second one, we can do the following:

choose the minimum element from the right subtree (19 in the example);
replace 5 by 19;
hang 5 as a left child.

The same approach can be used to remove a node that has two children:

find the minimum value in the right subtree;
replace the value of the node to be removed with the found minimum. Now the right subtree contains a duplicate!
apply remove to the right subtree to remove the duplicate.

Notice that the node with the minimum value has no left child and, therefore, its removal may result in the first or second case only.

Example. Remove 12 from a BST.

Find the minimum element in the right subtree of the node to be removed. In the current example it is 19.

Replace 12 with 19. Notice that only values are replaced, not nodes. Now we have two nodes with the same value.

Remove 19 from the right subtree.

AVL trees

An AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations. The AVL tree is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."

The balance factor of a node is the height of its right subtree minus the height of its left subtree, and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.

AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications, because they are more rigidly balanced. The AVL tree balancing algorithm appears in many computer science curricula.
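To make the balance factor concrete, here is a small Python sketch that computes it from subtree heights. The names Node, height, balance_factor and is_avl_balanced are assumptions for illustration. The sketch recomputes heights on every call for simplicity, whereas a practical AVL implementation would typically store the height or the balance factor in each node.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in edges; an empty subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the right subtree minus height of the left subtree."""
    return height(node.right) - height(node.left)


def is_avl_balanced(node):
    """True if every node in the subtree has balance factor -1, 0 or 1."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))


chain = Node(1, right=Node(2, right=Node(3)))    # degenerate right-leaning chain
balanced = Node(2, Node(1), Node(3))             # same keys, height-balanced
```

Here the root of the chain has balance factor 2 and would need a rotation, while the root of the balanced tree has balance factor 0.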

B-Trees: Balanced Tree Data Structures


Tree structures support various basic dynamic set operations including Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete in time proportional to the height of the tree. Ideally, a tree will be balanced and the height will be log n, where n is the number of nodes in the tree. To ensure that the height of the tree is as small as possible and therefore provide the best running time, a balanced tree structure like a red-black tree, AVL tree, or b-tree must be used.

When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primary storage (RAM). Instead, a relatively small portion of the data structure is maintained in primary storage, and additional data is read from secondary storage as needed. Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly slower than random access memory (RAM). In fact, the system often spends more time retrieving data than actually processing data.

B-trees are balanced trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time-consuming) operations, a b-tree tries to minimize the number of disk accesses. For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion keys but requires at most two disk accesses to search for any node (Cormen 384).
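The "over one billion keys" figure can be checked with a short calculation. It assumes that every node is full, holding 1000 keys and 1001 children, and that the root sits at depth 0. The bound of two disk accesses additionally assumes that the root is kept permanently in primary storage.

```latex
\begin{align*}
\text{depth 0:}\quad & 1 \text{ node} \times 1000 \text{ keys} = 1000 \\
\text{depth 1:}\quad & 1001 \text{ nodes} \times 1000 \text{ keys} = 1{,}001{,}000 \\
\text{depth 2:}\quad & 1001^2 = 1{,}002{,}001 \text{ nodes} \times 1000 \text{ keys} = 1{,}002{,}001{,}000 \\
\text{total:}\quad & 1000 + 1{,}001{,}000 + 1{,}002{,}001{,}000 = 1{,}003{,}003{,}000 \text{ keys}
\end{align*}
```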

The Structure of B-Trees

Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root of a subtree containing all keys greater than any keys in the node.

A b-tree has a minimum number of allowable children for each node, known as the minimization factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.

Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time-consuming).
Height of B-Trees

For n greater than or equal to one, the height h of an n-key b-tree T with a minimum degree t greater than or equal to 2 satisfies

h <= log_t((n + 1) / 2)

For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest pages 383-384.

The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanced tree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search tends to be smaller than required by other tree structures. Although this does not affect the asymptotic worst case height, b-trees tend to have smaller heights than other trees with the same asymptotic height.

Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach will reduce the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), all references to a given node must be preceded by a read operation denoted by Disk-Read. Similarly, once a node is modified and it is no longer needed, it must be written out to secondary storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.
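As a rough illustration of the single-pass search, the following Python sketch keeps every node in memory, so the Disk-Read step is only marked by a comment. The names BTreeNode and btree_search and the sample keys are assumptions, not the original algorithm listing or the tree from the figures. It uses binary search within a node where a simple linear scan would work equally well.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # keys in non-decreasing order
        self.children = children or []    # empty for a leaf, else len(keys) + 1 subtrees

    @property
    def is_leaf(self):
        return not self.children


def btree_search(node, key):
    """Return (node, index) locating key, or None if key is absent."""
    while True:
        i = bisect_left(node.keys, key)   # first index whose key is >= the searched key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if node.is_leaf:
            return None                   # nowhere left to descend
        node = node.children[i]           # a disk-based version would Disk-Read here


# Hypothetical two-level tree.
root = BTreeNode([10, 20], [BTreeNode([2, 5]),
                            BTreeNode([12, 17]),
                            BTreeNode([21, 33])])
hit = btree_search(root, 21)
miss = btree_search(root, 15)
```

The search only ever moves downward, visiting one node per level, which is what keeps the number of disk reads proportional to the height of the tree.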

Examples
Sample B-Tree

Searching a B-Tree for Key 21

Inserting Key 33 into a B-Tree (w/ Split)

Binary heap

A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:

The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last (deepest) one, are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.

"Greater than or equal to" means according to whatever comparison function is chosen to sort the heap, not necessarily "greater than or equal to" in the mathematical sense (since the quantities are not always numerical). Heaps where the comparison function is mathematical "greater than or equal to" are called max-heaps; those where the comparison function is mathematical "less than or equal to" are called min-heaps. Conventionally, min-heaps are used, since they are readily applicable for use in priority queues.

Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent can be freely interchanged, as long as this does not violate the shape and heap properties (compare with treap). The binary heap is a special case of the d-ary heap in which d = 2.

It is possible to modify the heap structure to allow extraction of both the smallest and largest element in O(log n) time.[1] To do this, the rows alternate between min-heap and max-heap. The algorithms are roughly the same, but, in each step, one must consider the alternating rows with alternating comparisons. The performance is roughly the same as a normal single-direction heap. This idea can be generalised to a min-max-median heap.

Adding to the heap

If we have a heap and we add an element, we can perform an operation known as up-heap, bubble-up, percolate-up, sift-up, or heapify-up in order to restore the heap property. We can do this in O(log n) time, using a binary heap, by following this algorithm:
1. Add the element on the bottom level of the heap.
2. Compare the added element with its parent; if they are in the correct order, stop.
3. If not, swap the element with its parent and return to the previous step.

We do this at most once for each level in the tree, that is, at most the height of the tree, which is O(log n). However, since approximately 50% of the elements are leaves and 75% are in the bottom two levels, it is likely that the new element to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in average constant time, O(1).

Say we have a max-heap

and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However, the heap property is violated since 15 is greater than 8, so we need to swap the 15 and the 8. So, we have the heap looking as follows after the first swap:

However, the heap property is still violated since 15 is greater than 11, so we need to swap again:

which is a valid max-heap. There is no need to check the children after this. Before we placed 15 on X, the heap was valid, meaning 11 is greater than 5. If 15 is greater than 11, and 11 is greater than 5, then 15 must be greater than 5.
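The up-heap steps can be sketched in Python for a max-heap stored in a 0-indexed list, where the parent of index i is (i - 1) // 2. The function name heap_push is an assumption, and the starting heap [11, 5, 8, 3, 4] is an assumed layout consistent with the values mentioned in the example, since the original figure is not reproduced.

```python
def heap_push(heap, item):
    """Insert item into a max-heap stored in a 0-indexed list."""
    heap.append(item)                      # step 1: add on the bottom level
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:        # step 2: correct order, stop
            break
        # step 3: swap with the parent and repeat one level up
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


heap = [11, 5, 8, 3, 4]    # assumed array layout of the example heap
heap_push(heap, 15)
```

As in the example, 15 is first swapped with 8 and then with 11, ending up at the root.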
Deleting the root from the heap

The procedure for deleting the root from the heap (effectively extracting the maximum element in a max-heap or the minimum element in a min-heap) starts by replacing it with the last element on the last level. So, if we have the same max-heap as before, we remove the 11 and replace it with the 4.

Now the heap property is violated since 8 is greater than 4. The operation that restores the property is called down-heap, bubble-down, percolate-down, sift-down, or heapify-down. In this case, swapping the two elements 4 and 8 is enough to restore the heap property and we need not swap elements further:

The downward-moving node is swapped with the larger of its children in a max-heap (in a min-heap it would be swapped with its smaller child), until it satisfies the heap property in its new position. This functionality is achieved by the Max-Heapify function[2], as defined below in pseudocode for an array-backed heap A. Note that "A" is indexed starting at 1, not 0 as is common in many programming languages.

Max-Heapify(A, i):
    left ← 2i
    right ← 2i + 1
    largest ← i
    if left ≤ heap-length[A] and A[left] > A[i] then:
        largest ← left
    if right ≤ heap-length[A] and A[right] > A[largest] then:
        largest ← right
    if largest ≠ i then:
        swap A[i] ↔ A[largest]
        Max-Heapify(A, largest)

Note that the down-heap operation (without the preceding swap) can be used in general to modify the value of the root, even when an element is not being deleted.
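For comparison, here is a hedged Python sketch of root deletion with down-heap. It differs from the pseudocode in two ways: it is 0-indexed (children of i are 2i + 1 and 2i + 2) and it uses a loop rather than recursion. The names max_heapify and extract_max and the starting list are assumptions for illustration.

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the max-heap property holds (0-indexed list)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:                   # heap property restored
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def extract_max(heap):
    """Remove and return the root of a non-empty max-heap."""
    top = heap[0]
    last = heap.pop()                      # take the last element of the last level
    if heap:
        heap[0] = last                     # move it to the root, then sift down
        max_heapify(heap, 0, len(heap))
    return top


heap = [11, 5, 8, 3, 4]    # assumed array layout of the example heap
top = extract_max(heap)
```

After the call, 4 has been moved to the root and swapped once with 8, matching the example above.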
