Trees
A tree imposes a hierarchical structure on its elements, and there are many situations where trees are used.
Organizational and genealogical charts are two examples of trees.
Other definitions
– Degree: number of children (or subtrees) of a node
– A node of degree zero is called a leaf or terminal node. All other nodes are nonterminals.
– Children of the same parent are called siblings
– A path from node n1 to nk is defined as a sequence of nodes n1, n2, …, nk such that ni is the
parent of ni+1 for 1≤i<k.
– The length of a path is one less than the number of nodes in the path.
– There is a path of length zero from a node to itself
– If there is a path from a node a to a node b, then a is an ancestor of b and b is a descendant
of a.
– An ancestor (descendant) of a node, other than the node itself, is a proper ancestor (proper
descendant).
– The height of a node is the length of the longest path from the node to a leaf
– The height of a tree is the height of the root.
– The depth of a node is the length of the unique path from the root to the node
– The depth of the root is zero.
– The level of a node is the depth of the node plus 1.
– The children of a node are ordered left-to-right.
In this example, A is the root of the tree. These are the subtrees with roots …..
The left-to-right ordering of nodes can be used to compare two nodes that are not related by the
ancestor-descendant relationship. The rule is that if a and b are siblings, and a is to the left of b, then all
the descendants of a are to the left of all the descendants of b.
Binary tree
A binary tree is a set of nodes that is either empty or consists of a root and two disjoint binary trees
called the left subtree and the right subtree.
Any node can have at most two children. The following two binary trees are different: the first one has
an empty right subtree and the second one has an empty left subtree.
The following tree is a skewed tree; a tree skewed this way is left skewed, and similarly a tree can be
right skewed.
Let m be the total number of nodes, n the number of leaves, k the number of nodes of degree 2, and B
the total number of branches. Assume for this count that every node has degree 0 or 2, i.e., no node
has exactly one child.
m = n + k. The total number of nodes is the total number of leaves plus the total number of nodes of
degree 2.
m = B + 1. Every node except the root has exactly one incoming branch, and the root has none.
B = 2k. All branches emanate from nodes of degree 2, and each such node contributes two branches.
So m = 2k + 1.
Substituting in the first equation, we get
2k + 1 = n + k, or
n = k + 1: the number of leaves is the number of nonterminals plus one.
m = 2k + 1: the number of nodes is twice the number of nonterminals plus one.
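As a quick numeric check of these counts, consider the complete binary tree of height 2 (seven nodes):

```latex
k = 3 \text{ nodes of degree } 2:\qquad
m = 2k + 1 = 7, \qquad
n = k + 1 = 4 \text{ leaves}, \qquad
B = m - 1 = 6 \text{ branches}.
```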
• Complete binary tree: A binary tree having the maximum number of nodes at each level.
– All nonterminal nodes have two children
– All the leaves are at the same level
• We can number the nodes of a complete binary tree starting with the root and then move level by
level, left to right.
• A binary tree with n nodes and depth k is almost complete iff its nodes correspond to the nodes
numbered 1 to n in the complete binary tree of depth k.
Representation
Array based
An almost complete binary tree can be represented easily in a one-dimensional array of an appropriate
type, let us call it treeArray, with the node numbered i stored in treeArray[i].
You can use this representation for all binary trees, but then you will be left with a lot of unutilized
space.
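The numbering scheme makes finding children and parents pure index arithmetic. Below is a minimal sketch, assuming 1-based indexing with index 0 left unused; the helper names and the sample keys in treeArray are hypothetical:

```c
#include <assert.h>

/* For an almost complete binary tree stored level by level in a 1-based
   array: node i's children sit at indices 2i and 2i+1, and its parent
   at index i/2 (integer division). */
static int leftchild_index(int i)  { return 2 * i; }
static int rightchild_index(int i) { return 2 * i + 1; }
static int parent_index(int i)     { return i / 2; }

/* Example: a six-node almost complete tree stored in treeArray[1..6];
   treeArray[0] is unused. */
int treeArray[7] = {0, 8, 5, 9, 2, 6, 10};
```

A child index larger than the node count means that child is absent; this check plays the role of the NULL-pointer test in the linked representation.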
struct treenode {
    struct treenode *leftchild;
    int data;
    struct treenode *rightchild;
};
Exercise: show what the representation looks like if the numbers also represent the data at the nodes.
Traversal
Let us now consider some operations on binary trees. The first is traversal, i.e., visiting each node in the
tree.
When traversing, we need to treat each node and its subtrees in the same way.
There are three tasks of interest that are performed for each node
1. Visit the node. We will take this to mean access the data and print it. We will label this task D.
2. Traverse the left subtree of the node, L
3. Traverse the right subtree of the node, R
There are six possible combinations. If we adopt the convention that we always traverse the left subtree
before we traverse the right subtree then we have
DLR – preorder
LDR – inorder
LRD – postorder
The basic structure of the traversal code occurs in many recursive tree algorithms. The elements of the
structure are:
1. deal with the base case where the tree is empty,
2. deal with the current node, and then
3. use recursion to deal with the subtrees.
In the case of traversal, the order in which we deal with the current node and the subtrees defines a
particular type of traversal.
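The three traversals can then be coded directly from this structure. A minimal sketch, using the pointer-based treenode structure shown earlier; the visited[] array is an addition here so the visit order can be inspected, and printing stands in for the "visit" task D:

```c
#include <stdio.h>

struct treenode {
    struct treenode *leftchild;
    int data;
    struct treenode *rightchild;
};

int visited[64];                       /* record of the visit order */
int nvisited = 0;

void visit(struct treenode *t) {       /* task D: access and print the data */
    visited[nvisited++] = t->data;
    printf("%d ", t->data);
}

void preorder(struct treenode *t) {    /* DLR */
    if (t == NULL) return;             /* base case: empty tree */
    visit(t);
    preorder(t->leftchild);
    preorder(t->rightchild);
}

void inorder(struct treenode *t) {     /* LDR */
    if (t == NULL) return;
    inorder(t->leftchild);
    visit(t);
    inorder(t->rightchild);
}

void postorder(struct treenode *t) {   /* LRD */
    if (t == NULL) return;
    postorder(t->leftchild);
    postorder(t->rightchild);
    visit(t);
}
```

Each function deals with the base case first, then performs D, L, and R in the order that names the traversal.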
Now instead of looking at the other operations on binary trees we will look at special cases of binary
trees and see the operations in their context. Let us begin with the binary search tree.
Example. The labels stand for the values at the nodes and are not names. Such trees are also called
labeled trees.
We will assume that nodes store integers and the stored integer is also the key.
Basically, binary search trees are fast at search and insert. We will see how.
Although it is convenient to write recursive implementations for search and insert, we will first see the
iterative implementations and then look at the recursive implementations
Assuming a pointer based representation, with the same node structure as described, let us first look at
search and its implementation.
What is search? Determine whether a "target" key value is present in a given binary search tree.
We will extend this to return a pointer to the node with the target data if such a node is found in the
search tree, and NULL otherwise.
What is the search algorithm? Let us work it out on this example. Assume that the target value is 5. We
start with the root. Call this the current search node. If the current node has the target value, then we
have found the target and we stop the search. If not, then we compare the current node key with the
target, if the target is less than the current node key, then we move to the left subtree. The left child
now becomes the current node. If on the other hand the key is greater than the current node key then
search moves to the right subtree. The right child becomes the current node.
What if the target key value is not present in the tree? In this case search will still follow a path, but at
some node search will move to an empty subtree which could be the left or the right subtree. What this
means in an implementation is that if we designate a pointer to mark the current search node, then this
pointer will be set to NULL.
Iterative implementation
curnode = root;
found = false;
while (curnode != NULL && !found) {
    if (target < curnode->data)
        curnode = curnode->leftchild;
    else if (target > curnode->data)
        curnode = curnode->rightchild;
    else
        found = true;
}
return curnode;
Recursive implementation
Remember the basic structure of the recursive code for tree algorithms: deal with the base case where
the tree is empty, deal with the current node, and then use recursion to deal with the subtrees.
/*
Recurses down the tree, chooses the left or right branch by comparing the target to each node.
*/
struct treenode *search(struct treenode* curroot, int target) {
// 1. Base case == empty tree
// in that case, the target is not found so return NULL
if (curroot == NULL) {
return(NULL);
}
else {
// 2. see if found at the node pointed to by curroot
if (target == curroot->data)
return(curroot);
else {
// 3. otherwise recurse down the correct subtree
if (target < curroot->data)
return(search(curroot->leftchild, target));
else
return(search(curroot->rightchild, target));
}
}
}
What about the running time of search? In the case of search we follow a path comparing the target
with the key value at each node. The worst that can happen is that the target is not in the tree and we
end up taking the longest path in the tree. It is evident that the time spent at each node is a constant,
thus the worst case running time is O(h) where h is the height of the tree.
[Figure: three binary search trees on the keys 2, 5, 6, 8, 9, 10 — an almost complete tree of height 2
rooted at 8, a taller unbalanced tree rooted at 5, and a completely skewed tree in which the nodes form
a chain.]
The height of the tree increases from one tree to the next. In the case of the skewed tree the number of
levels is equal to the number of nodes. We can easily see that the worst case search time is minimized
if the nodes in the tree form an almost complete binary tree.
In general, an n node skewed tree is a linear chain of n nodes. For this tree, the running time of search is
O(n). However in the case of the complete binary tree it is O(lg n) where lg is log to the base 2. It turns
out that the average running time of search is O(lg n).
In the case of a BST, an inorder traversal produces a sorted output, i.e., the keys are output in sorted
order. This is apparent from the fact that in an inorder traversal, the key of the root of a tree is output in
between the keys in its left subtree and the keys in the right subtree.
What is the running time of the traversal? Traversal has to go through all the nodes in the tree and takes
O(n) time. This can be proved by formulating a recurrence and then solving it. Just to get an idea of the
recurrence relation: if inorder is called on a tree with root x having n nodes, n > 0, and the root x has k
nodes in its left subtree and n-k-1 nodes in its right subtree, then the running time of inorder(x) is T(n)
= T(k) + T(n-k-1) + d, where d represents the time spent in activities other than the recursive calls.
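One way to see that this recurrence gives O(n) time, taking c as the constant cost of a call on an empty tree (an assumption added here), is to guess a linear bound and verify it by substitution:

```latex
T(0) = c, \qquad T(n) = T(k) + T(n-k-1) + d .
\text{Guess } T(n) = (c+d)\,n + c\text{; then}
T(k) + T(n-k-1) + d
  = \bigl[(c+d)k + c\bigr] + \bigl[(c+d)(n-k-1) + c\bigr] + d
  = (c+d)(n-1) + 2c + d
  = (c+d)\,n + c ,
\text{so } T(n) = (c+d)\,n + c = O(n).
```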
Insert Operation
We can use the search algorithm to insert a key x into a BST. When we search for a target key and the
key is not present then search terminates when we reach an empty subtree. This subtree is where we
expect the target to be present. We thus insert a new node with the key x as the root of this subtree.
Example
We make use of two pointers now. One pointer, call that t, is used to trace the path and the other, call
this tp, always points to the parent of the node pointed to by t, i.e., tp trails t. This is necessary so that
we can keep track of the node at which the new node is to be inserted as an appropriate child.
Example:
struct treenode *insert(struct treenode *root, int x) {
    struct treenode *t, *tp, *newnode;
    bool found = false; // we do not insert if the key to insert is already present
    t = root;
    tp = NULL;
    while (t != NULL && !found) { // tp trails t along the search path
        tp = t;
        if (x < t->data) t = t->leftchild;
        else if (x > t->data) t = t->rightchild;
        else found = true;
    }
    if (!found) {
        newnode = malloc(sizeof(struct treenode)); // allocate the new node
        newnode->data = x;
        newnode->leftchild = newnode->rightchild = NULL;
        if (tp == NULL) root = newnode;            // the tree was empty
        else if (x < tp->data) tp->leftchild = newnode;
        else tp->rightchild = newnode;
    }
    return root;
}
What is the running time of insert? Clearly it is O(h) where h is the height of the tree.
With insert, we can see that the order in which the data arrives for insertion will determine the shape of
the tree. Obviously, the data arrived in sorted sequence in case of the degenerate tree.
[Figure: example BST used to illustrate the inorder successor — root 5 with children 2 and 7; 7 has
children 6 and 9; 9 has children 8 and 12; 12 has left child 11, which has left child 10.]
Inorder successor. Given a node x, there are two cases.
1. The right subtree of node x is nonempty. In this case the successor is the leftmost node in the right
subtree, and it can be found using the minimum function on the right child of x. For example, the
inorder successor of 7 is 8, which is the leftmost node in the right subtree of 7.
2. The right subtree of node x is empty and x has a successor y. Then y is the lowest ancestor of x
whose left child is also an ancestor of x. Here we must remember that every node is an ancestor
of itself.
[Figure: example BST — root 15 with children 5 and 20; 5 has children 3 and 7; 7 has right child 10,
which has left child 9.]
What is the successor of 10 here? We trace back and find that the lowest ancestor of 10 whose left
child is also an ancestor of 10 is 15: its left child 5 is an ancestor of 10. Remember that the
ancestor-descendant relationship is defined over paths between nodes: no path, no relationship.
You can see that if the node in question does not have a right child, then in the inorder traversal we
have to backtrack. We backtrack to an ancestor from which we can descend into a right subtree, so
this is obviously the closest ancestor whose left subtree has just been traversed and of which the
node was a part.
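The minimum function referred to in case 1 (and used again by deletion below) simply follows left-child pointers. A minimal sketch, repeating the treenode structure so the fragment is self-contained:

```c
#include <stddef.h>

struct treenode {
    struct treenode *leftchild;
    int data;
    struct treenode *rightchild;
};

/* Returns the node holding the smallest key in a nonempty BST:
   the leftmost node, reached by walking left-child pointers. */
struct treenode *minimum(struct treenode *t) {
    while (t->leftchild != NULL)
        t = t->leftchild;
    return t;
}
```

Since only left pointers are followed, minimum also runs in O(h) time.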
Deletion:
[Figure: example BST on the keys 5, 10, 12, 15, 16, 17, 20, 25, 31, 33, 34, 35, 36, 37, 40, rooted at 20,
used to illustrate the deletion cases.]
struct treenode *minimum(struct treenode *t); /* leftmost node in a nonempty subtree */

struct treenode *delete(struct treenode *T, int target) {
    struct treenode *tmp;
    if (T != NULL) {
        if (target < T->data)                         /* Go left */
            T->leftchild = delete(T->leftchild, target);
        else if (target > T->data)                    /* Go right */
            T->rightchild = delete(T->rightchild, target);
        else if (T->leftchild && T->rightchild) {     /* Found; two children */
            /* Replace with smallest in right subtree */
            tmp = minimum(T->rightchild);
            T->data = tmp->data;
            T->rightchild = delete(T->rightchild, T->data);
        } else {                                      /* Found; one or zero children */
            tmp = T;
            if (T->leftchild == NULL)                 /* Also handles 0 children */
                T = T->rightchild;
            else if (T->rightchild == NULL)
                T = T->leftchild;
            free(tmp);
        }
    }
    return T;
}
From our study of the binary search tree we know now that we can minimize the worst case search time
if we can maintain the tree as an almost complete binary tree.
However when we are dealing with a dynamic situation in which search, insert and delete operations
are interleaved, frequent restructuring may be required if the tree has to be maintained as an almost
complete binary tree and this may increase the running time of insert and delete considerably.
It is however possible to keep the binary search tree balanced so that the average and the worst case
running time of search is O(log2n).
AVL Trees
• Named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published the structure in
their 1962 paper "An algorithm for the organization of information."
• An AVL tree is a binary search tree in which the heights of the two child subtrees of any node
differ by at most one. Such a tree is said to be height balanced.
• The balance factor of a node is the height of its left subtree minus the height of its right subtree
– A node with balance factor 1, 0, or -1 is considered balanced.
• Is the balance condition sufficient to guarantee that the height of an AVL tree with n nodes will
be O(log2 n)?
By convention, the value -1 corresponds to the height of a subtree with no nodes, whereas zero corresponds to the
height of a subtree with one node
Insert Operation
When a new node is inserted, it may affect the balance factors of the nodes along the path from the
root to that node, and only these nodes may be affected, because only these nodes have their
subtrees altered. Hence the balance factors of these nodes must be recomputed and, if necessary, the
tree must be rebalanced.
Rebalancing is carried out at the nearest ancestor of the newly inserted node whose balance factor has
changed to ±2. In order for a node’s balance factor to become ±2, the balance factor must have been ±1
before the insertion. Since this is the nearest ancestor whose balance factor becomes ±2 it is also the
nearest ancestor whose balance factor was ±1 before the insertion. This means that before the
insertion, the balance factor of all the other nodes on the path from this ancestor to the new insertion
point must have been zero.
Tree rotation
A tree rotation is an operation on a binary tree that changes the structure without affecting the binary search tree
property.
In the standard rotation picture, the right rotation is rooted at Q; "Root" denotes the parent node of the
subtrees to rotate, and "Pivot" the node which will become the new parent node.
Case 1: LL rotation
There are three important properties of the LL rotation:
1. The rotation does not destroy the data ordering property, so the result is still a valid search tree:
the subtree that was to the left of node A remains to its left, the subtree that was between nodes A
and B remains between them, and the subtree that was to the right of node B remains to its right.
2. Both nodes A and B end up with zero balance factors.
3. After the rotation, the tree has the same height it had originally. Inserting the item did not
increase the overall height of the tree
Case 2: LR rotation
The tree can be restored by performing an RR rotation rooted at node A, followed by an LL rotation at
node C. The LL and RR rotations are called single rotations; the combination of two single rotations
is called a double rotation and is given the name LR rotation.
Insert Coding
insert is a recursive function that takes two arguments: the address of the root of the tree in which to
insert, and the key to be inserted. T points to the root of the tree in which the insertion is to take
place; we refer to this node as the current node. Since the root of the tree may change after an insert,
the function returns the address of the root of the tree so as to ensure that changes in subtree roots
can be reflected in the child pointers of the parent.
The base case is an insertion into an empty tree. A node is allocated and the fields are set appropriately.
The child pointers and the height are set to indicate that the new node is a leaf node. The address of this
node is returned to set the child pointer of the parent appropriately. Calls to insert are embedded in an
assignment so as to ensure that this happens.
In case insert is called on a non-empty tree and T is valid then we compare the key to be inserted with
the key at the current node. If the key is less than the key at the current node, we make a call to insert
to insert the key in the left subtree. After the insertion has taken place we check if the difference in the
height of the left subtree and the right subtree is 2. If that is the case, then we check where exactly the
insertion has taken place so as to carry out an appropriate rotation. If the insertion has taken place in
the left subtree of the left child which means that the inserted key is less than the key at the left child of
the current node, an LL rotation is carried out. Otherwise an LR rotation is carried out.
To the rotation functions we pass the address of the current node since the rotation will be rooted at
this node. The rotation functions return the address of the new root, i.e., the pivot node. T is set to point
to this node. The height of this node is computed from the heights of the child nodes and the address of
this node is returned so that the parent's child pointer is set to point to the correct node. In fact the
placement of this statement ensures that before we return we recompute the height of the current
node from the heights of its child nodes.
Recursion also ensures that we rebalance at the closest ancestor of the newly inserted node, because
when we return from the recursive calls, we go up the tree.
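The description above can be sketched in code as follows. This is a sketch, not the definitive implementation: the avlnode structure with a height field, the helper names (height, rotateLL, rotateLR, and their mirrors), and the function name avlinsert are assumptions introduced here; -1 is used as the height of an empty subtree, matching the convention stated earlier.

```c
#include <stdlib.h>

struct avlnode {
    int data;
    int height;                        /* height of the subtree rooted here */
    struct avlnode *leftchild, *rightchild;
};

static int height(struct avlnode *t) { return t == NULL ? -1 : t->height; }
static int max2(int a, int b) { return a > b ? a : b; }

/* Single rotation for the LL case: the left child becomes the new root. */
static struct avlnode *rotateLL(struct avlnode *k2) {
    struct avlnode *k1 = k2->leftchild;
    k2->leftchild = k1->rightchild;
    k1->rightchild = k2;
    k2->height = max2(height(k2->leftchild), height(k2->rightchild)) + 1;
    k1->height = max2(height(k1->leftchild), k2->height) + 1;
    return k1;                         /* new root of this subtree */
}

/* Mirror image: single rotation for the RR case. */
static struct avlnode *rotateRR(struct avlnode *k1) {
    struct avlnode *k2 = k1->rightchild;
    k1->rightchild = k2->leftchild;
    k2->leftchild = k1;
    k1->height = max2(height(k1->leftchild), height(k1->rightchild)) + 1;
    k2->height = max2(k1->height, height(k2->rightchild)) + 1;
    return k2;
}

/* Double rotations, built from two single rotations. */
static struct avlnode *rotateLR(struct avlnode *k3) {
    k3->leftchild = rotateRR(k3->leftchild);
    return rotateLL(k3);
}
static struct avlnode *rotateRL(struct avlnode *k3) {
    k3->rightchild = rotateLL(k3->rightchild);
    return rotateRR(k3);
}

/* Recursive insert: returns the (possibly new) root of the subtree so that
   the parent's child pointer can be updated by the caller. */
struct avlnode *avlinsert(struct avlnode *T, int x) {
    if (T == NULL) {                   /* base case: insert into empty tree */
        T = malloc(sizeof *T);
        T->data = x;
        T->height = 0;                 /* the new node is a leaf */
        T->leftchild = T->rightchild = NULL;
    } else if (x < T->data) {
        T->leftchild = avlinsert(T->leftchild, x);
        if (height(T->leftchild) - height(T->rightchild) == 2)
            T = (x < T->leftchild->data) ? rotateLL(T) : rotateLR(T);
    } else if (x > T->data) {
        T->rightchild = avlinsert(T->rightchild, x);
        if (height(T->rightchild) - height(T->leftchild) == 2)
            T = (x > T->rightchild->data) ? rotateRR(T) : rotateRL(T);
    }                                  /* equal key: do nothing */
    T->height = max2(height(T->leftchild), height(T->rightchild)) + 1;
    return T;
}
```

Each recursive call is embedded in an assignment (T->leftchild = avlinsert(...)) precisely so that a changed subtree root is reflected in the parent's child pointer, and the height of the current node is recomputed just before returning.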
Let N(h) denote the minimum number of nodes that can be in an AVL tree of height h.
We try to generate a recurrence for N(h)
Clearly N(0) = 1 and N(1) = 2
In general
N(h) = 1 (for the root) + N(height of left subtree) + N(height of right subtree)
Since the overall tree has height h, one of the subtrees must have height h − 1; suppose it is the left
subtree, of height hL = h − 1. To make the other subtree as small as possible we minimize its height;
its height can be no smaller than h − 2 without violating the AVL condition.
Thus N(h) = 1 + N(h − 1) + N(h − 2).
This recurrence looks very similar to the Fibonacci recurrence (F(h) = F(h−1) + F(h−2)). In fact, it can be
argued (by a little approximating, a little cheating, and a little constructive induction) that N(h) is roughly
φ^h, where φ = (1 + √5)/2 ≈ 1.618 is the famous Golden ratio. Thus, by inverting this we get that the
height of the worst case AVL tree with n nodes is roughly logφ n, where φ is the Golden ratio. This is
O(log2 n) (because logs of different bases differ only by a constant factor).
Deletion
If you look at binary search tree deletion carefully, you will notice that in all cases we end up deleting
a node that is either a leaf or a node with a single child. This includes the case where we want to
delete a node with both a left and a right child: there we actually end up deleting the inorder
successor, which is either a leaf or a node with a single child.
For deleted leaf nodes, clearly the heights of the children of the node do not change. Also, the heights of
the children of a deleted node with one child do not change either. Thus, if a delete causes a violation of
the AVL Tree height property, this would HAVE to occur on some node on the path from root to the
parent of the deleted node.
Thus, starting with the parent, we will have to carry out a rebalancing operation at each node whose
balance factor has changed to ±2. One thing to note: whereas in an insert at most one node needs to
be rebalanced, in a delete there may be multiple nodes that need rebalancing. This also means that
deletions lead to O(log2 n) rotations in the worst case.
Suppose that we have deleted a key from the left subtree and that as a result this subtree's height has
decreased by one, and this has caused some ancestor to violate the height balance condition. There are
two cases. First, if balance factor for the right child is either 0 or -1 (that is, it is not left heavy), we
perform a single left (RR) rotation .
On the other hand, if the right child has a balance factor of +1 (it is left heavy), then we need to perform
a double (RL) rotation.