
These notes are not complete.

Trees

A tree imposes a hierarchical structure and there are many situations where trees are used.
Organizational and genealogical charts are some examples of trees.

• A tree is a finite set of nodes that is:


– either empty, or
– consists of one or more nodes such that (i) there is a specially designated node called the
root; (ii) the remaining nodes are partitioned into n ≥ 0 disjoint sets T1, …, Tn, each of which is
a tree. T1, …, Tn are also called the subtrees of the root.
– If nodes r1, …, rn are the roots of the subtrees T1, …, Tn of the root r, then r1, …, rn are the
children of root r and r is the parent of the nodes r1, …, rn.

Other definitions
– Degree: number of children (or subtrees) of a node
– A node of degree zero is called a leaf or terminal node. All other nodes are nonterminals.
– Children of the same parent are called siblings
– A path from node n1 to nk is defined as a sequence of nodes n1, n2, …, nk such that ni is the
parent of ni+1 for 1≤i<k.
– The length of a path is one less than the number of nodes in the path.
– There is a path of length zero from a node to itself
– If there is a path from a node a to a node b then a is an ancestor of b and b is a descendent
of a.
– An ancestor (descendent) of a node, other than the node itself, is a proper ancestor (proper
descendent).
– The height of a node is the length of the longest path from the node to a leaf
– The height of a tree is the height of the root.
– The depth of a node is the length of the unique path from the root to the node
– The depth of the root is zero.
– The level of a node is the depth of the node plus 1.
– The children of a node are ordered left-to-right.

The data stored at a node can be of any type.


Let us look at a conventional way of picturing a tree. The nodes are drawn as circles and the parent-child
relationship is represented by a line, which is also called an edge or branch. Let us name the nodes.

In this example, A is the root of the tree. These are the subtrees with roots …..

The absence of a branch indicates the empty subtree.

The left-to-right ordering of nodes can be used to compare two nodes that are not related by the
ancestor-descendant relationship. The rule is that if a and b are siblings, and a is to the left of b, then all
the descendants of a are to the left of all the descendants of b.
Binary tree

A binary tree is a set of nodes that is either empty or consists of a root and two disjoint binary trees
called the left subtree and the right subtree.

Here is an example of a binary tree.

Any node can have at most two children. The following two binary trees are different. The first one has
an empty right subtree and the second one has an empty left subtree.

The following tree is a left-skewed tree: every node has only a left child. A right-skewed tree is defined similarly.

(i) The maximum number of nodes on level i of a binary tree is 2^(i-1), i ≥ 1.

The maximum number of nodes at level 1 is 1, since level 1 contains only the root; at level 2 it is 2,
since the root can have at most two children. In general, since each node at level i can have at most
two children, the maximum number of nodes at level i+1 is 2^(i-1) × 2 = 2^i.

(ii) The maximum number of nodes in a binary tree with k levels is 2^k - 1, k ≥ 1.

The maximum number of nodes is Σ (i = 1 to k) (maximum number of nodes at level i) = Σ (i = 1 to k) 2^(i-1),
a geometric series that sums to 2^k - 1.

A strict binary tree (one in which every node has either zero or two children) with n leaves has 2n - 1 nodes.

Let m be the total number of nodes, k the number of nodes of degree 2, and B the total number of
branches.
m = n + k. The total number of nodes is the number of leaves plus the number of nodes of degree 2.
m = B + 1. Every node other than the root has exactly one incoming branch, and the root has none.
B = 2k. All branches emanate from nodes of degree 2.
So m = 2k + 1.
Substituting in the first equation, we get
2k + 1 = n + k, or
n = k + 1: the number of leaves is the number of nonterminals plus one,
m = 2k + 1: the number of nodes is twice the number of nonterminals plus one.

Since k = n - 1, substituting in the first equation gives

m = 2n - 1, the number of nodes, which is what we wanted to prove.

• Complete binary tree: A binary tree having the maximum number of nodes at each level.
– All nonterminal nodes have two children
– All the leaves are at the same level
• We can number the nodes of a complete binary tree starting with the root and then move level by
level, left to right.
• A binary tree with n nodes and depth k is almost complete iff its nodes correspond to the nodes
numbered 1 to n in the complete binary tree of depth k.
Representation

Array based

An almost complete binary tree can be represented easily in a one-dimensional array of an appropriate
type, let us call it treeArray, with the node numbered i stored in treeArray[i].

For any index i, 1 ≤ i ≤ n, we have


1. Parent of node i is at floor(i/2) if i≠1. When i = 1, node i is the root and has no parent.
2. Left child of node i is at 2i if 2i≤n. If 2i > n then i has no left child
3. Right child of node i is at 2i+1 if 2i+1≤n. If 2i+1 > n then i has no right child
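These index computations translate directly into code. A minimal sketch (the names parent, leftChild and
rightChild and the 1-based array convention are illustrative, not from the notes; n is the number of nodes):

    /* Index arithmetic for an almost complete binary tree stored in treeArray[1..n].
       A return value of 0 means the requested node does not exist. */
    int parent(int i)            { return (i == 1) ? 0 : i / 2; }
    int leftChild(int i, int n)  { return (2 * i <= n) ? 2 * i : 0; }
    int rightChild(int i, int n) { return (2 * i + 1 <= n) ? 2 * i + 1 : 0; }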

You can use this representation for all binary trees, but then you will be left with a lot of unutilized
space.

Let us try to analyse the wastage.

What is the size of the array that we need to store a skewed tree with k levels? Such a tree has only k
nodes, so only k entries are used; but for a right-skewed tree the deepest node is numbered 2^k - 1, so
the array must have 2^k - 1 entries. The rest is all wasted.

Pointer based representation

struct treenode {
    struct treenode *leftchild;
    int data;
    struct treenode *rightchild;
};

Show how the representation will look if the numbers also represent the data at the nodes.

Empty tree is represented by the NULL external pointer.

Traversal

Let us now consider some operations on binary trees. The first is traversal, i.e., visiting each node in the
tree.

When traversing, we need to treat each node and its subtrees in the same way.
There are three tasks of interest that are performed for each node
1. Visit the node. We will take this to mean access the data and print it. We will label this task D.
2. Traverse the left subtree of the node, L
3. Traverse the right subtree of the node, R

There are six possible combinations. If we adopt the convention that we always traverse the left subtree
before we traverse the right subtree then we have
DLR – preorder
LDR – inorder
LRD – postorder

Take example and show output.

The basic structure of the traversal code occurs in many recursive tree algorithms. The elements of the
structure are:
1. deal with the base case where the tree is empty,
2. deal with the current node, and then
3. use recursion to deal with the subtrees.

In the case of traversal, the order in which we deal with the current node and the subtrees defines a
particular type of traversal.

void preorder(struct treenode *t) {   // t points to the root of the (sub)tree being traversed
    if (t != NULL) {
        printf("%d ", t->data);
        preorder(t->leftchild);
        preorder(t->rightchild);
    }
}
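The other two traversals differ only in where the visit (D) step is placed relative to the two recursive
calls. Minimal sketches in the same style as preorder above (same node structure and printf convention):

    void inorder(struct treenode *t) {      /* LDR */
        if (t != NULL) {
            inorder(t->leftchild);
            printf("%d ", t->data);
            inorder(t->rightchild);
        }
    }

    void postorder(struct treenode *t) {    /* LRD */
        if (t != NULL) {
            postorder(t->leftchild);
            postorder(t->rightchild);
            printf("%d ", t->data);
        }
    }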

Now instead of looking at the other operations on binary trees we will look at special cases of binary
trees and see the operations in their context. Let us begin with the binary search tree.

Binary search trees

Example. The labels stand for the values at the nodes and are not names. Such trees are also called
labeled trees.

We will assume that nodes store integers and the stored integer is also the key.

Show an example of a BST.

Basically, binary search trees are fast at search and insert. We will see how.

Although it is convenient to write recursive implementations for search and insert, we will first see the
iterative implementations and then look at the recursive implementations

Assuming a pointer based representation, with the same node structure as described, let us first look at
search and its implementation.

What is search? Determine whether a "target" key value is present in a given binary search tree.

We will extend this to return a pointer to the node containing the target data if such a node is found in
the search tree, and NULL otherwise.
What is the search algorithm? Let us work it out on this example. Assume that the target value is 5. We
start with the root. Call this the current search node. If the current node has the target value, then we
have found the target and we stop the search. If not, then we compare the current node key with the
target, if the target is less than the current node key, then we move to the left subtree. The left child
now becomes the current node. If on the other hand the key is greater than the current node key then
search moves to the right subtree. The right child becomes the current node.

Search follows a path. In this example the path is …

What if the target key value is not present in the tree? In this case search will still follow a path, but at
some node search will move to an empty subtree which could be the left or the right subtree. What this
means in an implementation is that if we designate a pointer to mark the current search node, then this
pointer will be set to NULL.

Iterative implementation

struct treenode *search(struct treenode *root, int target) {
    struct treenode *curnode;
    bool found = false;              // bool requires <stdbool.h>

    curnode = root;
    while (curnode != NULL && !found) {
        if (target < curnode->data)
            curnode = curnode->leftchild;
        else if (target > curnode->data)
            curnode = curnode->rightchild;
        else
            found = true;
    }
    return curnode;                  // NULL if the target is not present
}

Recursive implementation

Remember the basic structure of the recursive code for tree algorithms: deal with the base case where
the tree is empty, deal with the current node, and then use recursion to deal with the subtrees.

/*
Recurses down the tree, chooses the left or right branch by comparing the target to each node.
*/
struct treenode *search(struct treenode *curroot, int target) {
    // 1. Base case == empty tree
    // in that case, the target is not found so return NULL
    if (curroot == NULL) {
        return NULL;
    }
    else {
        // 2. see if found at the node pointed to by curroot
        if (target == curroot->data)
            return curroot;
        else {
            // 3. otherwise recurse down the correct subtree
            if (target < curroot->data)
                return search(curroot->leftchild, target);
            else
                return search(curroot->rightchild, target);
        }
    }
}

Points:

The elements of a BST are distinct

What about the running time of search? In the case of search we follow a path comparing the target
with the key value at each node. The worst that can happen is that the target is not in the tree and we
end up taking the longest path in the tree. It is evident that the time spent at each node is a constant,
thus the worst case running time is O(h) where h is the height of the tree.

Different binary trees can represent the same set of values

        8
       / \
      5   9
     / \   \
    2   6   10

      5
     / \
    2   8
       / \
      6   9
           \
            10

    2
     \
      5
       \
        6
         \
          8
           \
            9
             \
              10

All three trees store the same keys: 2, 5, 6, 8, 9, 10.

The height of the tree increases from one tree to the next. In the case of the skewed tree the number of
levels is equal to the number of nodes. We can easily see that worst case search time can be minimized
if the nodes in the tree make an almost complete binary tree.
In general, an n node skewed tree is a linear chain of n nodes. For this tree, the running time of search is
O(n). However in the case of the complete binary tree it is O(lg n) where lg is log to the base 2. It turns
out that the average running time of search is O(lg n).

Let us come back to the traversal

In the case of a BST, an inorder traversal produces a sorted output, i.e., the keys are output in sorted
order. This is apparent from the fact that in an inorder traversal, the key of the root of a tree is output in
between the keys in its left subtree and the keys in the right subtree.

What is the running time of the traversal? Traversal has to go through all the nodes in the tree and takes
O(n) time. This can be proved by formulating a recurrence and then solving it. Just to get an idea of the
recurrence relation, if inorder is called on a tree with root x having n nodes, n > 0, and the root x has k
nodes in the left subtree and n-k-1 nodes in the right subtree, then the running time of inorder(x) is T(n)
= T(k) + T(n-k-1) + d, where d represents the time spent in activities other than the recursive calls.

Minimum and maximum

How to find the minimum and the maximum key value?


Because of the BST property, the minimum key value can always be found by following the left child
pointers from the root until a NULL left child pointer is encountered. Similarly, the largest can be found
by following the right child pointers until a NULL is encountered.

Does the node always have to be a leaf?

The function to find the smallest has a simple implementation

struct treenode *minimum(struct treenode *t) {
    while (t->leftchild != NULL)     // follow left child pointers until a NULL left child is reached
        t = t->leftchild;
    return t;
}
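For symmetry, the largest key can be found the same way by following right child pointers. A minimal
sketch (the name maximum is illustrative):

    struct treenode *maximum(struct treenode *t) {
        while (t->rightchild != NULL)    // follow right child pointers until a NULL right child is reached
            t = t->rightchild;
        return t;
    }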

Insert Operation

We can use the search algorithm to insert a key x into a BST. When we search for a target key and the
key is not present then search terminates when we reach an empty subtree. This subtree is where we
expect the target to be present. We thus insert a new node with the key x as the root of this subtree.

Example

We make use of two pointers now. One pointer, call that t, is used to trace the path and the other, call
this tp, always points to the parent of the node pointed to by t, i.e., tp trails t. This is necessary so that
we can keep track of the node at which the new node is to be inserted as an appropriate child.

Example:
struct treenode *insert(struct treenode *root, int x) {
    struct treenode *t, *tp, *newnode;
    bool found = false;   // we do not insert if the key to insert is already present

    t = root;
    tp = NULL;

    while (t != NULL && !found) {
        if (x < t->data) {            // then search the left subtree
            tp = t;
            t = t->leftchild;
        }
        else if (x > t->data) {       // search the right subtree
            tp = t;
            t = t->rightchild;
        }
        else
            found = true;
    }

    if (!found) {
        // allocate the new node
        newnode = (struct treenode *) malloc(sizeof(struct treenode));
        newnode->data = x;
        newnode->leftchild = newnode->rightchild = NULL;

        if (root == NULL)             // insert into the empty tree
            return newnode;
        else if (x < tp->data)
            tp->leftchild = newnode;
        else
            tp->rightchild = newnode;
    }
    return root;
}

What is the running time of insert? Clearly it is O(h) where h is the height of the tree.

With insert, we can see that the order in which the data arrives for insertion will determine the shape of
the tree. Obviously, the data arrived in sorted sequence in the case of the degenerate tree.
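As a quick usage sketch of the insert function above (a hypothetical driver, not from the notes):

    struct treenode *root = NULL;
    for (int x = 1; x <= 5; x++)
        root = insert(root, x);
    /* Each new key is larger than every key already in the tree, so it always goes
       into the right subtree: the result is the right-skewed chain 1 -> 2 -> 3 -> 4 -> 5,
       on which search degrades to O(n). */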

Determining the inorder successor.


In many situations it is important to find the inorder successor of a given node. Since the keys are all
distinct, the successor of a node is the node with the smallest key greater than the key at the node.

There are two cases:


1. The right subtree is not empty.

        5
       / \
      2   7
         / \
        6   9
           / \
          8   12
             /
            11
           /
          10

In this case the successor is the leftmost node in the right subtree and this can be found using the
minimum function on the right child of x. For example, the inorder successor of 7 is 8, which is the
leftmost in the right subtree of 7.

2. The right subtree of node x is empty. If x has a successor y, then y is the lowest ancestor of x
whose left child is also an ancestor of x. Here we must remember that every node is an ancestor
of itself.

Consider 12: it holds the largest key in the tree, so it has no inorder successor.


Consider 8: The inorder successor is 9

        15
       /  \
      5    20
     / \
    3   7
         \
          10
         /
        9

Here, what is the successor of 10? We trace back and find that the lowest ancestor of 10 whose left
child is also an ancestor of 10 is 15 (its left child 5 is an ancestor of 10), so the successor of 10 is 15.
Remember that the ancestor-descendant relationship is defined over paths between nodes: no path, no
relationship.

You can see that if the node in question does not have a right child, then in the inorder traversal we
have to backtrack. We backtrack to an ancestor from which we can descend into a right subtree, i.e.,
the closest ancestor whose left subtree has just been completely traversed and of which the node was
a part.

We need parent links.
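A sketch of the two-case successor algorithm, assuming the node structure is extended with a parent
pointer (the parent field and the function name successor are not in the notes; minimum is the function
defined earlier):

    /* Assumes struct treenode also contains a field:  struct treenode *parent;  */
    struct treenode *successor(struct treenode *x) {
        if (x->rightchild != NULL)                /* Case 1: leftmost node of the right subtree */
            return minimum(x->rightchild);
        /* Case 2: climb up until we come out of a left subtree */
        struct treenode *y = x->parent;
        while (y != NULL && x == y->rightchild) {
            x = y;
            y = y->parent;
        }
        return y;                                 /* NULL when x holds the largest key */
    }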

Deletion:

There are 3 cases:


• Case I: Node to be deleted is a leaf node
– Simply delete the node and set the pointer that pointed to it to NULL
• Case II: Node to be deleted has only one child
– Let the single child replace the deleted node
• Case III: Node to be deleted has two children
– Find the inorder successor
– The successor replaces the node to be deleted

Let us take an example and see all the cases

                     20
                  /      \
             15              31
            /   \          /    \
          10     16      25       35
         /  \      \             /  \
        5    12     17        33      40
                                \    /
                                 34 36
                                      \
                                      37

Delete 5 – delete a leaf


Delete 10 – a node with an empty left subtree. In this case 12 replaces 10. Node with 10 is spliced out
Delete 40 – a node with an empty right subtree. In this case the node with 36 replaces the node.
Delete 31 – find the inorder successor of 31 or we can also say find the node with the minimum key in
the right subtree, 33 in this case, splice the node and replace 31 by 33.
We can also replace the key with the maximum in the left subtree.
Let us look at the code now. The code is recursive, and it shows how neatly we can implement delete.

struct treenode *delete(struct treenode *T, int target) {   /* T points to the current node */
    struct treenode *tmp;

    if (T != NULL) {
        if (target < T->data)                      /* Go left */
            T->leftchild = delete(T->leftchild, target);
        else if (target > T->data)                 /* Go right */
            T->rightchild = delete(T->rightchild, target);
        else {                                     /* Found element to be deleted */
            if (T->leftchild && T->rightchild) {   /* Two children */
                /* Replace with smallest in right subtree */
                tmp = minimum(T->rightchild);
                T->data = tmp->data;
                T->rightchild = delete(T->rightchild, T->data);
            }
            else {                                 /* One or zero children */
                tmp = T;
                if (T->leftchild == NULL)          /* Also handles 0 children */
                    T = T->rightchild;
                else if (T->rightchild == NULL)
                    T = T->leftchild;
                free(tmp);                         /* free is declared in <stdlib.h> */
            }
        }
    }
    return T;
}

What is the worst case running time of delete?


O(n) and the average case is O(log2n)
AVL trees

From our study of the binary search tree we know now that we can minimize the worst case search time
if we can maintain the tree as an almost complete binary tree.

However when we are dealing with a dynamic situation in which search, insert and delete operations
are interleaved, frequent restructuring may be required if the tree has to be maintained as an almost
complete binary tree and this may increase the running time of insert and delete considerably.

It is however possible to keep the binary search tree balanced so that the average and the worst case
running time of search is O(log2n).

• Named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their
1962 paper "An algorithm for the organization of information."
• An AVL tree is a binary search tree in which the heights of the two child subtrees of any node
differ by at most one. Such a tree is said to be height balanced.
• The balance factor of a node is the height of its left subtree minus the height of its right subtree
– A node with balance factor 1, 0, or -1 is considered balanced.
• Is the balance condition sufficient to guarantee that the height of an AVL tree with n nodes will
be O(log2n)?

By convention, the value -1 corresponds to the height of a subtree with no nodes, whereas zero corresponds to the
height of a subtree with one node
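As a small illustration of this convention, here is a sketch that computes heights and balance factors
recursively on the plain treenode structure shown earlier (the function names treeHeight and
balanceFactor are illustrative; an actual AVL implementation stores the height in each node instead of
recomputing it):

    /* Height of a subtree, with the convention that an empty subtree has height -1. */
    int treeHeight(struct treenode *t) {
        if (t == NULL)
            return -1;
        int hl = treeHeight(t->leftchild);
        int hr = treeHeight(t->rightchild);
        return 1 + (hl > hr ? hl : hr);
    }

    /* Balance factor = height of left subtree minus height of right subtree.
       A node is balanced when this is -1, 0, or +1. */
    int balanceFactor(struct treenode *t) {
        return treeHeight(t->leftchild) - treeHeight(t->rightchild);
    }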

Show examples with balance factor

Insert Operation

When a new node is inserted, it may affect the balance factors of the nodes along the path from the
root to that node, AND only these nodes may be affected, because only these nodes have their subtrees
altered. Hence the balance factors of these nodes must be recomputed and, if necessary, the tree must
be rebalanced.

Rebalancing is carried out at the nearest ancestor of the newly inserted node whose balance factor has
changed to ±2. In order for a node’s balance factor to become ±2, the balance factor must have been ±1
before the insertion. Since this is the nearest ancestor whose balance factor becomes ±2 it is also the
nearest ancestor whose balance factor was ±1 before the insertion. This means that before the
insertion, the balance factor of all the other nodes on the path from this ancestor to the new insertion
point must have been zero.

Let us call the node at which rebalancing must be carried out, X.


Now we have four cases of insertion to deal with:
1. An insertion into the left subtree of the left child of X.
2. An insertion into the right subtree of the left child of X.
3. An insertion into the left subtree of the right child of X.
4. An insertion into the right subtree of the right child of X.
Cases 1 and 4 are handled by a single rotation operation and 2 and 3 are handled by a double rotation
operation

Tree rotation

A tree rotation is an operation on a binary tree that changes the structure without affecting the binary search tree
property.

A right rotation is rooted at Q. We call the parent node of the subtrees to be rotated the Root, and the
node which will become the new parent node the Pivot.

We name the rotations that we need to carry out

Case I – LL Rotation
There are three important properties of the LL rotation:

1. The rotation does not destroy the data ordering property, so the result is still a valid search tree:
one subtree remains to the left of node A, one subtree remains between nodes A and B, and one
subtree remains to the right of node B.
2. Both nodes A and B end up with zero balance factors.
3. After the rotation, the tree has the same height it had originally. Inserting the item did not
increase the overall height of the tree.

Case 2: LR rotation
The tree can be restored by performing an RR rotation rooted at node A, followed by an LL rotation at
node C. The LL and RR rotations are called single rotations. The combination of the two single rotations
is called a double rotation and is given the name LR rotation
Insert Coding

insert is a recursive function that takes two arguments: the address of the root of the tree in which to
insert, and the key to be inserted. T will point to the root of the tree in which the insertion has to take
place. We refer to this node as the current node. Since the root of the tree may change after an insert,
the function returns the address of the root of the tree, so that changes in subtree roots can be
reflected in the child pointers of the parent.

The base case is an insertion into an empty tree. A node is allocated and its fields are set appropriately.
The child pointers and the height are set to indicate that the new node is a leaf node. The address of
this node is returned to set the child pointer of the parent appropriately. Calls to insert are embedded
in an assignment to ensure that this happens.

If insert is called on a non-empty tree, i.e., T is not NULL, we compare the key to be inserted with the
key at the current node. If the key is less than the key at the current node, we make a recursive call to
insert the key in the left subtree. After the insertion has taken place we check whether the difference
between the height of the left subtree and the height of the right subtree is 2. If that is the case, then
we check where exactly the insertion has taken place so as to carry out the appropriate rotation. If the
insertion has taken place in the left subtree of the left child, which means that the inserted key is less
than the key at the left child of the current node, an LL rotation is carried out. Otherwise an LR rotation
is carried out.

To the rotation functions we pass the address of the current node since the rotation will be rooted at
this node. The rotation functions return the address of the new root, i.e., the pivot node. T is set to point
to this node. The height of this node is computed from the heights of the child nodes and the address of
this node is returned so that the parent's child pointer is set to point to the correct node. In fact the
placement of this statement ensures that before we return we recompute the height of the current
node from the heights of its child nodes.

Recursion also ensures that we rebalance at the closest ancestor of the newly inserted node, because
when we return from the recursive calls, we go up the tree.

The case of insertion into the right subtree is similar.
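The description above can be turned into code. The following is a minimal sketch under stated
assumptions, not the implementation referred to in the notes: it uses a hypothetical avlnode structure
that stores a height field, the convention height(NULL) = -1 mentioned earlier, and rotation functions
named after the four cases (rotateLL, rotateRR, rotateLR, rotateRL).

    #include <stdlib.h>                       /* malloc */

    struct avlnode {
        int data;
        int height;                           /* height of the subtree rooted here */
        struct avlnode *leftchild, *rightchild;
    };

    static int height(struct avlnode *t) { return (t == NULL) ? -1 : t->height; }
    static int max(int a, int b) { return (a > b) ? a : b; }

    /* LL rotation: rebalances after an insertion into the left subtree of the
       left child of x.  Returns the new root of the subtree (the pivot). */
    static struct avlnode *rotateLL(struct avlnode *x) {
        struct avlnode *p = x->leftchild;
        x->leftchild = p->rightchild;
        p->rightchild = x;
        x->height = 1 + max(height(x->leftchild), height(x->rightchild));
        p->height = 1 + max(height(p->leftchild), x->height);
        return p;
    }

    /* RR rotation: mirror image of the LL rotation. */
    static struct avlnode *rotateRR(struct avlnode *x) {
        struct avlnode *p = x->rightchild;
        x->rightchild = p->leftchild;
        p->leftchild = x;
        x->height = 1 + max(height(x->leftchild), height(x->rightchild));
        p->height = 1 + max(x->height, height(p->rightchild));
        return p;
    }

    /* LR rotation: RR rotation rooted at the left child, then LL rotation at x. */
    static struct avlnode *rotateLR(struct avlnode *x) {
        x->leftchild = rotateRR(x->leftchild);
        return rotateLL(x);
    }

    /* RL rotation: LL rotation rooted at the right child, then RR rotation at x. */
    static struct avlnode *rotateRL(struct avlnode *x) {
        x->rightchild = rotateLL(x->rightchild);
        return rotateRR(x);
    }

    /* Inserts key x into the AVL tree rooted at T; returns the (possibly new) root. */
    struct avlnode *avlInsert(struct avlnode *T, int x) {
        if (T == NULL) {                                  /* base case: empty tree */
            T = malloc(sizeof(struct avlnode));
            T->data = x;
            T->height = 0;
            T->leftchild = T->rightchild = NULL;
        }
        else if (x < T->data) {
            T->leftchild = avlInsert(T->leftchild, x);
            if (height(T->leftchild) - height(T->rightchild) == 2)
                T = (x < T->leftchild->data) ? rotateLL(T) : rotateLR(T);
        }
        else if (x > T->data) {
            T->rightchild = avlInsert(T->rightchild, x);
            if (height(T->rightchild) - height(T->leftchild) == 2)
                T = (x > T->rightchild->data) ? rotateRR(T) : rotateRL(T);
        }
        /* duplicate keys are ignored */
        T->height = 1 + max(height(T->leftchild), height(T->rightchild));
        return T;
    }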

Let N(h) denote the minimum number of nodes that can be in an AVL tree of height h.
We try to generate a recurrence for N(h)
Clearly N(0) = 1 and N(1) = 2
In general
N(h) = 1 (for the root) + N(height of left subtree) + N(height of right subtree)

Since the overall tree has height h, one of the subtrees must have height h - 1; suppose it is the left
subtree. To make the other subtree as small as possible we minimize its height. Its height can be no
smaller than h - 2 without violating the AVL condition.

Thus we have the recurrence

N(h) = 1                        when h = 0
N(h) = 2                        when h = 1
N(h) = N(h-1) + N(h-2) + 1      when h >= 2
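As a quick sanity check of the recurrence, a tiny sketch (the helper name minNodes is illustrative):

    /* Minimum number of nodes in an AVL tree of height h, per the recurrence above. */
    int minNodes(int h) {
        if (h == 0) return 1;
        if (h == 1) return 2;
        return minNodes(h - 1) + minNodes(h - 2) + 1;
    }
    /* minNodes(0), ..., minNodes(5)  =  1, 2, 4, 7, 12, 20 */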
Adelson-Velskii and Landis have shown that this recurrence leads to the following bounds on h for an
AVL tree with n nodes:
log2(n+1) <= h < 1.44 log2(n+2) - 0.328
Hence h is bounded by O(log2n) and the worst case search requires O(log2n) comparisons.

------------------
This recurrence looks very similar to the Fibonacci recurrence (F(h) = F(h−1) + F(h−2)). In fact, it can be
argued (by a little approximating, a little cheating, and a little constructive induction) that N(h) grows
roughly like φ^h, where φ = (1 + √5)/2 is the famous golden ratio. Inverting this, the height of the worst
case AVL tree with n nodes is roughly logφ n, which is O(log2 n) (because logs of different bases differ
only by a constant factor).

Deletion

First we do a normal binary search tree delete.

If you look at the binary search tree deletion carefully, you will notice that in the end, in all cases, we
end up deleting a node which is either a leaf or a node with a single child. This includes the case where
we want to delete a node with both a left and a right child: here we actually end up deleting the inorder
successor, which is either a leaf or a node with a single child.

When a leaf is deleted, the heights of the subtrees that do not contain it do not change; the same holds
when a node with a single child is deleted and its child is spliced into its place. Thus, if a delete causes a
violation of the AVL tree height property, this would HAVE to occur at some node on the path from the
root to the parent of the deleted node.

Thus, starting with the parent, we will have to carry out a rebalancing operation at each node on this
path whose balance factor has changed to ±2. One thing to note is that while in an insert at most one
node needs to be rebalanced, in a delete multiple nodes may need to be rebalanced. This also means
that deletions lead to O(log2n) rotations in the worst case.

Let us look at some general cases.

Suppose that we have deleted a key from the left subtree and that as a result this subtree's height has
decreased by one, and this has caused some ancestor to violate the height balance condition. There are
two cases. First, if the balance factor of the right child is either 0 or -1 (that is, it is not left heavy), we
perform a single left (RR) rotation. On the other hand, if the right child has a balance factor of +1 (it is
left heavy), then we need to perform a double rotation.

Now let us take an example
