ADVANCE DATA STRUCTURES

TRIES

The name comes from the word retrieval and is pronounced "try". A trie can be used to implement a dictionary S whose keys are strings over an alphabet.

TYPES OF TRIES

- Standard Trie
- Compressed Trie


MIT 01: ADVANCE DATA STRUCTURES

THE STANDARD TRIE

STANDARD TRIE

A standard trie for S is an ordered tree T with the following properties:

- S is a set of s strings over an alphabet, such that no string in S is a prefix of another string.
- Each node of T, except the root, is labeled with a character of the alphabet.
- The children of a node are alphabetically ordered.
- The paths from the root to the external nodes yield the strings of S.
- Internal nodes have between 1 and d children, where d is the size of the alphabet.

By convention, a special marker character is added to the alphabet and appended to every string (the figures show it as null). This guarantees that no string in S is a prefix of another.

A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:

- n is the total size of the strings in S
- m is the size of the string parameter of the operation
- d is the size of the alphabet

Further properties:

- Insertion and deletion time is at most proportional to the length of the longest string in S
- Every internal node of T has at most d children
- T has s external nodes
- The height of T is equal to the length of the longest string in S

A standard trie for the strings of S = {bear, bell, bid, bull, buy, sell, stock, stop}

[Figure: standard trie for S. The root has the children b and s; the paths are b-e-a-r, b-e-l-l, b-i-d, b-u-l-l, b-u-y, s-e-l-l, s-t-o-c-k and s-t-o-p, each ending in a null marker.]

Insert Sequence: Insert bear, bell, bid, bull, buy, sell, stock, stop

[Figure: the standard trie after the insert sequence; it is the trie for S = {bear, bell, bid, bull, buy, sell, stock, stop}.]
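The insert sequence can be sketched in Python as follows. This is a minimal sketch rather than the slides' own code: the names TrieNode, insert and count_nodes are assumptions made for illustration, and a boolean flag is_end stands in for the null end marker.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_end = False   # stands in for the null end marker (assumption)


def insert(root, word):
    """Insert word, creating nodes only where the path does not exist yet."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def count_nodes(node):
    """Count the nodes in the subtree rooted at node, including node itself."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, w)

print(sorted(root.children))   # ['b', 's']
print(count_nodes(root))       # 22 (the root plus 21 character nodes)
```

Shared prefixes are stored once, which is why the 31 characters of the eight strings need only 21 character nodes.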

Searching of Strings

To search a trie for an element with a given key, we start at the root and follow a path down the trie until we either exhaust the key or fall off the trie. The path we follow is determined by the characters of the search key.

Consider the standard trie for S, and suppose we search for an element with key bull. We use the first letter, b, to move from the root node to one of its children. Having found a match, the node we have reached so far is b. To move to the next level of the trie, we use the next letter, u, and look for a match among the children of node b. This move takes us to node u. The process is repeated until all the letters of the search key are exhausted or no match is found for one of its letters. If every letter is matched and the last node reached carries the end marker, the string exists in the trie. Otherwise, it does not.
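The search procedure can be sketched as below. The sketch assumes the same hypothetical TrieNode layout (a dictionary of children plus an is_end flag for the end marker); none of these names come from the slides.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_end = False   # stands in for the null end marker (assumption)


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def search(root, key):
    """Return True only if key was inserted as a complete string."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:       # we "fall off" the trie
            return False
    return node.is_end         # key exhausted: it must end at a marker


root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, w)

print(search(root, "bull"))   # True
print(search(root, "bu"))     # False: a prefix only, no end marker
print(search(root, "bulk"))   # False: no child k below b-u-l
```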

Search Sequence: Search stock

[Figure: the search for stock follows the path s, t, o, c, k from the root and ends at a null marker, so the string is found.]

Deletion of Strings

STEP 1: Search for the string by comparing each character of the search string, starting from the root node.
STEP 2: If the string does not exist, stop. If it exists, continue with STEP 3.
STEP 3: Delete the terminating marker of the string.
STEP 4: Check whether the last node searched is still an internal (parent) node, that is, whether it still has a child.
    If yes: stop.
    If not: delete that node, make its parent the last node, and repeat STEP 4.
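The four steps can be sketched as follows. This is one possible reading of the steps, under the assumption that the end marker is kept as an is_end flag on the node; with that representation, STEP 4 must also stop at a node that still ends another string. All names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_end = False   # stands in for the null end marker (assumption)


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def delete(root, word):
    """Remove word from the trie; return False if it was not stored."""
    # STEP 1: search, remembering every node on the path.
    path = [root]
    for ch in word:
        nxt = path[-1].children.get(ch)
        if nxt is None:
            return False
        path.append(nxt)
    # STEP 2: the string exists only if the last node carries the marker.
    if not path[-1].is_end:
        return False
    # STEP 3: delete the terminating marker.
    path[-1].is_end = False
    # STEP 4: walk back up, deleting nodes that have no children
    # and do not end another string.
    for depth in range(len(word), 0, -1):
        node = path[depth]
        if node.children or node.is_end:
            break
        del path[depth - 1].children[word[depth - 1]]
    return True


root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, w)

deleted = delete(root, "bull")
u_node = root.children["b"].children["u"]
print(deleted, sorted(u_node.children))   # True ['y']
```

Deleting bull removes the two l nodes but keeps u, because u is still the parent of y (buy).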


Delete Sequence: Delete bull


[Figure: deleting bull removes its null marker and then the two l nodes below u. The node u is kept because it still has the child y (buy).]

THE COMPRESSED TRIE

COMPRESSED TRIE
Compressed/Patricia/Compact Tries

- A compressed trie has internal nodes of degree at least two.
- It is obtained from a standard trie by compressing chains of redundant nodes, that is, nodes other than the root that have a single child.
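The compression step can be sketched as below. The sketch assumes a nested-dictionary trie in which the end marker is the character "$" (an assumption; any character outside the alphabet would do), so that a redundant node is simply a dictionary with exactly one entry. The function names are hypothetical.

```python
END = "$"   # assumed end marker, not part of the alphabet


def standard_trie(words):
    """Build a standard trie as nested dicts: character -> child dict."""
    root = {}
    for w in words:
        node = root
        for ch in w + END:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a copy in which every chain of single-child nodes is one edge."""
    out = {}
    for label, child in node.items():
        # Absorb the child into the edge label while it has exactly one child.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(standard_trie(S))
print(sorted(compressed))        # ['b', 's']
print(sorted(compressed["b"]))   # ['e', 'id$', 'u']
print(sorted(compressed["s"]))   # ['ell$', 'to']
```

In this sketch the end marker is folded into the last edge label (for example id$), whereas the slides draw it as a separate null node.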

A compressed trie for the strings of S = {bear, bell, bid, bull, buy, sell, stock, stop}

[Figure: compressed trie for S. The root has the edges b and s; b leads to e (then ar, ll), id and u (then ll, y); s leads to ell and to (then ck, p). Every leaf ends in a null marker.]

Insert Sequence: stock, sell, bear, bell, bid, buy, stop, bull

[Figure: the compressed trie after the insert sequence; it is the compressed trie for S = {bear, bell, bid, bull, buy, sell, stock, stop}.]
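Inserting directly into a compressed trie requires splitting an edge whenever the new string diverges in the middle of an edge label. The slides do not spell this out, so the following is only a sketch under assumptions: each node keeps its outgoing edges in a dictionary keyed by the first character of the label, and an is_end flag stands in for the null marker. All names are hypothetical.

```python
class Node:
    def __init__(self):
        self.edges = {}       # first character -> (edge label, child Node)
        self.is_end = False   # stands in for the null end marker (assumption)


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, word):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            # No edge starts with this character: add one leaf edge.
            leaf = Node()
            leaf.is_end = True
            node.edges[first] = (word, leaf)
            return
        label, child = node.edges[first]
        k = common_prefix_len(label, word)
        if k < len(label):
            # The word diverges inside the label: split the edge at k.
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[first] = (label[:k], mid)
            child = mid
        node, word = child, word[k:]
    node.is_end = True


def labels(node):
    """Return the edge labels as nested dicts, for inspection."""
    return {label: labels(child) for label, child in node.edges.values()}


root = Node()
for w in ["stock", "sell", "bear", "bell", "bid", "buy", "stop", "bull"]:
    insert(root, w)

print(sorted(labels(root)))        # ['b', 's']
print(sorted(labels(root)["b"]))   # ['e', 'id', 'u']
```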

Searching of Strings

- If an internal node has index i and the ith character of the search key is the jth possible symbol, follow the jth pointer from that node.
- Upon reaching an external node, compare the search key with the string stored there. This step is necessary because the algorithm optimistically assumes that matches have happened at all internal nodes; without it, stick would be confused with stock.
- If the search key has fewer than i characters, it is not in the trie.
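The index-based search can be sketched as follows. The representation is an assumption made for illustration: an internal node is a pair (i, children) with a 0-based character index i, an external node is the stored string itself, and every string ends with the assumed marker "$". The build helper exists only to make the sketch runnable.

```python
END = "$"   # assumed end marker, not part of the alphabet


def build(words, depth=0):
    """Build the index trie from distinct, END-terminated words.

    An external node is the stored string; an internal node is a pair
    (i, children), where i is the 0-based index of the character to test.
    """
    if len(words) == 1:
        return words[0]
    i = depth
    while len({w[i] for w in words}) == 1:   # skip positions where all agree
        i += 1
    groups = {}
    for w in words:
        groups.setdefault(w[i], []).append(w)
    return (i, {ch: build(group, i + 1) for ch, group in groups.items()})


def search(node, key):
    key += END
    while isinstance(node, tuple):
        i, children = node
        if i >= len(key):             # the key has too few characters
            return False
        node = children.get(key[i])   # only character i is examined
        if node is None:
            return False
    return node == key                # final comparison at the external node


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
trie = build([w + END for w in S])
print(search(trie, "stock"), search(trie, "stick"))   # True False
```

For stick the internal nodes test only positions 0, 1 and 3, so the walk ends at the leaf for stock; the final comparison is what rejects it.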

Deletion of Strings

STEP 1: Search for the string by comparing each character of the search string, starting from the root node.
STEP 2: If the string does not exist, stop. If it exists, continue with STEP 3.
STEP 3: Delete the terminating marker of the string.
STEP 4: Check whether the last node searched is still an internal (parent) node.
    If yes: stop.
    If not: delete that node and make its parent the last node. Check whether this node still has at least two children. If not, compress the trie by merging the node with its single remaining child. Repeat STEP 4.
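Deletion with recompression can be sketched as below. This is one interpretation of the steps, assuming the same hypothetical edge-labeled Node layout (edges keyed by first character, an is_end flag as the end marker); the root is never merged.

```python
class Node:
    def __init__(self):
        self.edges = {}       # first character -> (edge label, child Node)
        self.is_end = False   # stands in for the null end marker (assumption)


def insert(root, word):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            leaf = Node()
            leaf.is_end = True
            node.edges[first] = (word, leaf)
            return
        label, child = node.edges[first]
        k = 0
        while k < len(label) and k < len(word) and label[k] == word[k]:
            k += 1
        if k < len(label):                    # split the edge at k
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[first] = (label[:k], mid)
            child = mid
        node, word = child, word[k:]
    node.is_end = True


def delete(root, word):
    # STEP 1 and 2: search, remembering (parent, edge key) for each step.
    path, node, rest = [], root, word
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None or not rest.startswith(entry[0]):
            return False
        path.append((node, rest[0]))
        rest, node = rest[len(entry[0]):], entry[1]
    if not node.is_end:
        return False
    node.is_end = False                       # STEP 3
    while path:                               # STEP 4
        parent, key = path.pop()
        label, child = parent.edges[key]
        if child.is_end or len(child.edges) >= 2:
            break                             # still a proper node: stop
        if not child.edges:
            del parent.edges[key]             # empty leaf: delete, go up
            continue
        # Exactly one child left: merge the two edges into one.
        sub_label, grandchild = next(iter(child.edges.values()))
        parent.edges[key] = (label + sub_label, grandchild)
        break
    return True


def labels(node):
    return {label: labels(child) for label, child in node.edges.values()}


root = Node()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, w)
delete(root, "bell")
print(sorted(labels(root)["b"]))   # ['ear', 'id', 'u']
```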


Delete Sequence: Delete bell

[Figure: the compressed trie before and after deleting bell. Removing the leaf ll leaves node e with the single child ar, so the edges e and ar are merged into the single edge ear.]

Compact Representation of a Compressed Trie

Compact representation of a compressed trie for an array of strings:

- Stores at the nodes ranges of indices instead of substrings
- Uses O(s) space, where s is the number of strings in the array
- Serves as an auxiliary index structure

Array of strings:

S[0] = see      S[4] = bull     S[7] = hear
S[1] = bear     S[5] = buy      S[8] = bell
S[2] = sell     S[6] = bid      S[9] = stop
S[3] = stock

Each node stores a triple i,j,k that stands for the substring S[i][j..k] (decoded in parentheses):

root
    1,0,0 (b)
        1,1,1 (e)
            1,2,3 (ar)
            8,2,3 (ll)
        6,1,2 (id)
        4,1,1 (u)
            4,2,3 (ll)
            5,2,2 (y)
    7,0,3 (hear)
    0,0,0 (s)
        0,1,1 (e)
            0,2,2 (e)
            2,2,3 (ll)
        3,1,2 (to)
            3,3,4 (ck)
            9,3,3 (p)
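Decoding a node of the compact representation is a matter of slicing. The sketch below assumes the array S reads see, bear, sell, stock, bull, buy, bid, hear, bell, stop (indices 0 to 9) and that a triple (i, j, k) denotes characters j through k of S[i], inclusive; the function name label is hypothetical.

```python
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]


def label(triple):
    """Decode a node triple (i, j, k) into the substring S[i][j..k]."""
    i, j, k = triple
    return S[i][j:k + 1]   # k is inclusive, hence the + 1


# Triples along the path from the root to the leaf for "stock".
path = [(0, 0, 0), (3, 1, 2), (3, 3, 4)]
print([label(t) for t in path])            # ['s', 'to', 'ck']
print("".join(label(t) for t in path))     # stock
```

Each node costs three integers regardless of the label length, which is where the O(s) space bound comes from.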

APPLICATION OF TRIE


APPLICATION OF TRIES

- Text searching
- Command completion
- Supporting word matches
- Internet routing
- Basic computer data structure, etc.

Word Matching in a Trie

Text to be searched (character positions 0 to 88):

see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!

Standard trie for the words in the text. Each leaf stores the starting positions of its word:

bear: 6
bell: 78
bid: 47, 58
bull: 30
buy: 36
hear: 69
see: 0, 24
sell: 12
stock: 17, 40, 51, 62
stop: 84

Note: Articles and prepositions, which are also known as stop words, are excluded.
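Building such a word index can be sketched as follows. The sketch assumes the example text reads as in TEXT below, treats a and the as the excluded stop words, and stores the positions under an assumed "$" key at the end of each word's path; build_index and occurrences are hypothetical names.

```python
import re

TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")
STOP_WORDS = {"a", "the"}   # excluded from the index
END = "$"                   # assumed key that holds the position list


def build_index(text):
    """Build a standard trie (nested dicts) with start positions at the leaves."""
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        word = match.group()
        if word in STOP_WORDS:
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(match.start())
    return root


def occurrences(root, word):
    """Return the start positions of word in the text, or an empty list."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get(END, [])


index = build_index(TEXT)
print(occurrences(index, "stock"))   # [17, 40, 51, 62]
print(occurrences(index, "see"))     # [0, 24]
print(occurrences(index, "the"))     # []
```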

Tries and Search Engine

- The index of a search engine (collection of all searchable words) is stored in a compressed trie
- Each leaf of the trie is associated with a word and has a list of pages (URLs) containing that word, called the occurrence list
- The trie is kept in internal memory
- The occurrence lists are kept in external memory and are ranked by relevance
- Boolean queries for sets of words (e.g., Java and coffee) correspond to set operations (e.g., intersection) on the occurrence lists
- Additional information retrieval techniques are used, such as
    - stopword elimination (e.g., ignore the, a, is)
    - stemming (e.g., identify add, adding, added)
    - link analysis (recognize authoritative pages)
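A Boolean AND query then reduces to intersecting occurrence lists. In the sketch below a plain dictionary stands in for the compressed trie, the lists are assumed to be kept sorted, and the page names are made up for illustration.

```python
def intersect(a, b):
    """Intersect two sorted occurrence lists with a single linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical occurrence lists; a dict stands in for the trie leaves.
occurrence_lists = {
    "java": ["page1.example", "page3.example", "page4.example"],
    "coffee": ["page2.example", "page3.example", "page4.example"],
}

result = intersect(occurrence_lists["java"], occurrence_lists["coffee"])
print(result)   # ['page3.example', 'page4.example']
```

The merge reads each list once, which suits lists that are streamed from external memory.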

Tries and Internet Routers

- Computers on the internet (hosts) are identified by a unique 32-bit IP (internet protocol) address, usually written in dotted-quad decimal notation
- E.g., www.cs.brown.edu is 128.148.32.110
- Use nslookup on Unix to find out IP addresses
- An organization uses a subset of IP addresses with the same prefix, e.g., Brown uses 128.148.*.*, Yale uses 130.132.*.*
- Data is sent to a host by fragmenting it into packets. Each packet carries the IP address of its destination.
- The internet is a graph whose nodes are routers and whose edges are communication links.
- A router forwards packets to its neighbors using IP prefix matching rules. E.g., a packet with IP prefix 128.148. should be forwarded to the Brown gateway router.
- Routers use tries on the alphabet {0, 1} to do prefix matching.
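Prefix matching on the alphabet {0, 1} can be sketched with a binary trie and Python's standard ipaddress module. The two routes mirror the prefixes mentioned above, but the gateway names, BitNode, add_route and lookup are hypothetical; real routers use more elaborate structures.

```python
import ipaddress


class BitNode:
    def __init__(self):
        self.child = [None, None]   # child for bit 0 and for bit 1
        self.next_hop = None        # set if a routing prefix ends here


def bits(value, count):
    """Return the first count bits of a 32-bit value, most significant first."""
    return [(value >> (31 - i)) & 1 for i in range(count)]


def add_route(root, cidr, next_hop):
    net = ipaddress.ip_network(cidr)
    node = root
    for b in bits(int(net.network_address), net.prefixlen):
        if node.child[b] is None:
            node.child[b] = BitNode()
        node = node.child[b]
    node.next_hop = next_hop


def lookup(root, address):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, root.next_hop
    for b in bits(int(ipaddress.ip_address(address)), 32):
        node = node.child[b]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


table = BitNode()
add_route(table, "128.148.0.0/16", "brown-gateway")   # hypothetical name
add_route(table, "130.132.0.0/16", "yale-gateway")    # hypothetical name

print(lookup(table, "128.148.32.110"))   # brown-gateway
print(lookup(table, "8.8.8.8"))          # None
```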