TRIES
The name "trie" comes from the word "retrieval" and is pronounced "try". A trie can be used to implement a dictionary S whose keys are strings over an alphabet Σ.
TYPES OF TRIES
STANDARD TRIE
A standard trie for S is an ordered tree T with the following properties:
- S is a set of s strings over an alphabet Σ such that no string in S is a prefix of another string.
- Each node of T, except the root, is labeled with a character of Σ.
- The children of a node are alphabetically ordered.
- The paths from the root to the external nodes yield the strings of S.
- Internal nodes have between 1 and d children, where d is the size of the alphabet.
- By convention, a special end-marker character is added to the alphabet and appended to every string, which guarantees that no string is a prefix of another.
MIT 01: ADVANCE DATA STRUCTURES
STANDARD TRIE
A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:
- n is the total size of the strings in S
- m is the size of the string parameter of the operation
- d is the size of the alphabet
Further properties:
- Insertion and deletion time is at most proportional to the length of the longest string in S
- Every internal node of T has at most d children
- T has s external nodes
- The height of T is equal to the length of the longest string in S
STANDARD TRIE
A standard trie for the strings of S = {bear, bell, bid, bull, buy, sell, stock, stop}
[Figure: standard trie for S. Each root-to-leaf path spells one string, and every leaf ends in a null marker.]
STANDARD TRIE
Insert sequence: bear, bell, bid, bull, buy, sell, stock, stop
[Figure: the standard trie for S after the insert sequence; the diagram is not recoverable from the extracted text.]
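The insert sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TrieNode, insert and words are hypothetical, children are kept in a dictionary, and a boolean flag stands in for the null end marker shown in the figure.

```python
class TrieNode:
    """One node of a standard trie; children are keyed by character."""

    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.terminal = False  # assumption: a flag plays the role of the end marker


def insert(root, word):
    """Walk down from the root, creating missing nodes, then mark the end."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True


def words(node, prefix=""):
    """Yield the stored strings by visiting children in alphabetical order."""
    if node.terminal:
        yield prefix
    for ch in sorted(node.children):
        yield from words(node.children[ch], prefix + ch)


root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, w)
print(list(words(root)))
```

Each insertion touches one node per character, so its cost depends on the length of the inserted string rather than on the number of strings already stored.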
STANDARD TRIE
Searching of Strings
To search a trie for an element with a given key, we start at the root and follow a path down the trie until we either reach the end of the key or fall off the trie. The path we follow is determined by the characters of the search key.
Consider the trie on the next slide, and suppose we search for an element with key bull. We use the first letter, b, to move from the root to one of its children; the node we have reached so far is b. To move to the next level of the trie, we use the next letter, u, and look for a match among the children of node b. This move takes us to node u. The process is repeated until all the letters in the search key are exhausted or a letter has no match. If the search key exactly matches a string in the trie, including its terminating marker, the string exists. Otherwise, it does not.
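The search procedure can be sketched as follows. The code is a minimal illustration, not the slides' own implementation: build_trie and search are hypothetical names, the trie is a nested dictionary, and the key "$" is assumed to be an end marker that never occurs in the stored strings.

```python
END = "$"  # assumed end marker, not part of the alphabet


def build_trie(strings):
    """Build a nested-dict trie; an END key marks the end of a stored string."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def search(root, key):
    """Follow one child per character; fail as soon as we fall off the trie."""
    node = root
    for ch in key:
        if ch not in node:
            return False
        node = node[ch]
    # All letters matched; the key is stored only if the end marker is here.
    return END in node


trie = build_trie(["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"])
print(search(trie, "bull"), search(trie, "bul"), search(trie, "bully"))
```

The final end-marker check is what distinguishes a stored string such as bull from a mere prefix such as bul.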
STANDARD TRIE
[Figure: the standard trie for S used in the search example; the diagram is not recoverable from the extracted text.]
STANDARD TRIE
Deletion of Strings
STEP 1: Search for the string by comparing each character of the search string, starting from the root node.
STEP 2: If the string does not exist, stop. If it exists, continue with STEP 3.
STEP 3: Delete the terminating marker of the string.
STEP 4: Check whether the last node searched is still an internal (parent) node, that is, whether it still has children.
If yes: Stop.
If not: Delete the last node, make its parent the new last node, and repeat STEP 4 (stopping at the root).
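The deletion steps can be sketched as below. This is an illustrative version under assumptions: the trie is a nested dictionary, "$" is an assumed end marker, and build_trie, search and delete are hypothetical helper names.

```python
END = "$"  # assumed end marker


def build_trie(strings):
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def search(root, key):
    node = root
    for ch in key:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def delete(root, key):
    """Remove key from the trie; return False if it was not stored."""
    # STEP 1 and 2: search, remembering every node on the path.
    path = [root]
    for ch in key:
        if ch not in path[-1]:
            return False
        path.append(path[-1][ch])
    if END not in path[-1]:
        return False
    # STEP 3: delete the terminating marker.
    del path[-1][END]
    # STEP 4: walk back up, deleting nodes that no longer have children.
    for depth in range(len(key), 0, -1):
        if path[depth]:  # still a parent node: stop
            break
        del path[depth - 1][key[depth - 1]]
    return True


trie = build_trie(["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"])
print(delete(trie, "bull"), search(trie, "bull"), search(trie, "buy"))
```

Deleting bull removes the two l nodes but stops at u, because u is still the parent of y.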
STANDARD TRIE
[Figure: the standard trie for S used in the deletion example; the diagram is not recoverable from the extracted text.]
COMPRESSED TRIE
Compressed/Patricia/Compact Tries
- A compressed trie has internal nodes of degree at least two.
- It is obtained from a standard trie by compressing chains of redundant nodes, that is, nodes with a single child.
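Compressing chains of single-child nodes can be sketched as follows. The code is an assumption-laden illustration: the standard trie is a nested dictionary, "$" is an assumed end marker that is treated like any other character, and build_trie and compress are hypothetical names.

```python
END = "$"  # assumed end marker, stored as an ordinary character


def build_trie(strings):
    """Standard trie as nested dicts, one character per edge."""
    root = {}
    for s in strings:
        node = root
        for ch in s + END:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a copy in which every chain of single-child nodes is one edge."""
    out = {}
    for label, child in node.items():
        # Absorb the child into the edge label while it has exactly one child.
        while len(child) == 1:
            (ch, grandchild), = child.items()
            label += ch
            child = grandchild
        out[label] = compress(child)
    return out


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = compress(build_trie(S))
print(sorted(compressed["b"]), sorted(compressed["s"]))
```

After compression every internal node has at least two children, and edge labels become substrings such as id$, ell$ and to.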
COMPRESSED TRIE
A compressed trie for the strings of S = {bear, bell, bid, bull, buy, sell, stock, stop}
[Figure: compressed trie for S, with edge labels such as id, ell, to, ar, ll and ck; leaves end in a null marker.]
COMPRESSED TRIE
Insert Sequence: stock, sell, bear, bell, bid, buy, stop, bull
[Figure: the compressed trie for S after the insert sequence, with edge labels such as id, ell, to, ar, ll and ck.]
COMPRESSED TRIE
Searching of Strings
- If an internal node has index i and the i-th character of the search key is the j-th possible symbol, follow the j-th pointer from that node.
- If the search key has fewer than i characters, it is not in the trie.
- Upon reaching an external node, compare the search key with the string stored there. This step is necessary because the algorithm optimistically assumes that matches have happened at all internal nodes; without it, stick would be confused with stock.
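The index-based search can be sketched as below. This is a simplified illustration, not the slides' implementation: Leaf, Internal, build and search are hypothetical names, children are looked up in a dictionary instead of a j-th pointer, and "$" is an assumed end marker appended to every string.

```python
END = "$"  # assumed end marker


class Leaf:
    def __init__(self, string):
        self.string = string      # the full stored string, end marker included


class Internal:
    def __init__(self, index, children):
        self.index = index        # character position this node branches on
        self.children = children  # character -> Leaf or Internal


def build(strings):
    """Build the index trie from distinct, end-marked strings."""
    if len(strings) == 1:
        return Leaf(strings[0])
    i = 0
    while len({s[i] for s in strings}) == 1:  # skip positions where all agree
        i += 1
    groups = {}
    for s in strings:
        groups.setdefault(s[i], []).append(s)
    return Internal(i, {ch: build(group) for ch, group in groups.items()})


def search(root, key):
    key += END
    node = root
    while isinstance(node, Internal):
        if len(key) <= node.index:   # key too short to have an i-th character
            return False
        node = node.children.get(key[node.index])
        if node is None:
            return False
    return node.string == key        # final comparison at the external node


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
root = build(sorted({s + END for s in S}))
print(search(root, "stock"), search(root, "stick"))
```

For the key stick the walk inspects only positions 0, 1 and 3 and lands on the leaf for stock; the final comparison is what rejects it.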
COMPRESSED TRIE
Deletion of Strings
STEP 1: Search for the string by comparing each character of the search string, starting from the root node.
STEP 2: If the string does not exist, stop. If it exists, continue with STEP 3.
STEP 3: Delete the terminating marker of the string.
STEP 4: Check whether the last node searched is still an internal (parent) node.
If yes: Stop.
If not: Delete the last node and make its parent the new last node. Check whether that node still has at least two children; if not, compress the trie by merging the node with its remaining child. Repeat STEP 4.
COMPRESSED TRIE
[Figure: the compressed trie for S before the deletion, with edge labels such as id, ell, to, ar, ll and ck.]
COMPRESSED TRIE
[Figure: the compressed trie after the deletion, with edge labels such as ear, id, ell, to, ll and ck.]
COMPRESSED TRIE
Compact Representation of a Compressed Trie
Compact representation of a compressed trie for an array of strings:
- Stores ranges of indices at the nodes instead of substrings
- Uses O(s) space, where s is the number of strings in the array
- Serves as an auxiliary index structure
COMPRESSED TRIE
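Array of strings:
S[0] = see     S[4] = bull    S[7] = hear
S[1] = bear    S[5] = buy     S[8] = bell
S[2] = sell    S[6] = bid     S[9] = stop
S[3] = stock
Node labels (i, j, k) legible in the figure, each standing for characters j through k of S[i]: (1,0,0), (7,0,3), (0,0,0), (1,2,3), (8,2,3), (4,2,3), (5,2,2), (0,2,2), (2,2,3), (3,3,4), (9,3,3)
[Figure: compact representation of the compressed trie for S; the tree layout is not recoverable from the extracted text.]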
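A node label in the compact representation can be decoded with a single slice. The sketch below assumes the array of strings is S = see, bear, sell, stock, bull, buy, bid, hear, bell, stop, and that a label (i, j, k) means characters j through k of S[i], both ends inclusive; the function name label is hypothetical.

```python
# Assumed array of strings, indexed S[0] to S[9].
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]


def label(i, j, k):
    """Decode a node label (i, j, k) into the substring S[i][j..k], inclusive."""
    return S[i][j:k + 1]


# Each node stores three integers regardless of how long its substring is.
for triple in [(1, 0, 0), (7, 0, 3), (1, 2, 3), (3, 3, 4), (9, 3, 3)]:
    print(triple, label(*triple))
```

Because every node holds a constant-size triple rather than a substring, the space used by the trie itself depends on the number of strings, not on their total length.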
TRIES
APPLICATION OF TRIES
- Text searching
- Command completion
- Supporting word matches
- Internet routing
- Basic computer data structure, etc.
APPLICATION OF TRIE
Word Matching in a Trie
[Figure: the text to be searched, with its characters numbered from 0 to 88, and the standard trie for the words in the text. Leaves carry lists of character positions such as 0, 24 and 17, 40. The diagram is not recoverable from the extracted text.]
Note: Articles and prepositions, which are also known as stop words, are excluded.
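Word matching can be sketched by storing, at the end of each word's path, the positions where the word starts in the text. Everything here is illustrative: the sample text, the stop-word set and the names index_text and occurrences are assumptions, and "$" is an assumed end-marker key.

```python
import re

END = "$"                   # assumed end-marker key holding the position list
STOP_WORDS = {"a", "the"}   # assumed stop words, excluded from the trie


def index_text(text):
    """Build a nested-dict trie whose END entries list word start positions."""
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        word = match.group()
        if word in STOP_WORDS:
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(match.start())
    return root


def occurrences(root, word):
    """Return the start positions of word in the indexed text."""
    node = root
    for ch in word:
        if ch not in node:
            return []
        node = node[ch]
    return node.get(END, [])


text = "see a bear? sell stock! see a bull? buy stock!"  # hypothetical sample
trie = index_text(text)
print(occurrences(trie, "see"), occurrences(trie, "stock"))
```

A lookup costs time proportional to the length of the query word, independent of the length of the text.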
APPLICATION OF TRIE
Tries and Search Engine
- The index of a search engine (the collection of all searchable words) is stored in a compressed trie.
- Each leaf of the trie is associated with a word and has a list of pages (URLs) containing that word, called the occurrence list.
- The trie is kept in internal memory.
- The occurrence lists are kept in external memory and are ranked by relevance.
- Boolean queries for sets of words (e.g., Java and coffee) correspond to set operations (e.g., intersection) on the occurrence lists.
- Additional information retrieval techniques are used, such as:
  - stopword elimination (e.g., ignore "the", "a", "is")
  - stemming (e.g., identify "add", "adding", "added")
  - link analysis (recognize authoritative pages)
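A Boolean AND query as an intersection of occurrence lists can be sketched as follows. The page names are made up, the function boolean_and is hypothetical, and a plain dictionary stands in for the leaves of the compressed trie.

```python
# Hypothetical occurrence lists; a real index would reach them via trie leaves.
occurrence_lists = {
    "java":   ["page1.example", "page2.example", "page4.example"],
    "coffee": ["page2.example", "page3.example", "page4.example"],
}


def boolean_and(words):
    """Pages containing every query word: intersect the occurrence lists."""
    result = None
    for word in words:
        pages = set(occurrence_lists.get(word, []))
        result = pages if result is None else result & pages
    return sorted(result or [])


print(boolean_and(["java", "coffee"]))
```

An OR query would use set union in the same place; ranking by relevance is left out of this sketch.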
APPLICATION OF TRIE
Tries and Internet Routers
- Computers on the internet (hosts) are identified by a unique 32-bit IP (internet protocol) address, usually written in dotted-quad decimal notation. E.g., www.cs.brown.edu is 128.148.32.110. Use nslookup on Unix to find out IP addresses.
- An organization uses a subset of IP addresses with the same prefix, e.g., Brown uses 128.148.*.*, Yale uses 130.132.*.*
- Data is sent to a host by fragmenting it into packets. Each packet carries the IP address of its destination.
- The internet is a graph whose nodes are routers and whose edges are communication links.
- A router forwards packets to its neighbors using IP prefix matching rules. E.g., a packet with IP prefix 128.148. should be forwarded to the Brown gateway router.
- Routers use tries on the alphabet {0, 1} to do prefix matching.
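Prefix matching on the alphabet {0, 1} can be sketched with a binary trie that remembers the longest matching prefix seen on the way down. The class name Router, its methods and the next-hop names are hypothetical; real routers use far more optimized structures.

```python
def ip_bits(ip):
    """Convert a dotted-quad address into a string of 32 bits."""
    return "".join(f"{int(part):08b}" for part in ip.split("."))


class Router:
    def __init__(self):
        self.root = {}  # nested dicts keyed by "0" and "1"; "hop" holds a route

    def add_route(self, prefix, length, next_hop):
        """Store next_hop under the first `length` bits of `prefix`."""
        node = self.root
        for bit in ip_bits(prefix)[:length]:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, ip):
        """Return the next hop of the longest stored prefix of ip, or None."""
        node = self.root
        best = node.get("hop")
        for bit in ip_bits(ip):
            node = node.get(bit)
            if node is None:
                break
            best = node.get("hop", best)
        return best


router = Router()
router.add_route("128.148.0.0", 16, "brown-gateway")  # hypothetical hop names
router.add_route("130.132.0.0", 16, "yale-gateway")
router.add_route("128.148.32.0", 24, "cs-subnet")
print(router.lookup("128.148.32.110"), router.lookup("128.148.1.1"))
```

A lookup inspects at most 32 bits, so its cost does not grow with the number of routes stored.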