Dr. Aruna Malapati
Asst Professor, Department of CSIS
BITS Pilani, Hyderabad Campus
Tolerant retrieval
Spelling correction
Soundex
...
An array of structures
Pros:
Lookup in a hash table is faster than lookup in a tree.
Cons:
No prefix search (e.g., all terms starting with automat), which tolerant retrieval requires
Use B-trees
Solves the prefix problem.
Rebalancing is expensive
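The prefix lookup that a tree supports and a hash table does not can be sketched with a sorted list and binary search. This is only an illustration: the sorted list stands in for the B-tree, and the function name `prefix_search` and the sample terms are assumptions, not part of the slides.

```python
from bisect import bisect_left

def prefix_search(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix."""
    # First position whose term is >= prefix.
    lo = bisect_left(sorted_terms, prefix)
    # Upper bound: prefix followed by the highest Unicode code point.
    # Assumption: no dictionary term contains U+10FFFF itself.
    hi = bisect_left(sorted_terms, prefix + "\U0010ffff")
    return sorted_terms[lo:hi]

# Hypothetical dictionary, kept in sorted order as a tree would keep it.
terms = sorted(["automat", "automata", "automatic", "automation",
                "autumn", "banana"])
print(prefix_search(terms, "automat"))
```

Because matching terms are contiguous in sorted order, the whole answer is one slice. A hash table scatters those terms, so it would have to scan every key.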
Binary Tree
[Figure: binary tree partitioning the dictionary by term range; node labels include A-K, M-N, An-As, B-G, H-J, Ka-Ke, An-Ar, Go-Gu, Assam, B-C]
*mon: find all docs containing any term ending with mon
Maintain an additional tree for terms written backwards
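A minimal sketch of the backwards-term idea, assuming a sorted list of reversed terms stands in for the additional tree. The helper names `build_reverse_index` and `suffix_search` and the sample terms are hypothetical.

```python
from bisect import bisect_left

def build_reverse_index(terms):
    """Store every term reversed, in sorted order."""
    return sorted(t[::-1] for t in terms)

def suffix_search(reverse_index, suffix):
    """Answer *suffix as a prefix search over the reversed terms."""
    key = suffix[::-1]  # "mon" becomes "nom"
    lo = bisect_left(reverse_index, key)
    # Assumption: no term contains U+10FFFF.
    hi = bisect_left(reverse_index, key + "\U0010ffff")
    # Reverse the hits again to recover the original terms.
    return sorted(t[::-1] for t in reverse_index[lo:hi])

reverse_index = build_reverse_index(["lemon", "salmon", "month", "demon", "money"])
print(suffix_search(reverse_index, "mon"))
```

A trailing-wildcard query on the reversed terms is exactly a leading-wildcard query on the originals, which is why one extra tree is enough.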
How do we enumerate all the terms meeting the wild card query pro*cent?
Example: m*nchen
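One way to enumerate terms for a query with the wildcard in the middle is to intersect a prefix lookup with a suffix lookup. The sketch below assumes sorted lists stand in for the forward and backward trees. `wildcard_middle` and the sample terms are illustrative names, and `munchen` is used as an ASCII stand-in for a term matching m*nchen.

```python
from bisect import bisect_left

def _range(sorted_keys, prefix):
    """All keys in sorted_keys that start with prefix."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "\U0010ffff")  # assumes no U+10FFFF
    return sorted_keys[lo:hi]

def wildcard_middle(terms, prefix, suffix):
    """Terms matching prefix*suffix."""
    forward = sorted(terms)                    # stands in for the forward tree
    backward = sorted(t[::-1] for t in terms)  # stands in for the backward tree
    starts = set(_range(forward, prefix))
    ends = {t[::-1] for t in _range(backward, suffix[::-1])}
    # Length check: prefix and suffix must not overlap inside the term
    # (otherwise "aba" would wrongly match ab*ba).
    min_len = len(prefix) + len(suffix)
    return sorted(t for t in starts & ends if len(t) >= min_len)

terms = ["procent", "percent", "protect", "prominent", "munchen", "moon"]
print(wildcard_middle(terms, "pro", "cent"))
print(wildcard_middle(terms, "m", "nchen"))
```

Both intermediate sets can be large even when the intersection is small, which makes this approach costly for short prefixes or suffixes.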
Rotations of hello: hello$, ello$h, llo$he, lo$hel, o$hell, $hello, where $ is a special symbol.
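The rotations listed above can be generated mechanically. This is a small sketch in which the function name `permuterm_rotations` is an assumption.

```python
def permuterm_rotations(term, end="$"):
    """Return every rotation of term + end, starting with the unrotated form."""
    s = term + end
    # Rotation i moves the first i characters to the back.
    return [s[i:] + s[:i] for i in range(len(s))]

print(permuterm_rotations("hello"))
# ['hello$', 'ello$h', 'llo$he', 'lo$hel', 'o$hell', '$hello']
```

A term of length n produces n + 1 rotations, so indexing every rotation enlarges the dictionary considerably.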
X: lookup on X$
X*: lookup on $X*
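The lookup rules can be sketched as: append $ to the query, rotate it so the * lands at the end, then do a prefix search over the stored rotations. The code assumes a sorted list in place of a tree and at most one * per query. `build_permuterm`, `permuterm_key`, and `lookup` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

def build_permuterm(terms, end="$"):
    """Sorted (rotation, term) pairs for every rotation of every term."""
    entries = []
    for t in terms:
        s = t + end
        entries.extend((s[i:] + s[:i], t) for i in range(len(s)))
    entries.sort()
    return entries

def permuterm_key(query, end="$"):
    """Return (key, is_prefix). X -> X$ (exact); X* -> $X (prefix)."""
    if query.count("*") > 1:
        raise ValueError("this sketch handles at most one wildcard")
    q = query + end
    if "*" not in q:
        return q, False
    i = q.index("*")
    # Rotate so the wildcard is last, then drop it: what remains is a prefix.
    return q[i + 1:] + q[:i], True

def lookup(entries, query, end="$"):
    key, is_prefix = permuterm_key(query, end)
    keys = [k for k, _ in entries]
    lo = bisect_left(keys, key)
    if is_prefix:
        hi = bisect_left(keys, key + "\U0010ffff")  # assumes no U+10FFFF
    else:
        hi = bisect_right(keys, key)
    return sorted({t for _, t in entries[lo:hi]})

entries = build_permuterm(["hello", "help", "hero", "munchen", "moon"])
print(lookup(entries, "hello"))    # exact lookup on hello$
print(lookup(entries, "hel*"))     # prefix lookup on $hel
print(lookup(entries, "m*nchen"))  # prefix lookup on nchen$m
```

The same rotation step handles a wildcard at the start or in the middle of the query, so a single index answers all of these query shapes.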
$a, ap, pr, ri, il, l$, $i, is, s$, $t, th, he, e$, $c, cr, ru, ue, el, le, es, st, t$, $m, mo, on, nt, th, h$
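Bigrams like these can be produced by padding each term with $ on both sides and sliding a window of width k over it. A minimal sketch, with `kgrams` as an assumed name:

```python
def kgrams(term, k=2, boundary="$"):
    """All k-grams of term, with boundary marking its start and end."""
    s = boundary + term + boundary
    return [s[i:i + k] for i in range(len(s) - k + 1)]

print(kgrams("month"))
# ['$m', 'mo', 'on', 'nt', 'th', 'h$']
```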
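Query: mon*
Bigrams for query: $m, mo, on
$m: mace, madden
mo: among, amortize
on: along, among
Lookup: $m AND mo AND on
This returns all terms starting with m and containing mo and containing on.
The term moon (2-grams: $m, mo, oo, on, n$) also satisfies this lookup without matching mon*, so candidates must be post-filtered against the query.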
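The bigram lookup for mon* can be sketched end to end: build a k-gram index, intersect the postings of the query's k-grams, then post-filter the candidates against the original pattern. All function names and the extra sample terms are assumptions. The filter uses `fnmatchcase` from Python's standard library as a convenient stand-in for a wildcard matcher.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def build_kgram_index(terms, k=2):
    """Map each k-gram to the set of terms containing it."""
    index = defaultdict(set)
    for t in terms:
        s = "$" + t + "$"
        for i in range(len(s) - k + 1):
            index[s[i:i + k]].add(t)
    return index

def query_kgrams(query, k=2):
    """k-grams of the query, never spanning a wildcard."""
    grams = []
    for piece in ("$" + query + "$").split("*"):
        grams.extend(piece[i:i + k] for i in range(len(piece) - k + 1))
    return grams

def candidates(index, query, k=2):
    """Terms containing every k-gram of the query (may include false hits)."""
    grams = query_kgrams(query, k)
    if not grams:
        raise ValueError("query too short for the k-gram index")
    return set.intersection(*(index.get(g, set()) for g in grams))

def wildcard_lookup(index, query, k=2):
    """Post-filter the candidates against the original wildcard pattern."""
    return sorted(t for t in candidates(index, query, k) if fnmatchcase(t, query))

terms = ["mace", "madden", "among", "amortize", "along",
         "moon", "month", "monday", "money"]
index = build_kgram_index(terms)
print(query_kgrams("mon*"))               # ['$m', 'mo', 'on']
print(sorted(candidates(index, "mon*")))  # includes the false hit 'moon'
print(wildcard_lookup(index, "mon*"))     # ['monday', 'money', 'month']
```

The intersection is cheap but only approximate, so the post-filter is what guarantees that every returned term actually matches the query.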
Processing wild card query
Search
Type your search terms, use * if you need to.
E.g., Alex* will match Alexander.