Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Dr.Aruna Malapati
BITS Pilani Asst Professor
Department of CSIS
Hyderabad Campus
BITS Pilani
Hyderabad Campus
Spelling Correction
Todays Learning objectives
Context-sensitive
E.g., OCR can confuse O and D more often than it would confuse O and I
(adjacent on the QWERTY keyboard, so more likely interchanged in typing).
We can either
Retrieve documents indexed by the correct spelling, OR
Whats closest?
Several alternatives
n-gram overlap
E(0,0) = 0
E(1,0) = 1
Base Cases
E(k,0) = k
E(0,k) = k
m(xm)
N E T W O R K S
0 1 2 3 4 5 6 7 8
C 1
A 2
T 3
S 4
BITS Pilani, Hyderabad Campus
Weighted edit distance
Alternatively,
We can look up all possible corrections in our inverted index and return all docs
slow
Alternative?
X Y / X Y
Equals 1 when X and Y have the same elements and
zero when they are disjoint
http://www.creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#Top
6. Pad the resulting string with trailing zeros and return the
first four positions, which will be of the form <uppercase
letter> <digit> <digit> <digit>.
http://norvig.com/spell-correct.html