
14/5/2014

LEVENSHTEIN DISTANCE
[Minimum Edit Distance between two strings]
PROJECT BY

DINESH KUMAR R K K RAM KUMAAR
(106112026) (106112045)

TABLE OF CONTENTS

OVERVIEW
ALGORITHM
CODE
OUTPUT AND SCREENSHOTS









OVERVIEW:

Levenshtein distance (LD) is a measure of how
different two strings are, which we will refer to as
the source string (s) and the target string (t). The distance
is the minimum number of deletions, insertions, or
substitutions required to transform s into t.
For example,
If s is "test" and t is "test", then LD(s,t) = 0, because
no transformations are needed. The strings are
already identical.

If s is "test" and t is "tent", then LD(s,t) = 1, because
one substitution (change "s" to "n") is sufficient to
transform s into t.

The greater the Levenshtein distance, the more
different the strings are.
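
The two examples above can be checked with a short function. The following is a minimal sketch, not the project's own code; the name levenshtein and the use of std::string are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the report): Levenshtein distance from s to t.
int levenshtein(const std::string& s, const std::string& t) {
    const std::size_t n = s.size(), m = t.size();
    // d[i][j] = distance between the first i characters of s
    // and the first j characters of t.
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
        }
    }
    return d[n][m];
}
```

With this sketch, levenshtein("test", "test") returns 0 and levenshtein("test", "tent") returns 1, matching the examples above.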



Levenshtein distance is named after the
Russian scientist Vladimir Levenshtein, who considered this
distance in 1965. It may also be referred to as edit
distance, although that term may also denote a
larger family of distance metrics. It is closely related
to pairwise string alignments.
The Levenshtein distance has several simple
upper and lower bounds. These include:
It is always at least the difference of the sizes of the
two strings.
It is at most the length of the longer string.
It is zero if and only if the strings are equal.
If the strings are the same size, the Hamming
distance is an upper bound on the Levenshtein
distance.
The Levenshtein distance between two strings is no
greater than the sum of their Levenshtein distances
from a third string (triangle inequality).
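
The bounds listed above can be spot-checked in code. This is a rough sketch under stated assumptions: the helper names levenshtein, hamming and boundsHold are hypothetical and do not appear in the project.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: full-matrix Levenshtein distance.
int levenshtein(const std::string& s, const std::string& t) {
    const std::size_t n = s.size(), m = t.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            d[i][j] = std::min({d[i - 1][j] + 1, d[i][j - 1] + 1,
                                d[i - 1][j - 1] + (s[i - 1] != t[j - 1] ? 1 : 0)});
    return d[n][m];
}

// Hamming distance: number of differing positions.
// Only meaningful when a and b have the same length.
int hamming(const std::string& a, const std::string& b) {
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) diff += (a[i] != b[i]) ? 1 : 0;
    return diff;
}

// Returns true if every bound from the list holds for a and b,
// using c as the "third string" of the triangle inequality.
bool boundsHold(const std::string& a, const std::string& b, const std::string& c) {
    const int d = levenshtein(a, b);
    const int la = static_cast<int>(a.size());
    const int lb = static_cast<int>(b.size());
    bool ok = d >= std::abs(la - lb);                      // lower bound
    ok = ok && d <= std::max(la, lb);                      // upper bound
    ok = ok && ((d == 0) == (a == b));                     // zero iff equal
    if (la == lb) ok = ok && d <= hamming(a, b);           // Hamming bound
    ok = ok && d <= levenshtein(a, c) + levenshtein(c, b); // triangle inequality
    return ok;
}
```

A call such as boundsHold("kitten", "sitting", "mitten") should return true. A passing check on a few inputs illustrates the bounds but does not prove them.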
The Levenshtein distance can also be computed
between two longer strings, but the cost to compute it,
which is roughly proportional to the product of the two
string lengths, makes this impractical for very long strings.


APPLICATIONS OF LEVENSHTEIN DISTANCE
INCLUDE:
Spell checking
Speech recognition
DNA analysis
Plagiarism detection
Software to assist natural language translation based
on translation memory.
Correction systems for Optical character recognition.

In approximate string matching, the objective is
to find matches for short strings in many longer
texts, in situations where a small number of
differences is to be expected. The short strings could
come from a dictionary, for instance. Here, one of
the strings is typically short, while the other is
arbitrarily long.
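
One common way to adapt the distance computation to approximate string matching is to let a match begin at any position of the long text, which amounts to making the cost of an empty pattern zero at every text position. The sketch below illustrates that idea; it is not part of the project, and the name bestMatchDistance is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: smallest edit distance between `pattern` and
// any substring of `text`.
int bestMatchDistance(const std::string& pattern, const std::string& text) {
    const std::size_t n = pattern.size();
    // prev[i] / curr[i]: best distance between the first i pattern characters
    // and some substring of the text ending at the previous / current position.
    std::vector<int> prev(n + 1), curr(n + 1);
    for (std::size_t i = 0; i <= n; ++i) prev[i] = static_cast<int>(i);
    int best = prev[n]; // matching against the empty substring
    for (std::size_t j = 1; j <= text.size(); ++j) {
        curr[0] = 0; // a match may start at any position in the text
        for (std::size_t i = 1; i <= n; ++i) {
            const int cost = (pattern[i - 1] == text[j - 1]) ? 0 : 1;
            curr[i] = std::min({prev[i] + 1,         // extra text character
                                curr[i - 1] + 1,     // missing pattern character
                                prev[i - 1] + cost}); // substitution or match
        }
        best = std::min(best, curr[n]);
        std::swap(prev, curr);
    }
    return best;
}
```

For instance, bestMatchDistance("tent", "the test was long") returns 1, because the closest substring, "test", differs from the pattern by one substitution.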

The dynamic programming implementation of the
algorithm runs in O(mn) time, where m and n
are the lengths of string 1 and string 2 respectively.
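
The O(mn) figure comes from the two nested loops, which visit one table cell per pair of characters. As an aside, the table does not have to be stored in full: each row depends only on the row before it. The sketch below shows that variant, assuming a hypothetical function name levenshteinTwoRows; it is an illustration, not the project's implementation.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant: same O(n*m) running time, but only two rows of
// the table are kept in memory at any moment.
int levenshteinTwoRows(const std::string& s, const std::string& t) {
    const std::size_t n = s.size(), m = t.size();
    std::vector<int> prev(m + 1), curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {     // n iterations
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= m; ++j) { // m iterations each
            const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr); // the finished row becomes the previous row
    }
    return prev[m];
}
```

The inner statement runs n times m times in total, which is where the O(mn) running time comes from; only the memory use drops to two rows of m + 1 integers.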



ALGORITHM :

STEP 1 : Set n to be the length of s.
Set m to be the length of t.
If n = 0, return m and exit.
If m = 0, return n and exit.
Construct a matrix d of 0..n rows and 0..m columns.
STEP 2 : Initialize the first row to 0..m.
Initialize the first column to 0..n.
STEP 3 : Examine each character of s (i from 1 to n).
STEP 4 : Examine each character of t (j from 1 to m).
STEP 5 : If s[i] equals t[j], the cost is 0.
If s[i] doesn't equal t[j], the cost is 1.
STEP 6 : Set cell d[i,j] of the matrix equal to the minimum
of:
a. The cell immediately above plus 1: d[i-1,j] + 1.
b. The cell immediately to the left plus 1: d[i,j-1] + 1.
c. The cell diagonally above and to the left plus the cost:
d[i-1,j-1] + cost.
STEP 7 : After the iteration steps (3, 4, 5, 6) are complete,
the distance is found in cell d[n,m].
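
To see the steps in action, the matrix can be built and printed for a small input. The sketch below assumes that row i corresponds to the first i characters of s and column j to the first j characters of t; the names Matrix, buildMatrix and printMatrix are hypothetical and used only for this illustration.

```cpp
#include <algorithm>
#include <ostream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: builds the full matrix d following the steps above.
Matrix buildMatrix(const std::string& s, const std::string& t) {
    const std::size_t n = s.size(), m = t.size();           // Step 1
    Matrix d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j); // Step 2
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t i = 1; i <= n; ++i) {                  // Step 3
        for (std::size_t j = 1; j <= m; ++j) {              // Step 4
            const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1; // Step 5
            d[i][j] = std::min({d[i - 1][j] + 1,            // Step 6a
                                d[i][j - 1] + 1,            // Step 6b
                                d[i - 1][j - 1] + cost});   // Step 6c
        }
    }
    return d; // the bottom-right cell d[n][m] holds the distance
}

// Writes the matrix row by row, cells separated by spaces.
void printMatrix(const Matrix& d, std::ostream& out) {
    for (const auto& row : d) {
        for (int cell : row) out << cell << ' ';
        out << '\n';
    }
}
```

For s = "test" and t = "tent", printMatrix(buildMatrix("test", "tent"), std::cout) (with <iostream> included) gives the rows 0 1 2 3 4, 1 0 1 2 3, 2 1 0 1 2, 3 2 1 1 2 and 4 3 2 2 1. The bottom-right cell is 1, the distance between the two strings.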



Code:

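#include <iostream>
#include <cstring>
#include <vector>
using namespace std;

#define C (1) // cost of a single insertion or deletion

// Returns the smallest of three integers.
int Minimum(int a, int b, int c)
{
    int min = a;
    if (b < min)
        min = b;
    if (c < min)
        min = c;
    return min;
}

// Bottom-up dynamic programming: T[i][j] is the edit distance between
// the first i characters of X and the first j characters of Y.
int EditDistanceDP(char X[], char Y[])
{
    int left, top, diagtopleft;
    const int m = strlen(X) + 1;
    const int n = strlen(Y) + 1;
    // A vector replaces the variable-length array, which is not standard C++.
    vector<vector<int> > T(m, vector<int>(n, -1));

    for (int i = 0; i < m; i++) // base case: 0th column
        T[i][0] = i;

    for (int j = 0; j < n; j++) // base case: 0th row
        T[0][j] = j;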

    for (int i = 1; i < m; i++)
    {
        for (int j = 1; j < n; j++)
        {
            left = T[i][j-1]; // case 1: insert Y[j-1]
            left += C;

            top = T[i-1][j]; // case 2: delete X[i-1]
            top += C;

            diagtopleft = T[i-1][j-1]; // case 3: replace X[i-1] with Y[j-1] (free if equal)
            diagtopleft += (X[i-1] != Y[j-1]);
            T[i][j] = Minimum(left, top, diagtopleft);
        }
    }
    return T[m-1][n-1];
}

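// Plain recursion: distance between the first m characters of X
// and the first n characters of Y.
int EditDistanceRecursion(char *X, char *Y, int m, int n)
{
    if (m == 0 && n == 0)
        return 0;

    if (m == 0) // X is exhausted: insert the remaining n characters of Y
        return n;

    if (n == 0) // Y is exhausted: delete the remaining m characters of X
        return m;

    int left = EditDistanceRecursion(X, Y, m-1, n) + 1;
    int right = EditDistanceRecursion(X, Y, m, n-1) + 1;
    int corner = EditDistanceRecursion(X, Y, m-1, n-1) +
                 (X[m-1] != Y[n-1]);
    return Minimum(left, right, corner);
}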

int main()
{
    char a[15], b[15];
    cout << "Enter string A : ";
    cin.width(sizeof(a)); // read at most 14 characters so the buffer cannot overflow
    cin >> a;
    cout << "Enter string B : ";
    cin.width(sizeof(b));
    cin >> b;
    cout << "\nDP:\nMinimum edits required to convert "
         << a << " into " << b << " is " << EditDistanceDP(a, b) << "\n";
    cout << "\nRecursion:\nMinimum edits required to convert "
         << a << " into " << b << " is "
         << EditDistanceRecursion(a, b, strlen(a), strlen(b)) << "\n";
    return 0;
}
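
The plain recursive function above solves the same subproblems many times, so its running time grows exponentially with the string lengths. One possible remedy, sketched below, is to cache each result the first time it is computed. This is an illustrative variant, not part of the project; the names editDistanceMemo and editDistanceMemoImpl are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: distance between the first m characters of x and the
// first n characters of y, with results cached in memo (-1 = not computed).
int editDistanceMemoImpl(const std::string& x, const std::string& y,
                         std::size_t m, std::size_t n,
                         std::vector<std::vector<int>>& memo) {
    if (m == 0) return static_cast<int>(n);
    if (n == 0) return static_cast<int>(m);
    int& cached = memo[m][n];
    if (cached != -1) return cached; // already solved this subproblem
    const int del = editDistanceMemoImpl(x, y, m - 1, n, memo) + 1;
    const int ins = editDistanceMemoImpl(x, y, m, n - 1, memo) + 1;
    const int sub = editDistanceMemoImpl(x, y, m - 1, n - 1, memo) +
                    (x[m - 1] != y[n - 1] ? 1 : 0);
    cached = std::min({del, ins, sub});
    return cached;
}

// Hypothetical entry point: allocates the cache and starts the recursion.
int editDistanceMemo(const std::string& x, const std::string& y) {
    std::vector<std::vector<int>> memo(x.size() + 1,
                                       std::vector<int>(y.size() + 1, -1));
    return editDistanceMemoImpl(x, y, x.size(), y.size(), memo);
}
```

Each of the (m + 1)(n + 1) subproblems is now computed at most once, so this variant does the same O(mn) amount of work as the table-based function while keeping the recursive structure.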





Sample Inputs and Outputs:
Output 1:
Source string : Levinshtein
Target string : Meilenstein





Output 2:

Source string : September
Target string : October






Output 3:

Source string : Algorithms
Target string : datastructures
