MOUSCH01-EDITORIAL

PROBLEM LINK

Practice http://www.codechef.com/INLO2015/problems/MOUSCH01

Contest http://www.codechef.com/INLO2015

Editorialist: Aviral Gupta http://www.codechef.com/users/aviral1310

DIFFICULTY:

Intermediate levels like SUMPLE-EASY, EASY-MEDIUM, MEDIUM-HARD are also possible.

PREREQUISITES:

Dynamic Programming.

PROBLEM

To find out the minimum number of moves required transform a given word into another

given word when the following operations are allowed:

? Insert a character at any position

? Remove an existing character

? Modify an existing character

? Swap an existing character

QUICK EXPLANATION

Presented here are two algorithms: the first, simpler one, computes what is known as the

optimal string alignment (sometimes called the restricted edit distance), while the second one

computes the Damerau–Levenshtein distance with adjacent transpositions. Adding

transpositions adds significant complexity. The difference between the two algorithms

consists in that the optimal string alignment algorithm computes the number of edit

operations needed to make the strings equal under the condition that no substring is edited

more than once, whereas the second one presents no such restriction.

Take for example the edit distance between CA and ABC. The Damerau–Levenshtein

distance LD(CA,ABC) = 2 because CA ? AC ? ABC, but the optimal string alignment

distance OSA(CA,ABC) = 3 because if the operation CA ? AC is used, it is not possible to

use AC ? ABC because that would require the substring to be edited more than once, which

is not allowed in OSA, and therefore the shortest sequence of operations is CA ? A ? AB

? ABC. Note that for the optimal string alignment distance, the triangle inequality does not

hold: OSA(CA,AC) + OSA(AC,ABC) < OSA(CA,ABC), and so it is not a true metric.

Optimal string alignment distance

Firstly, let us consider a direct extension of the formula used to calculate Levenshtein

distance. Below is pseudocode for a function OptimalStringAlignmentDistance that takes two

strings, str1 of length lenStr1, and str2 of length lenStr2, and computes the optimal string

alignment distance between them:

int OptimalStringAlignmentDistance(char str1[1…lenStr1], char

str2[1…lenStr2])

// d is a table with lenStr1+1 rows and lenStr2+1 columns

declare int d[0..lenStr1, 0..lenStr2]



// i and j are used to iterate over str1 and str2

declare int i, j, cost



// for loop is inclusive, need table 1 row/column larger than string 

length

for i from 0 to lenStr1

    d[i, 0] := i

for j from 1 to lenStr2

    d[0, j] := j



// pseudo-code assumes string indices start at 1, not 0

// if implemented, make sure to start comparing at 1st letter of 

strings

for i from 1 to lenStr1

    for j from 1 to lenStr2

        if str1[i] = str2[j] then cost := 0

                             else cost := 1

        d[i, j] := minimum(

                             d[i-1, j  ] + 1,     // deletion

                             d[i  , j-1] + 1,     // insertion

                             d[i-1, j-1] + cost   // substitution

                         )

        if(i > 1 and j > 1 and str1[i] = str2[j-1] and str1[i-1] = 

str2[j]) then

            d[i, j] := minimum(

                             d[i, j],

                             d[i-2, j-2] + cost   // transposition

                          )                        



return d[lenStr1, lenStr2]

Basically this is the algorithm to compute Levenshtein distance with one additional

recurrence:

        if(i > 1 and j > 1 and str1[i] = str2[j-1] and str1[i-1] = 

str2[j]) then

            d[i, j] := minimum(

                             d[i, j],

                             d[i-2, j-2] + cost   // transposition

                          )

Distance with adjacent transpositions

Here is the second algorithm that computes the true Damerau–Levenshtein distance with

adjacent transpositions (ActionScript 3.0); this function requires as an additional parameter

the size of the alphabet ©, so that all entries of the arrays are in 0…(C-1):

static public function damerauLevenshteinDistance(a:Array, b:Array,

C:uint):uint

{

// "infinite" distance is just the max possible distance

var INF:uint = a.length + b.length;



// make and initialize the character array indices            

var DA:Array = new Array(C);

for (var k:uint = 0; k < C; ++k) DA[k]=0;



// make the distance matrix H[-1..a.length][-1..b.length]

var H:matrix = new matrix(a.length+2,b.length+2);



// initialize the left and top edges of H

H[-1][-1] = INF;

for (var i:uint = 0; i <= a.length; ++i)

{

    H[i][-1] = INF;

    H[i][ 0] = i;

}

for (var j:uint = 0; j <= b.length; ++j)

{

    H[-1][j] = INF;

    H[ 0][j] = j;

}



// fill in the distance matrix H

// look at each character in a

for (var i:uint = 1; i <= a.length; ++i)

{

    var DB:uint = 0;

    // look at each character in b

    for (var j:uint = 1; j <= b.length; ++j)

    {

        var i1:uint = DA[b[j-1]];

        var j1:uint = DB;

        var cost:uint;

        if (a[i-1] == b[j-1])

           {

             cost = 0;

             DB   = j;

           }

        else

           cost = 1;

        H[i][j] = Math.min(    H[i-1 ][j-1 ] + cost,  // substitution

                               H[i   ][j-1 ] + 1,     // insertion

                               H[i-1 ][j   ] + 1,     // deletion

                               H[i1-1][j1-1] + (i-i1-1) + 1 + (j-j1-

1));

    }

    DA[a[i-1]] = i;

}

return H[a.length][b.length];

}