CHEFKLCS - Editorial

pushkarmishra · December 18, 2015, 12:59pm

PROBLEM LINK:

Practice
Contest

Author: Misha Chorniy
Tester: Jingbo Shang
Editorialist: Pushkar Mishra

DIFFICULTY:

Medium

PREREQUISITES:

DP, Longest Common Subsequence Algorithm

PROBLEM:

Given two strings of equal length, we have to report the k^{th} lexicographical longest common subsequence. In other words, let the strings be S_1 and S_2. Let L be the length of the longest common subsequence. Let P be the set of all the subsequences which are common to S_1 and S_2 and are of the length L. We have to report the k^{th} element in this set P when it is sorted lexicographically.

EXPLANATION:

Note: For the sake of simplicity, throughtout the editorial, we assume that S[1] is the first character of the string, i.e., all strings are 1-indexed.
The Longest Common Subsequence (LCS) is a very common problem. The traditional approach uses DP. Let L[i][j] be the length of the LCS when considering the last i characters of S_1 and j characters of S_2. We can formulate the recurrence for L[i][j]:

If S_1[i] = S_2[j] then L[i][j] = L[i+1][j+1]+1.
If S_1[i] \neq S_2[j] then L[i][j] = max(L[i+1][j], L[i][j+1]).

Now, let us find a way to count distinct longest common subsequences. Let D[i][j] be the number of LCS of length L[i][j] when considering the last i characters of S_1 and j characters of S_2, i.e., it gives the number of distinct LCS for substrings S_1[i..n] and S_2[j..n]. Now, we formulate the recurrence for this:

If S_1[i] = S_2[j] then D[i][j] = D[i+1][j+1].
If S_1[i] \neq S_2[j] and L[i][j] = L[i+1][j], then D[i][j] = D[i+1][j].
If S_1[i] \neq S_2[j] and L[i][j] = L[i][j+1], then D[i][j] = D[i][j+1].
If S_1[i] \neq S_2[j] and L[i+1][j] = L[i][j+1], then D[i][j] = D[i][j+1]+D[i+1][j] - D[i+1][j+1].

Now, before we move on to formalizing the algorithm, we have another observation to make:
Let their be two strings S_1 = abcddpq and S_2 = axbedff. Consider the LCS in the two strings up till the index 2 (starting index = 1), i.e., ab. Now, let us consider how to extend the LCS when taking letters after index 2. We see that we can append a d to make the LCS abd. But, S_1 has 2 d. Which one should be used to extend? In this example, it doesn’t matter. But what if S_2 was axbedfd. We see that it is always favorable to match the earliest appearences of a character in the two strings whenever that is possible. This way, the following appearences of that character can be used to extend the LCS.

Now, we have all the essential elements for the formal algorithm. L[1][1] gives the length of the LCS for the input strings. Set L[0][0] = L[1][1]+1. D[1][1] gives the number of distinct LCS of length L[1][1]. Set D[0][0] = 0. Let us keep two more arrays, pointer_S1[n][26] and pointer_S2[n][26]. Both of them serve the same purpose, but one is for string S_1 and the other for string S_2. The purpose is that pointer_S1[i][j] stores the index p such that p > i and S_1[p] = j^{th} lower case letter. In other words, pointer_S1[i][0] stores the index p where p is the smallest index greater than i such that S_1[p] = ‘a’. If there is no occurrence of the particular letter after index i, then -1 is stored.

Let us see how we can get the required string from the above structures. If k > D[1][1], we can simply output -1 since there is no answer. Otherwise, we start at L[0][0]. We want to move to such an i, j such that the following conditions are met:

L[i][j] = L[0][0] - 1
S_1[i] = S_2[j]
The number of strings with letters smaller than S_1[i] in this position is less than the k
The number of strings with letters smaller than AND EQUAL TO S_1[i] in this position is \geq k.

If all these conditions are met, we will append the letter S_1[i] (= S_2[j]) to our answer.

With the help of the pointer_S1 and pointer_S2 arrays, we can iterate over all possible i, j to find the letter we require.

We repeat this till we reach an i, j such that L[i][j] = 1. A pseudocode of this procedure is provided below:

function get_string()
{
    String ans = "";
    i, j = 0;
    L[0][0] = L[1][1] + 1;

    while(L[i][j] > 1)
    {
        count, new_count = 0;
        for( character c = 'a' to 'z')
        {
            //Getting the next_i and next_j based on pointer arrays.
            next_i = pointer_S1[i][c];
            next_j = pointer_S2[j][c];

            if(next_i or next_j == -1)
                skip to next letter;

            if(L[next_i][next_j] != L[i][j] - 1)
                letter c doesn't lie on the desired LCS. Thus skip to next letter.

            //Producing new_count based on count and the number of distinct LCS between
            //substrings S1[next_i..n] and S2[next_j..n].
            new_count = count + D[next_i][next_j];

            //checking whether last condition is fulfilled or not
            if(count < k and k <= new_count)
            {
                //we have found the letter to extend the desired LCS.
                k = k-count;
                i = next_i;
                j = next_j;
                ans = ans + c;

                //breaking the loop since letter has been found
                break;
            }

            count = new_count;
        }
    }

    return ans;
}

Other implementation details can be seen from the editorialist’s solution.

COMPLEXITY:

\mathcal{O}(n^2)

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

patience · December 22, 2015, 8:52am

Here is my code…
https://www.codechef.com/viewsolution/8997375
I got Wa.
please help me where is my wrong.
give some critical test case.
thanks in advance

linonymous · December 22, 2015, 9:18am

None of the given sample solutions are visible. Every link shows “Access Denied”!

lohit_97 · December 22, 2015, 10:44am

Since, the author, tester and editorialist’s codes are not available, can someone give link to someone’s code, who has followed the approach stated above? Thanks in advance!