Help with problem SAMER08D - DNA Sequences

hulk_baba · January 8, 2017, 9:41am

I am trying to solve SAMER08D - DNA Sequences
. I know how to solve LCS but can’t figure out how to think of states in this problem. I tried reading approaches for the solution of this problem on google but I only got codes. However, that didn’t help as I did not get a feel what code was trying to do. Can you please help me with this ?

hulk_baba · January 8, 2017, 8:28pm

Still waiting…

meooow · January 9, 2017, 12:38am

The standard LCS problem is solved with the help of this function definition and the application of dynamic programming. You can find the code in this geeksforgeeks article, I think the explanation is quite clear.

If you understand how this works we can now attempt to solve the SAMER08D problem. The only difference between the standard problem and the modified version is that the number of common characters we take at a time has a lower bound, which is the value k. We must always take k or more common characters to add the subsequence we will be building via dp. So the statement dp[i][j] = 1 + dp[i-1][j-1] if x[i]==y[j] is now invalid because this takes just 1 character into the subsequence. In place of this, we ought to iterate backwards as long as we get common characters, and start taking dp values into consideration only when we have taken k or more common characters. Also, now we cannot guarantee that taking a substring into our subsequence like this will be better than dp[i-1][j] and dp[i][j-1]. So we must always consider these two choices also. After making the changes, the code will look like

for(i=0; i<=xlen; i++){
    for(j=0; j<=ylen; j++){
        dp[i][j] = 0;
        if(i==0 || j==0)
            continue;
        c = 1;
        while(i-c>=0 && j-c>=0 && x[i-c]==y[j-c]){
            if(c>=k)
                dp[i][j] = MAX(dp[i][j], c+dp[i-c][j-c]);
            c++;
        }
        dp[i][j] = MAX(dp[i][j], dp[i-1][j]);
        dp[i][j] = MAX(dp[i][j], dp[i][j-1]);
    }
}

Complete solution here.
Note that the complexity is \mathcal{O}(l^3). I did get AC with this, but someone in the comments section mentioned an \mathcal{O}(l^2k) approach. If I find such an optimization I will update the answer