 # Longest Common subsequence

Supposing that we have a string not in the classic form but in a form like that
A1B1C1D2E1G1F1
A1B1C1D2F1P1X1

where the numbers are the frequencies of chars.

How can i work the longest common subsequence? I have worked with arrays of struct but i find some difficulty. Can someone provide code for the LCS part working also with frequencies?

1 Like

What’s the issue in converting this encoded string back to its original form and then applying LCS algo?

That would be really slower wouldn’t be?

3 Likes

Assuming that you are using a classic implementation for LCS, (I’m using Wikipedia as a reference), this is what you could do:

1. Store the character and frequency as a key:value pair, say in a struct. Hence each string could be represented as a linked list of such structs.
2. Make the traditional 2D array used in LCS.
3. When there is a match at a particular cell, instead of appending only the character, append a struct that has the character and the minimum of the two frequencies. (Each cell could be a linked list as well).

Small example: Consider the two strings:
A1B2C1D2
A2B1C3E3 Legend (in terms of steps followed): Orange -> Blue -> Pink -> Red -> Black

1. Whenever there is no match, simply take the longer string from the top and left cells.
2. If there is a match, append the longest possible match to the top left cell value and store in current cell.

You can use the same DP as for LCS of characters (dp[i][j] is the LCS of the first i characters of the first string and the first j characters of the second one), but on arrays of different objects than characters - those “repeated character” objects in this case.

Where you had dp[i][j] =max(dp[i][j],dp[i-1][j-1]+1), you should replace that 1 by LCS(object i, object j), which is 0 if they have different repeated characters and minimum of their lengths otherwise.

//