Supposing that we have a string not in the classic form but in a form like that
A1B1C1D2E1G1F1
A1B1C1D2F1P1X1
where the numbers are the frequencies of chars.
How can i work the longest common subsequence? I have worked with arrays of struct but i find some difficulty. Can someone provide code for the LCS part working also with frequencies?
Assuming that you are using a classic implementation for LCS, (I’m using Wikipedia as a reference), this is what you could do:
Store the character and frequency as a key:value pair, say in a struct. Hence each string could be represented as a linked list of such structs.
Make the traditional 2D array used in LCS.
When there is a match at a particular cell, instead of appending only the character, append a struct that has the character and the minimum of the two frequencies. (Each cell could be a linked list as well).
Small example: Consider the two strings:
A1B2C1D2
A2B1C3E3
Legend (in terms of steps followed): Orange -> Blue -> Pink -> Red -> Black
Whenever there is no match, simply take the longer string from the top and left cells.
If there is a match, append the longest possible match to the top left cell value and store in current cell.
You can use the same DP as for LCS of characters (dp[i][j] is the LCS of the first i characters of the first string and the first j characters of the second one), but on arrays of different objects than characters - those “repeated character” objects in this case.
Where you had dp[i][j] =max(dp[i][j],dp[i-1][j-1]+1), you should replace that 1 by LCS(object i, object j), which is 0 if they have different repeated characters and minimum of their lengths otherwise.