I'm having trouble finding an approach to solve this problem.
The input/output pairs are as follows:
Input 1: aaagctgctagag
Output 1: a3gct2ag2
Input 2: aaaaaaagctaagctaag
Output 2: a6agcta2ag
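To make the notation concrete, here is a small Python decoder. It assumes, from the examples only, that each run of lowercase letters is repeated by the number that follows it, and that a run with no number appears once. `decode` is a hypothetical helper name, not part of any library.

```python
import re


def decode(encoded: str) -> str:
    """Expand the notation: each letter run is repeated by the count after it (default 1)."""
    parts = []
    # Each match is (letters, digits); digits may be empty, meaning "once".
    for unit, count in re.findall(r"([a-z]+)(\d*)", encoded):
        parts.append(unit * int(count or 1))
    return "".join(parts)
```

For example, `decode("a3gct2ag2")` gives back `"aaagctgctagag"`, and `decode("a6agcta2ag")` gives back input 2.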
The input sequence can be as long as 10^6 characters, and the largest contiguous repeated patterns should be used. For example, in input 2 the substring “agctaagcta” should be encoded as “agcta2”, not “agcta2gcta”.
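A building block for finding the "largest contiguous pattern" is checking whether a substring is a whole number of copies of a shorter unit. One standard way to do that (not necessarily what the question intends) uses the KMP prefix function: if the longest proper border of `s` has length `b`, then `p = len(s) - b` is the smallest period of `s`, and `s` equals `s[:p]` repeated `len(s) // p` times exactly when `p` divides `len(s)`. The function names below are hypothetical.

```python
def prefix_function(s: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of s[:i + 1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        # Fall back through shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def as_repetition(s: str) -> tuple[str, int]:
    """Return (unit, k) with s == unit * k and k as large as possible."""
    if not s:
        raise ValueError("empty string")
    period = len(s) - prefix_function(s)[-1]  # smallest period of s
    if len(s) % period == 0:
        return s[:period], len(s) // period
    return s, 1  # s is not a repetition of any shorter unit
```

For instance, `as_repetition("agctaagcta")` returns `("agcta", 2)`, which corresponds to writing “agcta2”.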
Any help is appreciated.
Extra examples:
- Input: aabbaabb
  Output: aabb2 (not a2b2a2b2)
- Input: aaaaaaaaabbbbbbbbbaaaaaaaaabbbbbbbbb
  Output: a9b9a9b9 (not aaaaaaaaabbbbbbbbb2)
These examples suggest that the shorter encoding is the preferred answer.
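Taking "the shorter encoding wins" as the rule, here is a minimal dynamic-programming sketch in Python that returns a shortest encoding. Assumptions (not stated in the question): the input contains only lowercase letters; a count is written only for blocks repeated at least twice; and, so that the result stays decodable, an uncounted literal may appear only at the very end (a literal followed by a counted block would otherwise read as one longer repeated block). `shortest_encoding` and `prefix_function` are hypothetical names. The sketch runs in O(n²) time, so it is only practical for short strings, not for inputs of 10^6 characters.

```python
def prefix_function(s: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of s[:i + 1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def shortest_encoding(s: str) -> str:
    """Return a shortest encoding of s in the question's notation (O(n^2) sketch)."""
    n = len(s)
    # cost[i]: length of the best encoding of the suffix s[i:]
    cost = [0] * (n + 1)
    # choice[i]: (end, unit, count) of the first counted block, or None for a literal tail
    choice = [None] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Option 1: write the whole suffix as an uncounted literal.
        cost[i] = n - i
        choice[i] = None
        pi = prefix_function(s[i:])
        # Option 2: start with a block s[i:i + length] equal to unit * k, k >= 2.
        for length in range(2, n - i + 1):
            period = length - pi[length - 1]  # smallest period of the block
            if length % period != 0 or period == length:
                continue  # not a whole number (>= 2) of copies of one unit
            k = length // period
            c = period + len(str(k)) + cost[i + length]
            if c < cost[i]:
                cost[i] = c
                choice[i] = (i + length, s[i:i + period], k)
    # Walk the recorded choices from the start to build the encoded string.
    parts = []
    i = 0
    while i < n:
        if choice[i] is None:
            parts.append(s[i:])
            break
        end, unit, k = choice[i]
        parts.append(f"{unit}{k}")
        i = end
    return "".join(parts)
```

With this rule, input 1, aabbaabb and the a9b9a9b9 example come out as the expected a3gct2ag2, aabb2 and a9b9a9b9. For input 2, however, it returns a7gctaa2g (9 characters), which is shorter than the expected a6agcta2ag (10 characters), so the intended rule seems to involve more than raw length.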