Please share your approach for Compression Algorithm.

Its been two days and all I have is K^N in denominator and complex summation series in numerator. I am stuck here.

I assume you saw the binomial expression to which it converges to? Or is the final expression also not clear to you?

In my derivation (possibly faulty)There is 2 * K * x * N-1Cx-1 * (K-1)^(x-1) in summation series . I can take 2 and K outside . But how to coverge the rest into binomial wxpression. [x is size of segment which can be from one to N if K>1]

We solved the problem using Linearity of expectation. It becomes very simple if we use the linearity of expectation.

l = 2*(1 + prob(s[0]!=s[1]) + prob(s[1]!=s[2]) + .... + prob(s[n-2]!=s[n-1]) )

l = 2*(1 + (n-1)*prob(s[0]!=s[1]) )

l = 2*(1 + (n-1)*(1 - prob(s[0]==s[1]) )

l = 2*(1 + (n-1)*(1 - c/c^2) )

l = 2*(1 + (n-1)*(1 - 1/c) )

So that a most simple way to get the solution. We also wasted half an hour during the contest deriving the equation through other methods, but then it suddenly strike me this approach to derive the final formula.


Try to count the strings based on their compressed length, if compressed length is equal to 2*i, then that means there are ‘i’ adjacent substrings of same characters. The lengths of all those parts must sum to ‘n’ and each must have length > 0. You can see how to count that here :

And, if the compressed length is ‘i’, you can assign characters in k*(k-1)^(i-1) ways because the first character can be anything and the rest of the characters must not be the same as the previous one, so, they will have k-1 choices. If you write the summation of this, you will find that it will be of the form:


Here, you need a bit of manipulation. Take 2*k out of the summation and write ‘i’ as (i-1)+1 and split it to two summations, you can use the fact that C(n,r) = n/r * C(n-1,r-1) and take out N-1 from the first summation, you will get two summations, which will be in the same form as the binomial expansion for (1 + (k-1) ) ^ (n-1) and (1 + (k-1) ) ^ (n-2).

It was a nice problem. Didn’t expect so many submissions though.


nCr = n/r * n-1Cr-1

1 Like

Thanks. I had arrived upto (2iC(N-1,i-1)k(k-1)^(i-1)) and could derive the final equation by following the manipulations you said.


I can see what you have done. But due to the lack of clear understanding of Linearity of Expectation its difficult for me to see why what have you done is right? Thanks, I will try this approach once I get the Linearity of Expectation concept clear!

Let us assume that two consecutive positions in the string don’t have the same letter. In such a case, the length of the compressed string will be 2N (this is the maximum length possible). Now let us assume that we are changing the values of the last N-1 letters randomly. Since there are K possibilities while assigning a letter to a position, the probability that the letter chosen would be the same as the letter at the previous position is 1/K. If the letter chosen is indeed the same as the previous letter, then the length of the compressed string decreases by 2. Therefore the expected decrease in the length is 2/K, and the total expected decrease is 2/K * (N-1). So the expected length is 2N - 2*(N-1)/K.


that was really a smart trick ! thanks a ton for this :slight_smile:

Well , I proceeded the same way… it was using pascal’s identity and then differentiating… I dont remember the exact summation, but if u can tell me that, then i can share my steps…

First you should frame the recursive equation and then reduce it to a formula , let me tell you my approach for this problem :

E[n,k]= 1/k * (E[n-1,k]) + (k-1)/k * (E[n-1,k] + 2) .

where E[n,k]= expected value of the string of length n which can use at most k different characters.

This is because there are 1/k chances of placing the same letter as in the string of length (n-1) so the answer would remain same in this case but (k-1)/k chances that nth letter could be different and so accordingly the answer would increment by 2. I hope this equation is clear to you! Now just solving it a bit more, will give you the expression :

E[n,k]= E[n-1,k] + 2(k-1)/k* . Now just applying the rules of induction and applying the base case E[1,k]=2 because for string of length “1” expected length would be just 2(two). We can reduce this equation down to a straight formula :

E[n,k]= 2(k-1)/k + …upto (n-1) times + E[1,k]* (which is equal to 2 ).

E[n,k]= 2(n-1)*(k-1)/k + 2* (this is the final answer in terms of n and k where n=length of string and k different letters )

Hope this helps !!
@pankajkhan ask if you have any queries still


You can write 1 as c/c.