PROBLEM LINK:
Author: Lalit Kundu
Tester: Hiroto Sekido
Editorialist: Kevin Atienza
DIFFICULTY:
SIMPLE
PREREQUISITES:
Dynamic programming, preprocessing, binary search, cumulative sums
PROBLEM:
You are given a string S of length N consisting only of 0 s and 1 s. You are also given an integer K.
You have to answer Q queries. In the i th query, two integers L and R are given, and you should print the number of substrings of S[L, R] which contain at most K 0 s and at most K 1 s, where S[L, R] denotes the substring from the L th to the R th character of S.
In other words, you have to count the number of pairs (i, j) of integers with L \le i \le j \le R such that no character in the substring S[i, j] occurs more than K times.
QUICK EXPLANATION:
Let \text{far}[i] be the last index j such that S[i, j] has at most K 0 s and at most K 1 s. Then for a given query (L, R), the number of valid substrings starting at index i is \min(R,\text{far}[i])-i+1 (for L \le i \le R). Therefore, the answer for the query (L, R) is the following:

\sum_{i=L}^{R} \big(\min(R, \text{far}[i]) - i + 1\big)
Note that \text{far}[i] never decreases, so we can use binary search to find the maximum index k such that \text{far}[k] \le R. The sum then becomes:

\sum_{i=L}^{k} \big(\text{far}[i] - i + 1\big) + \sum_{i=k+1}^{R} \big(R - i + 1\big)
Each of these is solvable in closed form, except possibly for range sums of \text{far}[i]. But for that, we can simply compute cumulative sums of \text{far}[i].
The algorithm runs in O(N + Q \log N), but can be sped up to O(N + Q) by getting rid of the binary search. We'll describe this below.
EXPLANATION:
We will begin by describing a slow solution, and then work on improving it incrementally. We will call a string valid if it contains at most K 0 s and at most K 1 s.
Slow solution
First, we can simply test each substring and count all those that are valid. This can be accomplished, for example, with the following:
def answer_query(L, R):
answer = 0
for i in L...R
for j in i...R
# try the substring S[i, j]
count0 = count1 = 0
for k in i...j
if S[k] == '1'
count1 += 1
else
count0 += 1
if count0 <= K and count1 <= K
answer += 1
return answer
How fast does this go? The running time is proportional to the sum of the lengths of all substrings of S[L, R], which one can easily compute to be approximately \frac{(R-L+1)^3}{6}. Since R-L+1 is at most N, in the worst case each query takes approximately \frac{N^3}{6} steps. This won't pass any of the subtasks!
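For reference, the same brute force as runnable Python (the function name and the use of Python's 0-indexed slicing are our own choices, not part of the original solution):

```python
def count_valid_substrings(S, K, L, R):
    """Brute force: count substrings S[i..j] with L <= i <= j <= R
    (1-indexed, as in the problem) that contain at most K '0's and K '1's."""
    answer = 0
    for i in range(L, R + 1):
        for j in range(i, R + 1):
            sub = S[i - 1:j]        # S[i..j] in Python's 0-indexed slicing
            if sub.count('0') <= K and sub.count('1') <= K:
                answer += 1
    return answer
```

Even for this tiny sketch, the cubic behavior is visible: every substring is re-scanned from scratch by `count`.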
Breaking early
The above brute-force solution can be optimized to pass subtask 2, by the observation that valid strings are at most 2K in length. Therefore, we can simply check all substrings that are at most 2K in length, and there are O(NK) of them:
def answer_query(L, R):
answer = 0
for i in L...R
# consider only substrings of length at most 2K
for j in i...min(i+2*K-1, R)
# try the substring S[i, j]
count0 = count1 = 0
for k in i...j
if S[k] == '1'
count1 += 1
else
count0 += 1
if count0 > K or count1 > K:
break
if count0 <= K and count1 <= K
answer += 1
return answer
Notice that aside from reducing the limit of j to \min(i+2K-1,R), we have also added a break statement: once either count0 or count1 exceeds K, there is no point in inspecting the rest of the letters, since we already know the string won't be valid.
Thus the algorithm now runs in O(NK\min(N,K)) time per query and O(QNK\min(N,K)) time overall, and passes subtask 2!
Dynamic programming
In fact, we can extend the argument we used for the break statement! Notice that any substring of a valid string is also valid, and any superstring of an invalid string is also invalid! Therefore, we can break out of the j loop once we encounter an invalid string. However, this alone won't improve the code in the worst case, because it might be the case that we don't encounter an invalid string, or we encounter one late enough.

However, we can exploit even more properties of the substrings we are inspecting! To check whether a substring S[i, j] is valid or not, we only need to count the 0 s and 1 s in it, and these counts are easy to update once we know them for the substring S[i, j-1]. Specifically, as we process the substrings S[i, j] for a fixed i and increasing j, we only need to increment count0 or count1 depending on S[j]:
def answer_query(L, R):
answer = 0
for i in L...R
count0 = count1 = 0
for j in i...min(i+2*K-1, R)
# try the substring S[i, j]
if S[j] == '1'
count1 += 1
else
count0 += 1
if count0 <= K and count1 <= K
answer += 1
else
break
return answer
Now, this runs in O(N\min(N,K)) time per query, and O(QN\min(N,K)) time overall!
However, using this might still not pass subtask 1, because there are up to 10^5 queries. Thankfully, in subtask 1, N is at most 100, so there are only N(N+1)/2 possible substrings S[L, R]. Thus, we can simply precompute the answer for all those substrings, and then answer the Q queries with a simple lookup. This approach runs in O(N^3\min(N,K)+Q) and should be able to pass subtask 1!
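The precompute-everything approach for subtask 1 might look like the following Python sketch (the function name and table layout are ours; the inner loops are the optimized brute force with the length-2K limit and early break):

```python
def precompute_all_answers(S, K):
    """Answer table for every pair: ans[(L, R)] = number of valid substrings
    of S[L..R] (1-indexed). O(N^3 * min(N, K)) total, then O(1) per query."""
    N = len(S)
    ans = {}
    for L in range(1, N + 1):
        for R in range(L, N + 1):
            total = 0
            for i in range(L, R + 1):
                count0 = count1 = 0
                # valid substrings have length at most 2K
                for j in range(i, min(i + 2 * K - 1, R) + 1):
                    if S[j - 1] == '1':
                        count1 += 1
                    else:
                        count0 += 1
                    if count0 <= K and count1 <= K:
                        total += 1
                    else:
                        break           # supersets of an invalid string are invalid
            ans[(L, R)] = total
    return ans
```

Each query then becomes a dictionary lookup on `ans[(L, R)]`.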
Walking algorithm
There is still a way to optimize the above! Remember what we said above: any substring of a valid string is also valid, and any superstring of an invalid string is also invalid. Therefore, if S[i, j] is valid, then S[i+1, j] is also valid! So when we process the next i, we don't have to start j from i any more, because we already know many strings are valid from the previous i. Specifically, if j' is the last j such that S[i, j] is valid, then S[i+1, j'] is also valid, so we can simply start iterating j from j'+1 onwards. What's more, we can compute count0 and count1 of S[i+1, j'+1] from those of S[i, j'+1] by simply decrementing one of them depending on S[i]! This is illustrated in the following code:
def answer_query(L, R):
answer = 0
j = L
count0 = count1 = 0
if S[L] == '1'
count1 += 1
else
count0 += 1
for i in L...R
while j <= R and count0 <= K and count1 <= K:
j += 1
if j > R
break
if S[j] == '1'
count1 += 1
else
count0 += 1
# at this point, we know S[i, j-1] is valid but S[i, j] is invalid
answer += j - i
# decrement
if S[i] == '1'
count1 -= 1
else
count0 -= 1
return answer
Now, how fast does this go? There are still nested loops, but notice that every time the inner loop iterates, j increases by 1, and j never decreases. Therefore, the inner loop runs at most R-L+1 times over the whole query, or O(N). Thus the whole algorithm runs in O(N) time per query, and O(QN) time overall! This should be able to pass subtasks 1, 2 and 3.
Preprocessing and binary search
The previous algorithm is too slow for subtask 4, because it still takes O(N) time per query. In fact, this is probably the fastest we can do without some sort of preprocessing, because at the very least we have to read the string S[L, R] to compute the answer, and this already takes O(N) time. Thus, we will try to speed up the algorithm by preprocessing.
First, note that the crucial part of the previous solution is finding, for each i, the first j such that S[i, j] is invalid, or j > R. By ignoring the "or j > R" part for now, we see that the first such j for each i depends only on the string S! For a given i, let's denote that j by \text{far}[i]. (If you read the quick explanation above, note that this \text{far} is different from the \text{far} there; specifically, this one is larger by exactly 1.) Thus, we can precompute \text{far} at the beginning:
far[1...N]
def precompute():
count0 = count1 = 0
j = 1
if S[1] == '1'
count1 += 1
else
count0 += 1
for i in 1...N
while j <= N and count0 <= K and count1 <= K:
j += 1
if j > N
break
if S[j] == '1'
count1 += 1
else
count0 += 1
# at this point, we know S[i, j-1] is valid but S[i, j] is invalid
far[i] = j
# decrement
if S[i] == '1'
count1 -= 1
else
count0 -= 1
and use that for our queries:
def answer_query(L, R):
answer = 0
for i in L...R
j = min(far[i], R+1) # R+1 is the first j such that j > R
answer += j - i
return answer
The precomputation works similarly to the previous code and runs in O(N), but each query still takes O(N) time. However, the answer_query code is now much simpler, and in fact can be expressed by the following mathematical expression:

\sum_{i=L}^{R} \big(\min(\text{far}[i], R+1) - i\big)

We can now compute such a sum using some simple manipulations:

\sum_{i=L}^{R} \min(\text{far}[i], R+1) - \sum_{i=L}^{R} i

However, we still have this nasty \min term to take care of. Thankfully, we can use the fact that \text{far}[i] never decreases: \text{far}[i] starts out \le R, and as i increases it eventually exceeds R, and once it does, it stays greater than R. Thus, it makes sense to find the last k such that \text{far}[k] \le R, so the above expression becomes

\sum_{i=L}^{k} \text{far}[i] + (R-k)(R+1) - \sum_{i=L}^{R} i
Now we are almost down to an O(1) computation, aside from two things: finding k, and computing a range sum of \text{far}[i]. But these are simple. First, k can be computed with binary search, as the index such that \text{far}[k] \le R < \text{far}[k+1], because \text{far}[i] is monotonically nondecreasing. Second, to compute range sums of \text{far}[i], one can simply use cumulative or prefix sums: let \text{sumfar}[i] be the sum of the \text{far} values up to the i th index. Then the sum \text{far}[i] + \text{far}[i+1] + \cdots + \text{far}[j] is simply \text{sumfar}[j] - \text{sumfar}[i-1]!
These are illustrated in the following code:
far[1...N]
sumfar[0...N]
def precompute():
# precompute far
count0 = count1 = 0
j = 1
if S[1] == '1'
count1 += 1
else
count0 += 1
for i in 1...N
while j <= N and count0 <= K and count1 <= K:
j += 1
if j > N
break
if S[j] == '1'
count1 += 1
else
count0 += 1
far[i] = j
# decrement
if S[i] == '1'
count1 -= 1
else
count0 -= 1
# precompute sumfar
sumfar[0] = 0
for i in 1...N
sumfar[i] = sumfar[i-1] + far[i]
def answer_query(L, R):
# binary search to find k such that far[k] <= R < far[k+1]
# we maintain the invariant far[k1] <= R < far[k2]
k1 = L-1
k2 = R+1
while k2 - k1 > 1:
km = (k1 + k2) / 2 # here, "/" denotes floor division
if far[km] <= R
k1 = km
else
k2 = km
k = k1 # k is now equal to k1 because k2 - k1 = 1 and far[k1] <= R < far[k2]
answer = sumfar[k] - sumfar[L-1] + (R-k)*(R+1) - (R*(R+1)/2 - L*(L-1)/2)
return answer
Using this, one can see that precomputation still runs in O(N) time, and queries now run in O(\log N) time each, due to the binary search. Thus, the overall algorithm runs in O(N + Q \log N) time, which comfortably passes all the subtasks!
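Translated into runnable Python (using the standard bisect module for the binary search, and a dummy 0th entry so the arrays stay 1-indexed as in the pseudocode; the helper names are ours), the whole O(N + Q \log N) solution might look like:

```python
from bisect import bisect_right

def build_tables(S, K):
    """far[i] (1-indexed) = first j with S[i..j] invalid, or N+1; plus prefix sums."""
    N = len(S)
    far = [0] * (N + 1)              # far[0] is a dummy so indices match the editorial
    count = {'0': 0, '1': 0}
    j = 1                            # S[i..j-1] is the current valid window
    for i in range(1, N + 1):
        if j < i:
            j = i                    # can lag behind only when K == 0
        while j <= N and count[S[j - 1]] < K:
            count[S[j - 1]] += 1
            j += 1
        far[i] = j
        if j > i:
            count[S[i - 1]] -= 1     # slide the left end of the window
    sumfar = [0] * (N + 1)
    for i in range(1, N + 1):
        sumfar[i] = sumfar[i - 1] + far[i]
    return far, sumfar

def answer_query(far, sumfar, L, R):
    # largest k with far[k] <= R, clamped to at least L-1 (far is nondecreasing)
    k = bisect_right(far, R, L, R + 1) - 1
    return sumfar[k] - sumfar[L - 1] + (R - k) * (R + 1) \
           - (R * (R + 1) // 2 - L * (L - 1) // 2)
```

Python integers never overflow, so the warning above applies only to fixed-width languages; in C++, for example, sumfar and the answer should be 64-bit.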
Be careful with overflows! Use the right data type for this.
Bonus: \text{far}[i] and \text{raf}[i]
The above solution already works, but we will introduce a final optimization here. Specifically, we will try to improve our algorithm to compute the k in each query.
Note that k is the largest index such that \text{far}[k] \le R, or k = L-1 if no such index \ge L exists. The key idea is that, ignoring the "k = L-1" clamp for now, k depends only on R! Thus it would be nice to precompute the k s for all possible values of R, and in fact it is easy to do so.
Let's define a similar array \text{raf}, where \text{raf}[R] is the smallest index i such that \text{far}[i] > R. Using this array, one can compute k simply as \max(L,\text{raf}[R])-1 (we'll leave it to the reader to see why). Now, how do we compute all the \text{raf}[R] s? The idea is that \text{raf} is essentially the reverse of \text{far}, only with the direction to the left rather than to the right. Thus, a similar O(N) time walking algorithm can be used to compute it. We will leave that as an exercise for the reader, however, because we will show here a different way to compute it in O(N) time, by exploiting the relationship between \text{far} and \text{raf}. See the following pseudocode for details:
far[1...N]
raf[1...N]
sumfar[0...N]
def precompute():
# precompute far
.....
# precompute sumfar
....
# precompute raf
# initialize
for i in 1...N
raf[i] = -1
# we know that far[i]-1 < far[i], so if j = far[i]-1,
# then raf[j] must be <= i (raf[j] is the least index whose far value exceeds j)
# we process each i in decreasing order so that the least such i is the one kept
for i in N...1 by -1
raf[far[i]-1] = i
# for all the raf[i]'s that we didn't encounter, set its value to raf[i+1]
# because raf[i] <= raf[i+1]
for i in N...1 by -1
if raf[i] == -1
raf[i] = raf[i+1]
def answer_query(L, R):
k = max(L, raf[R])-1
answer = sumfar[k] - sumfar[L-1] + (R-k)*(R+1) - (R*(R+1)/2 - L*(L-1)/2)
return answer
Finally, one can now see that it runs in O(1) time per query, and O(N + Q) time overall!
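Putting everything together, here is a runnable Python sketch of the final O(N + Q) solution, with \text{raf} derived from \text{far} as in the pseudocode (helper names and the sentinel raf[N+1] are our own details):

```python
def precompute(S, K):
    """Build sumfar and raf (1-indexed via a dummy 0th entry)."""
    N = len(S)
    far = [0] * (N + 1)
    count = {'0': 0, '1': 0}
    j = 1                                    # S[i..j-1] is the current valid window
    for i in range(1, N + 1):
        if j < i:
            j = i
        while j <= N and count[S[j - 1]] < K:
            count[S[j - 1]] += 1
            j += 1
        far[i] = j                           # first j with S[i..j] invalid, or N+1
        if j > i:
            count[S[i - 1]] -= 1
    sumfar = [0] * (N + 1)
    for i in range(1, N + 1):
        sumfar[i] = sumfar[i - 1] + far[i]
    raf = [-1] * (N + 2)
    raf[N + 1] = N + 1                       # sentinel for the gap-filling pass
    for i in range(N, 0, -1):                # decreasing i: the least i wins
        raf[far[i] - 1] = i
    for i in range(N, 0, -1):                # raf is nondecreasing; fill the gaps
        if raf[i] == -1:
            raf[i] = raf[i + 1]
    return sumfar, raf

def answer_query(sumfar, raf, L, R):
    k = max(L, raf[R]) - 1                   # last k with far[k] <= R, clamped >= L-1
    return sumfar[k] - sumfar[L - 1] + (R - k) * (R + 1) \
           - (R * (R + 1) // 2 - L * (L - 1) // 2)
```

Once `precompute` has run, each query is pure arithmetic plus two array lookups, which is the promised O(1).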
Time Complexity:
O(N + Q)