CBAL - Editorial



Author: Sunny Aggarwal
Tester: Kevin Atienza
Editorialist: Ajay K. Verma
Russian Translator: Sergey Kulik
Mandarian Translator: Gedi Zheng




sqrt decomposition


Given a string S, handle the queries of the following form: for given L, R and k, consider all substrings of S[L, R], which have each character occurring an even number of times in it, and return the sum of k-th power of the lengths of these substrings.


Create an array P, where P[i] corresponds to an encoding of the parity of frequency of characters in the prefix S[1, i]. This means that a substring S[L, R] is balanced iff P[R] = P[L - 1]. We can then use a sqrt decomposition to precompute the answer for O (N^{1.5}) subarrays such that queries can be answered in O (N^{0.5}) time.


We are given a string S of length N, and we want to determine whether its substring S[L, R] contains each character an even number of times. The easiest way is to count the number of occurrences of each character in the substring, and then check if all of them are even. This will take O (nk) time, where n is the length of the substring, and k is the size of the alphabet. If we have to answer several queries of this form, we can do some preprocessing to speedup the queries. For each character x, create an array P_x[1..N], where P_x[i] represents the number of occurrences of of x in the prefix S[1, i]. Using this array, we can compute the number of occurrences of x in S[L, R] in constant time. More specifically,

\text{count} (x, S[L, R]) = P_x[R] - P_x[L - 1]

Note that, we do not want to know the exact number of occurrences of x in S[L, R], but only whether it is even, i.e.,

P_x[R] - P_x[L - 1] = 0 \bmod 2
P_x[R] \bmod 2 = P_x[L - 1] \bmod 2

Hence, we do not need to store the exact number of number of occurrences of x in P_x, but only a single bit representing the parity of the count, and we only need to check whether P_x[R] = P_x[L - 1]. For each position i \in [1, N], we have k parity bits, one for each character. We call this k-bit vector the signature of that position. A substring S[L, R] will have the even number of occurrences of each character if and only the signature of position R and (L - 1) are the same. Since k is bounded by 26, we can actually convert the signature into a 26-bit integer, so that in order to compare the signature we only need to do an integer comparison, which can be done in constant time. From now onwards, we will deal with this signature array.

More formally, We are given an array P[1..N], and our task is handle the queries of the following form: Given L \leq R and k, consider all pair (x, y) such that L \leq x < y \leq R, and P[x] = P[y]. Compute the sum of (y - x)^k for all such pairs.

Note that the numbers in the array P can be as large as 2^{26}, however, since the size N of the array is bounded by 10^5, there can be at most 10^5 different numbers in the array. This means, we can normalize the array P such that each number of P is bounded by 10^5.

Normalize (int[] P) {
  // Stores the mapping from the original values of array P to the
  // normalized values.
  map mapping;
  int cnt = 0;

  for (int i = 1; i <= N; ++i) {
    // This is the first time we are seeing this number, let us create a mapping for it.
    if (mapping.find(P[i]) == mapping.end())
      mapping[P[i]] = cnt++;

    // Apply the mapping.
    P[i] = mapping[P[i]];

A Linear time solution:

Let us first find a slow approach, and we will later improve it, also let us consider the case when k = 0, i.e., we only want to know the number of such (x, y) pairs.

This can be done by making a traversal of the array from position L to R, while maintaining the frequency of each number encountered as shown below:

int CountPairs(int[] P, int L, int R) {
  // map which stores the frequency of the elements seen so far. 
  map freq;
  int count = 0;

  for (int i = L; i <= R; ++i) {
    // Each element with the value same as P[i] will make a pair
    // with the current element.
    count += freq[P[i]];
  return count;

The complexity of the above approach is O (n \log n), n being the length of the subarray. The \log n factor coming from the fact that we are using a map. In practice the use of a hash map will reduce the complexity to O (n). One could also achieve the worst case O (n) complexity by using an array for storing frequencies. Create an array Q[1..N], such that Q[x] stores the frequency of element x in the subarray. In the beginning of each query, we need to initialize the array, however, we do not need to initialize the frequency of all elements, but only those elements which we encounter in the subarray, this would take only O (n) time as shown below.

void init(int L, int R) {
  for (int i = L; i <= R; ++i) {
    Q[P[i]] = 0;

Now, let us consider the case when k = 1. Similar to the k = 0 case, we traverse the subarray. Suppose we are at element P[i], and we have already seen m elements with the same value as P[i] at indices i_1, i_2, \cdots, i_m, then we are adding m new pairs with total length:

(i - i_1 + 1) + (i - i_2 + 1) + \cdots + (i - i_m + 1)
= m (i + 1) - (i_1 + i_2 + \cdots + i_m)

We already know the value of m from the map “freq”. We also need a separate map, which stores the sum of indices of all elements with the value x.

int CountLen(int[] P, int L, int R) {
  // map which stores the frequency of the elements seen so far. 
  map  freq;
  // map which stores the sum of indices.
  map indexSum;

  int sum = 0;
  for (int i = L; i <= R; ++i) {
    // Each element with the value same as P[i] will make a pair
    // with the current element.
    sum += freq[P[i]] * (i + 1) - indexSum[P[i]];

    indexSum[P[i]] += i;
  return sum;

The case of k = 2 can be handled in a similar way, which case the addenda at encountering P[i] would be:

(i - i_1 + 1)^2 + (i - i_2 + 1)^2 + \cdots + (i - i_m + 1)^2
= m (i + 1)^2 - 2 (i + 1) (i_1 + i_2 + \cdots + i_m) + (i^2_1 + i^2_2 \cdots i^2_m)

Hence, we also need a third map, which stores the sum of square of indices.
Using this approach, we can answer a query in O (n) time, which is still slow, and will not work for the large subtasks. Next, we show how to pre-compute some values to speed up the query time.

Improve the Query Time Using Pre-Computation:

The main idea to improve the query time is to pre-compute the answer for some subarrays and reuse them for answering general queries.

Let us denote the answer for subarray P[L..R] by F_k (L, R). We pick a block size B, and then compute the answer for all subarrays [L, R], for which either L or R is divisible by B. This means we store the answer for O (N/B) subarrays. This can be done in O (N^2/B) time as shown below:

// F[i][j] contains the answer for subarray P[i * B..j]
int F0[N/B][N];
int F1[N/B][N];
int F2[N/B][N];

void PreCompute(int[] P) {
  for (int idx = 1; idx <= N/B; ++idx) {
    map freq;
    map indexSum;
    map indexSqSum;

    int powerZeroSum = 0;
    int powerOneSum = 0;
    int powerTwoSum = 0;

    for (int i = idx * B; i <= N; ++i) {
      // Update sum values.
      powerZeroSum += freq[P[i]];
      powerOneSum += freq[P[i]] * (i + 1) - indexSum[P[i]];
      powerTwoSum += freq[P[i]] * (i + 1) * (i + 1) -
`                  ` 2 * (i + 1)  * indexSum[P[i]] +
`                    indexSqSum[P[i]];

      F0[idx][i] = powerZeroSum;
      F1[idx][i] = powerOneSum;
      F2[idx][i] = powerTwoSum;

      // Update maps.
       indexSum[P[i]] += i;
       indexSqSum[P[i]] += i * i;

The above snippet computes the answer for all subarrays whose left endpoint is a multiple of B. In a similar way we can compute the answer for for subarrays whose right endpoint is a multiple of B.

Now, if we want to compute the answer for subarray P[L, R]. It is possible that both L and R lie in the same block, in this case, we use the method described in previous section, which would compute the answer in O (B) time. Now assume that L and R lie in different blocks. Let us say L lie in block i and R lie in block j.

F(L, R) = F(L, j B) + F((i + 1) B, R) - F((i + 1) B, j B) + G(L, (i + 1) B - 1, j B, R)

The first term in the RHS considers the pairs in the subarray P[L, jB], and the second terms considers all pairs in the subarray P[(i + 1) B, R]. Since both terms contain the pairs in the subarray P[(i + 1) B, j B], which are now counted twice, we subtract them in the third term. Note that these terms have already been pre-computed, as they have one endpoint divisible by B. We still need to count the pairs whose left endpoint lies in [L, (i + 1) B], and right endpoint lies in [j B, R]. These pairs are represented by the fourth term in the RHS. Since both intervals [L, (i + 1) B] and [j B, R] have length smaller than B, these pairs can be computed using a similar way in the previous section in O (B) time.

Hence, we have provided an approach which has pre-computation complexity O (N^2/B), and query time O (B). Hence, the total complexity will be O (N^2/B + QB). The optimal value of B is N^{0.5}, which gives us the complexity O ((N + Q) N^{0.5}).

Time Complexity:

O ((N + Q) * N^{0.5})


Author’s solution
Tester’s solution


Map the letters to its value like following:
a -> 2^0
b -> 2^1
c -> 2^2
…and so on
Then a string is balanced if and only if the xor sum is zero.


Congratulations, you described the first, trivial part of the solution.

Handling queries and (in my case) cleverly computing everything in long formulas for type=2 is considerably harder, algorithmically.


Not that hard IMO once you noticed there’s a sqrt approach.

Btw there’s no hard problem in MAY15, is there? :smiley:

1 Like

I started solving this problem by converting it into a simpler one by assigning
a bitmask, B[i], or simply put a 32-bit integer, to each index i=0..n, where n
is the length of the string S. Note that I have also introduced a dummy index,
0, and if I am referring to some character in the string, S, S[1] would be the
first letter. Bear with me, but I will also use the numerical value of the
characters (‘a’ -> 0, ‘b’ -> 1, …), as the ASCII values are somehow clumsy to
process. :slight_smile:

The bitmask, B[i], basically stores the parity of each character in the prefix
of the string, i.e., S[0..i]. For instance, if the first bit of B[3] is one,
then we know that there is an odd number of 'a’s in the first three characters
of the string. Basically, B[i] can be calculated with the following formula
B[i] := B[i - 1] \text{xor} 2^{S[i]}. We are basically copying the idea of
prefix sums to quickly retrieve the parity of each character in a substring,
S[L..R], by xoring B[L - 1] and B[R] (in integer prefix sums we are
subtracting \sigma_{L-1} from \sigma_R).

The definition of a balanced string is fairly obvious, but I will state it once
again. If every character is contained an even number of times, then we know for
certain, that this string is to be considered “balanced”.

Try it out on this example!

String   a b c b c b a c b
Index: 0 1 2 3 4 5 6 7 8 9
B:     0 1 3 7 5 1 3 2 6 7 

This problem can then be redefined as follows: for each query (L, R, k) we have
to calculate the following sum \sum (i-j)^k, where L - 1 \leq i < j \leq R
and B[i] = B[j]. From now on, we will always do a L := L - 1 before starting
the query because it kills me to write -1 all the time!

I do not know how others have dealt with this problem from now on, but I reckon
they started similarly.

Basically, we will use one of our old dirty tricks: we will use the answer of an
already calculated query (L, R, k) to process another query (L', R', k) of
the same k by adding/removing elements that did or did not occur in the old
answer. So if we imagine L and R as two pointers, and then we move them
correspondingly to L' and R', then we have to add/remove elements from our old

Consider the following example:

String   a b c b c b a c b
Index: 0 1 2 3 4 5 6 7 8 9
B:     0 1 3 7 5 1 3 2 6 7 

L, R       ^       ^
L', R'         ^     ^

Imagine we knew the answer of L, R, then we would need to remove the elements
with index 2, 3 and add the element with index 7 from the old answer to retrieve
the corresponding answer of L', R'. If adding and removing take O(1) to
process, then the complexity is O(|R'-R| + |L'-L|).

I am going to describe how we must add/remove from our old answer in order to
calculate the new answer. I will only describe this on the case k = 0, as the
other cases are trivial and should be considered as an exercise for the reader.
(no, seriously, ask me later, but I have got a final exam tomorrow :smiley: ):

Imagine we know how often a particular integer, u, is contained in the
subsequence [L, R]. Denote this number as C[u]. Without loss of generality
we assume that we are going to move our pointer, L, to the right, thus
removing the element B[L], then our new answer is going to be
new = old - C[B[L]]. On the other hand, if we move the pointer, L, to the
left, thus adding the element B[L - 1], we have to calculate our new answer as
new = old + C[B[L - 1]]. After moving the pointer we should increment or
decrement the corresponding count value, i.e., either C[B[L]] or C[B[L - 1]].

There are many offline problems that allow you to sort the queries beforehand
so that we can use O(\sqrt{N}) amount of time to calculate another query.
Unfortunately, we are dealing here with an encoded problem aka an online
problem, however, I got a nifty workaround, which also uses the idea of sqrt
decomposition, and I will describe in the next paragraph.

First we split the array into \sqrt{N} sized blocks. Then we precalculate
ans[i][j][k], where ans[i][j][k] is the answer of type k for the
subsequence B[i*\sqrt{N}...j*\sqrt{N}-1]. This has O(\sqrt{N} * N) time
complexity, but only O(\sqrt{N} * \sqrt{N}) = O(N) memory complexity. This is
fairly simple to calculate, as we if we know the answer for ans[i][j], we only
have to move the right pointer \sqrt{N} times to the right in order to
calculate ans[i][j + 1]!

If we receive a query we must figure out the best i, j that will allow us to
move our pointers at most O(\sqrt{N}) times in order to calculate the new
answer based on ans[i][j] with the aforementioned approach. But there is only
one thing that is left to be discussed: how do we get the aforementioned C[u]
value in case of k = 0?

For each u we are only going to store the prefix count C[u][l], which
basically tells us how many times u appears in the prefix sequence
[0..l*\sqrt{N} - 1]. Thus, if we find the right i, j, then, initially, in
the block [i*\sqrt{N}...j*\sqrt{N}-1], we have C[u][j] - C[u][i]
elements of u. However, if we are moving our pointers, this value will
certainly not be valid anymore, thus we have to store how many times the element
u got added/removed when moving the pointers. Let this number be T[u], then we
need to add T[u] to our initial answer in order to keep up to date with the
new pointers.

In total we have a time complexity and memory complexity of O(N \sqrt{N}). I
really look forward to other approaches! :slight_smile:


“Harder” is a relative measure, not an absolute one. I’m not saying it’s considerably hard.

There’s GRAPHCNT, it has sufficiently few solutions to be considered hard.


Thanks for the editorial! :slight_smile:

@gdisastery1 Can you please explain the case in your solution [ ] where the number of unique elements in the bitmask array is more than root(n). How do you store/calculate the count array(Z[][] I suppose) in that case?

sorry for the delay in editorial.

SEZ was supposed to be a hard problem, as we were thinking of the solution based on linear programming. However, it has an easy matching based solution, so I don’t think we can put that in hard category anymore.

Hi, these were just random thoughts that I’ve written down which I unfortunately had not removed before I submitted this AC code. Here’s a cleaner version: