MGCHGYM - Editorial

Problem Link






Treap, bitset, DP


Maintain two types of modifications on the given array and solve the subset-sum problem on its’ subsegments.

How to get 25 points

For the beginning, let’s focus on the solution to the first subtasks. The constraints are rather small, so all the modifications can be done naively, in the straightformard way. Namely, if you need to change the value of an element, you can do this in O(1) time, and when you need to reverse the subarray, straightforward O(N)-per-reverse approch will do.

Let’s deal with the queries of the third kind.

The following simple DP technique will help us to determine - whether it is possible to get W as the sum of some of these numbers or not. First of all, pseudocode:

fill dp[] with zeroes
dp[0] = 1 
for i = L; i <= R; ++i
   for j = W; j >= a[i]; --j
      dp[j] |= dp[j - a[i]]

Now, let’s have a closer look to the provided code. Suppose that we operate on the array a[]. The value of dp[k] will be equal to one if it is possible to collect the sum of k from the given integers and will be equal to zero otherwise. Initially, only dp[0] is non-zero.

Then, step by step we recalculate the values of dp[] by adding the integers from the given set one-by-one. The value of dp[j] will become equal to one during the adding of the ith element (say a[i]) if it was possible to collect a sum of (j - a[i]) before. Of course, we consider that j ≥ a[i].

The whole complexity of the described solution is O(NQ+PNW) or simply O(N(Q+PW)).

This solution will be sufficient to solve the first and the second subtasks.

How to get 50 points

At this point, we won’t speed the modifications up, but will improve the execution time of the subset-sum part.

For doing this, we will have two crucial optimizations.

The first optimization

Now it’s time to remember that there are only K ≥ 10 types of the element of the array a[] (aka weight-plates). During each of the subset-sum queries we process R-L+1 elements of a[] which can be optimized to O(K log (N / K)) = O(K log N) - considering the small value of K it will be good.

Let’s calculate the number of weight plates of each kind. Now, consider a single kind of weight plates and suppose that there are X plates of this kind. Let’s create the new summands for our subset-sum problem by grouping the weight plates:

  • The first summand should be equal to the weight of one weight plate;
  • The second summand should be equal to the weight of two weight plates;
  • The third summand should be equal to the weight of four weight plates;
  • And so on: the ith summand should be equal to the weight of 2(i-1) weight plates;

At some moment, you won’t be able to create one more summand. This moment is when the amount of the weight plates you need for the new group is greater than the amount of the weight plates that are left. In this case, the last group should contain all the remaining weight plates.

Now, let’s consider some examples of making the groups for the different amount of weight plates:

  • If you have 1 weight plate, there will be a single group containing this weight plate;
  • If you have 2 weight plates, there will be two groups, each containing a single weight plate;
  • If you have 3 weight plates, there will be two groups - the first containing a single weight plate and the second containing two weight plates;
  • If you have 4 weight plates, there will be three groups - the first containing a single weight plate, the second containing two weight plates and the third, again, containing a single weight plate.
  • Finally, if you have 100 weight plates, there will be 7 groups, containing 1, 2, 4, 8, 16, 32 and 37 weight plates respectively.

This way, for each kind of the weight plates, you will have no more than O(log N) summands that results in O(NQ + PWK log N) complexity.

The second optimization

Consider the version of the subset-sum DP that was shown in “How to get 25 points” part. You can note that all the elements of dp[] are either ones or zeroes and the operations on them are binary - to be more precise, only binary OR.

This enables us to use bitset, so we can speed the code up in 32 times.

Now we’ve got O(NQ + PWK log N / 32) solution that can additionally solve the third subtask. Such solution gets 50 points.

Getting full points

We’ve already achieved O(NQ + PWK log N / 32) complexity. Let’s note that for the constraints of the problem the O(PWK log N / 32) part is already small enough. That means that now we process the queries of the third kind fast enough. Now we need to improve the processing of the queries of the first and the second kind.

Both these queries are actually correspond to a very standard problem - maintain the array under the following queries:

  • Reverse a subarray of the array in O(log N) time;
  • Change the value of a single array’s element in O(log N) time;

This problem is easily solvable with a data structure called treap.

In our problem, there is one more query that should be maintained:

  • Find the amount of the weight plates of each kind on a given segment

For doing this, we can simply store an array of K elements in each node of the treap, where the ith element of this array will denote the number of the weight plated of the ith kind on the corresponding segments. That gives us O(K log N)-per-modification time for the modification query of the both - the first and the second kinds.

Eventually, we get the complexity of O(QK log N + PWK log N / 32). This solution passes all the subtasks and gets 100 points.

Setter’s Solution

Can be found here

Tester’s Solution

Can be found here


I think the test cases were a bit weak, because my extreme brute force solution (with 1D DP for subset sum and fast I/O ) got 50 points without any optimization as such.


Well here’s what I tried during the contest.

My complexity for query 3 is O(K*W) and for query 1 and query 2 its O(Log N) by using Treap.

I tried to optimise as much as I could but the code was giving TLE on one test case.

Can you explain why? :confused:

I personally feel time constraints were a bit too strict.

My solution

1 Like

What is the purpose of all numbers being squarefree?

1 Like

Probably nothing.

1 Like

Ha! I have complexity ~ O(Q sqrt N + QK + PK2^K + WK)! Okay, never mind, it’s something like O(QN). Luckily, it’s very unlikely to reach the worst case and my code was fast enough.

1 Like

My solution is a bit different than the one described in editorial and has a time complexity
O ((W + P) 2^k) for the third kind of queries.

In order to check whether a sum S is possible, we compute the number of ways T(S) of obtaining that sum. The sum S is possible iff T(S) \ne 0.

Let us say we have coins of denomination x1, x2, \ldots, xk.
and queries are of the form (S, r1, r2, \ldots, rk), i.e., is it possible to make a sum S, using at most r1 coins of the first type, r2 coins of the second type, and so on (S \le W).

For this set of denomination, we pre-compute an array P[1 \ldots W], such that P[x] represents the number of ways of obtaining the sum x, with no restriction on the number of coins. For the time being let us assume that we already have this array, we will explain later how to compute this array.

Now the number of ways of obtaining the sum S with the restrictions on the number of coins can be computed using inclusion-exclusion principle.

T(S, r1, r2, \ldots, rk) = P[S] - Q(S, r1, r2, \ldots, rk),

where Q(S, r1, r2, \ldots, rk) is the number of ways in which at least one of coins is used more than specified bound.

Q(S, r1, r2, …, rk)

= \sum (-1)^{u + 1} \text{number of ways of obtaining } S \text{ such that coins } i1, i2, \ldots, iu \text{ violate the bound.}

= \sum (-1)^{u + 1} P[S - (r_{i1} + 1) x_{i1} - (r_{i2} + 1) x_{i2} - \ldots - (r_{iu} + 1) x_{iu}]

Hence, T(S, r1, r2, \ldots, rk) can be computed in O(2^k) time using the P array.

Now let us see, how to compute the array P.
If there were a single coin denomination x, then we have computed this array quite easily.

P[i] = 1, if i is divisible by x, and 0 otherwise,

Suppose we already have such an array Q for the coins \{x2, x3, \ldots, xk\}, and we want to use it to compute array P for the coin set \{x1, x2, \ldots, xk\}

P[i] = Q[i] + Q[i - x1] + Q[i - 2x1] + \ldots

Note that,
P[i] = Q[i] + P[i - x1]

Hence, array P can be computed in O (W) time using the array Q.

We have k denominations, and we would want to compute the pre-computation array for each subset of denominations, hence the pre-computation can be done in O (W2^k) time.

Alternatively, we can always use the denomination set which consists of all k types, and add the bounds (r_i = 0), for denominations which are missing in the set. This means we only need to pre-compute the array for the coin set consisting of all denominations, and can be done in O (Wk)

Also note that, we need to do all the computation modulo some prime (I chose 10^4 + 7, mainly to reduce the number of modulo operations)

Of course, this means that the constraint P \le 1000, can be relaxed, and we can probably allow up to 10^5 queries. On the other hand, this approach has exponential complexity in terms of k, so even if we increase the value of k slightly (say to 15), this approach will fail.

8 Likes This solutions runs fine for most test cases but failed one can somebody Explain!

Test cases are definitely weak. A lot of people had similar O(n*W) solutions for query 3 and got 50 points. I think Codechef should rejudge the solutions after revising test cases.

In the first optimization, k<=10.

We can actually do that knapsack in O(PWK) time instead of O(PWKlogN/32). Although it does not work faster -at least not my implementation- it’s better asymptotically.

Suppose we have K different values : \{(x_1, c_1),(x_2,c_2)...,(x_K,c_K)\}.

x_k denotes the value and c_k denotes the quantity.

Then this pseudo-code calculates the dp array that we want:

f = [1,0,0...0]
for i = 1 to K:
  nf = [0,0,0,...,0]
  for j = 1 to W:
    nf[j] = 0
    for t = 0 to c[i]
      nf[j] |= f[j - t * x[i]]
  f = nf

Well, this obviously takes O(PWKN) time but we can easily get rid of the third loop. Key insight is this observation, during any change for array nf, j x[i] = (j + t * x[i]) x[i]. Which means if we group the integers according to their remainder each group is independent with each other. That’s why we can calculate them separately.

Here is the implementation for more details :

Seriously what is up with the square-freeness :stuck_out_tongue:

And it is a shame that O(PWK) does not pass but O(PWKlogN/32) passes.

1.5 seconds time limit is too harsh, considering setter’s solution works with 1.45 seconds.

Do not you think it is a bad idea to set time limit (spent by setter’s solution) + epsilon even though it is a 10 days contest.


I got Accepted with O(PWK), but I had to do multiple code optimizations to get AC (and, even so, I had a runtime of 1.48 sec on one of the test cases :slight_smile: ). But, on the other hand, I had O(sqrt(N)) for the other operations (I didn’t use treaps), so I had to optimize more the O(PWK) part to compensate for the other slightly slower operations: 1) sort the K values in increasing order of value * count(value); 2) stop the DP if we formed sum W (obvious); 3) if value * count(value) >= W then you can ignore the count and do DP as if count = infinite; 4) iterate only up to the max sum formed so far

I’m also disappointed that the square-freeness was bogus :slight_smile: I was looking forward to some fancy mathematical properties of the knapsack problem when all the values are square-free :slight_smile: On the other hand, Googling this didn’t provide anything useful, so I had suspected the square-freeness was useless.


Well I tried all of these individually but it seemed to increase the time taken by my code ( weird ).

I didn’t try all of them together in one code though.

Also I had used treaps so Q1,Q2 were log N per query.

Can you check out my code and tell me how to further optimise it?
It TLEs on one test :confused: ( Link given in above comment )

Three small optimizations for the DP part:

  1. Don’ t iterate up to w in each iteration - just up to the currently maximal possible value (sorting the weights by maximal obtainable value may further decrease runtime)

  2. The first iteration can be done directly without iterating the whole array.

  3. Similar to 2) the last iteration can be replaced by just looking at the values w- k*w_i.

1 Like

Movers and Packers Jaipur @
Movers and Packers jamnagar @

Can’t we use segment tree to solve this one

for first and third query it will take O(log n) time and for second query it will take O(X+logn) X=number of elemnets to be reversed

I suppose this part is wrong O(X+logn) . Because in the worst case lets suppose you will want to reverse
n elements than you again have to rebuilt the whole tree . which can take O(n+nlogn) . So according to me second query could be solved in O(X+XlogX) with segment tree approach .

The sample code in the editorial computed dp[] in reverse order for each summand. (which is the right thing to do here in this JOB )

But, in the provided code, we have something:

for(int j = 0; j <= limit - complete_shifts; j++)

It seems it is doing the JOB opposite.
I am confused if its really working in opposite , or I am missing something.
Any help would be greatly appreciated.

I missed the fact that there are two arrays: reachable_sum[] and shifted_mask[] , which are independent.
As we are using shifted_mask[] for a loop and then updating r_sum[] ; the above loop direction MATTERS here here :slight_smile: and its logical.