PROBLEM LINK:
Setter: Bogdan Ciobanu
Tester: Misha Chorniy
Editorialist: Taranpreet Singh
DIFFICULTY:
Medium
PREREQUISITES:
Xor operation, Precomputation, Meet in the middle.
PROBLEM:
Given a sequence A_0 of length N indexed 1-based, we define an operation f on array as A_x[i] = A_{x-1}[i] \oplus A_{x-1}[i+1] \forall i \in [1, N-1] and A_{x}[N] = A_{x-1}[N].
We have to answer queries: Given T and P, answering the value of A_T[p]. Note that T \leq 10^9.
SUPER QUICK EXPLANATION
- Simple but slow solution will be to build the whole table A_x which results in O(N^2) time.
- We will precompute answer for lower bits for every position, say for 2^B values.
- Then for every query, we will skip directly over blocks of 2^B and use precomputes values to get xor-sum of values to be included from blocks block.
EXPLANATION
For this problem I’ll be discussing two solutions, one slow one to illustrate how this process actually works, followed by another slow solution, which we will, at last, optimize by precomputation.
First of all, let’s make an observation, Sequence A_x is same as A_0 if x is a multiple of power of two > N, that is, period of sequence A is M = 2^{\lfloor \log_2 N \rfloor +1}.
Proof: Reason is that, after M operations, Each elements effect on elements to their left gets neutralised because of property A \oplus B \oplus B = A. Due to this property, the value A[j] is xored in A_x[i] if, after x operations, A[j] is xored with A[i] odd number of times.
See example array 1 2 4.
0: 1 2 4 8 16
1: 3 6 12 24 16
2: 5 10 20 8 16
3: 15 30 28 24 16
4: 17 2 4 8 16
5: 19 6 12 24 16
6: 21 10 20 8 16
7: 31 30 28 24 16
8: 1 2 4 8 16
9: 3 6 12 24 16
See, last element never changes, last 2 element changes with period 2, last 3 as well as last 4 elements changes with period 4 while if N = 5, sequence repeat after 8 operations which is same as M = 2^{\lfloor \log_2 N \rfloor +1}.
First slow solution
We make a table B[x][i] where B[x][i] represent A[x][i]. We can calculate this table for x upto M where M = 2^{\lceil \log_2 N \rceil}. Since after M, the sequnce repeat, we can always take T \bmod M and answer the query. It is easy to calculate the whole matrix in O(N*M) time, which will time out.
Second slow solution
Consider a query (T, P).
Lemma: Let us consider all submasks of T when written in binary form, say mask. Now, the Final answer will be Xor sum of A[p+mask] for all submasks of T, for all valid positions (p+mask \leq N).
Proof: Consider the following table. Here ith column in Tth row being 1 means, that B[T][pos] include xor-sum of A[pos+i]. (Columns are numbered from 0).
0: 1
1: 1 1
2: 1 0 1
3: 1 1 1 1
4: 1 0 0 0 1
5: 1 1 0 0 1 1
6: 1 0 1 0 1 0 1
7: 1 1 1 1 1 1 1 1
8: 1 0 0 0 0 0 0 0 1
9: 1 1 0 0 0 0 0 0 1 1
10:1 0 1 0 0 0 0 0 1 0 1
11:1 1 1 1 0 0 0 0 1 1 1 1
12:1 0 0 0 1 0 0 0 1 0 0 0 1
13:1 1 0 0 1 1 0 0 1 1 0 0 1 1
14:1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
15:1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The first column has 1 always, Becuase A[X][P] will always include contribution of A[0][P].
After 0 operations, A[0][P] is just as given in input.
After 1 operation, A[1][P] = A[0][P] \oplus A[0][P+1].
After 2 operations, A[2][P] = A[1][P] \oplus A[1][P+1] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+1] \oplus A[0][P+2] = A[0][P] \oplus A[0][P+2]
After 3 operations, A[3][P] = A[2][P] \oplus A[2][P+1] = A[0][P] \oplus A[0][P+2] \oplus A[0][P+1] \oplus A[0][P+3]
If we simulate this process forward, we can see, that in next step, A[4][P] = A[0][P] \oplus A[0][P+4].
This works the same way as sub-masks of T.
At T = 1, Submasks of T are 0 and 1, which means, A[1][P] = A[0][P] \oplus A[0][P+1]
At T = 2, Submasks of T are 0 and 2, which means, A[2][P] = A[0][P] \oplus A[0][P+2]
At T = 3, Submasks of T are 0, 1, 2 and 3, which means, A[3][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+2] \oplus A[0][P+3]
We can see, how the contribution of A[0][P+1] disappeared, just the same way, 1 is a submask of 1, while 1 is not a sub-masks of 2.
We can use this fact to make another slow solution, where for a given query (T, P) we check for all positions X \in [P, N] and if X-P is a sub-mask of T, we take its xor-sum.
This solution requires no preprocessing, but takes O(N) time per query, making the solution run in O(Q*N) time, once again exceeding the time limit.
Faster solution
We will combine both solutions. First of all, Observe the following in above table.
0: 1 0 0 0
1: 1 1 0 0
2: 1 0 1 0
3: 1 1 1 1
We can see, that for every 4 values of T, the positions included in the range [P, P+4) remain the same because these values have a cycle of 4.
Now, If we precompute the table for the first 4 values, we can skip directly over 4 positions.
See, Suppose we have T = 5 for some position P. It will imply A[5][P] = A[0][P] \oplus A[0][P+1] \oplus A[0][P+4] \oplus A[0][P+5].
Now, see last two bits of T, when written in binary form. They are 01 which is 1.
The thing to understand here is, that A[1][P] = A[0][P]+A[0][P+1] and in same way, A[1][P+4] = A[0][P+4] \oplus A[0][P+5].
Hence, A[5][P] can also be written as A[1][P] \oplus A[1][P+4].
Hence, A[5][P] = A[1][P] \oplus A[1][P+4]. The reason it works is, that if we have calculated the table till values X, X = 2^B for some B,We have actually divided the array into blocks of size X. Lower B bits of T decide, which values to be included in the answer from every block while upper bits decide, which blocks to be included.
Suppose we have removed the lower B bits of T, now, once again, by similar reasoning, we see that values included in xor-sum will xor value from blocks which are sub-masks of T. The lower bits automatically choose the right value from blocks.
See, Let T = 14 = 1110_2. Lower B bits of T is 10 which is 2, and T after removing lower B bits is 11 which is 3. Now, We can rewrite A[14][P] as A[2][P] \oplus A[2][P+1*(2^B)] \oplus A[2][P+2*(2^B)] \oplus A[2][P+3*(2^B)].
The reason for multiplying by 2^B is that we skip directly to the next block which is at 2^B steps from the current block. For every block, A[lb][P] includes xor-sum of all values from the block of size 2^B.
Here too, we iterate over submasks of T after removing lower B bits of T, and for the submask, we include the required value from the block corresponding to that sub-mask.
Hence, we manage to solve the problem in O(N*X + Q*N/X) for X = 2^B. By choosing a suitable value of B, we can solve this problem fast enough.
For quickly iterating over a submask of a value, see the following pseudocode
for( i = mask; true; i = (i-1) & mask){
//Do something
if i == 0: break
}
Refer more about iterating over submasks here.
Related problems
A problem on very similar idea can be found here. There was another problem on very same idea in one of the past contests at codechef, i cannot find now. If anyone has link, please post in comments.
Another recent problem, which required iterating over submasks of a number along with dynamic programming, can be found here.
Time Complexity
Time complexity is O(N*(2^B)) for precomputation and O(Q*N/(2^B)) for answering queries. Choosing Value of B = 7, we can get the solution to run fast enough to fit inside the time limit.
Memory Complexity of this solution is O(N*(2^B)) for storing the precomputed table.
AUTHOR’S AND TESTER’S SOLUTIONS:
Setter’s solution
Tester’s solution
Editorialist’s solution
Feel free to Share your approach, If it differs. Suggestions are always welcomed.