PROBLEM LINK:
Author: Roman Rubanenko
Tester: Anton Lunyov
Editorialist: Anton Lunyov
DIFFICULTY:
MEDIUM
PREREQUISITES:
Birthday Paradox, Periodic Sequences, Associative array
PROBLEM:
Getting rid of the storyline we can formulate the problem as follows. You are given some starting N × M grid G_{0} of zeros and ones and some function f on the set of such grids. Namely, f(G) is the grid obtaining from the grid G after the Loda’s move. Create a sequence of grids G_{n} by the formula G_{n} = f(G_{n − 1}) for n ≥ 1. The task is to find CNT(G_{0}) + CNT(G_{1}) + … + CNT(G_{T}) for several values of T, where CNT(G) is the number of ones in the grid G.
QUICK EXPLANATION:
The main thing here is to believe that in the sequence {G_{n}} repetition occurs relatively early. That is, the first value of p such that G_{p} = G_{q} for some q < p will be relatively small so that the simple brute force will allow to find it in a reasonable time. Since G_{n} = f(G_{n − 1}) it would mean the periodicity of the sequence {G_{n}}. Namely we will have
**G_{n + p} = G_{n + q}** for **n ≥ 0**. It allows to answer the queries in **O(1)** time after precomputing the sums **CNT(G_{0}) + CNT(G_{1}) + ... + CNT(G_{n})** for **n ≤ p**.EXPLANATION:
For those who don’t really get this a bit advanced formulation of the problem, let’s consider an example from the problem statement. The sequence {G_{n}} in this case starts as follows:
G_{0}  =  100 000 000  ;  G_{1}  =  111 000 000  ;  G_{2}  =  101 111 111  ;  G_{3}  =  011 101 010  . 
I hope it should be clear now what this math thing G_{n} = f(G_{n − 1}) means.
Strictly speaking we don’t have a formal proof why the first repetition occurs relatively early. The intuitive reasoning is that the function f(G) transforms the grid in some random fashion, so we can approximately consider the sequence G_{n} as a sequence of random grids of zeros and ones. There are S = 2^{N * M} such grids in all. According to the birthday paradox among 3.5 * sqrt(S) random grids repetition occurs with probability about 99.8%. For sizes 2 × N, N × 2, 1 × N and N × 1 sequence G_{n} usually has lower index of the first repetition. So the toughest sizes under the given constraints are 3 × 11 and 11 × 3. Trying a lot of initial data (Sx, Sy, P1, P2) for these sizes with different random distributions the hardest test we were able to create has the first repetition at the position near 324000. Note that S = 2^{33} in this case, so 3.5 * sqrt(S) is also near 324000. It means that our intuitive reasoning seems to be correct.
Now let’s discuss the implementation of this solution.
For convenience we will numerate rows and columns of the grid starting from 0. We can represent grids as numbers having at most N * M bits in binary representation. Namely, if the cell (i, j) of the grid G is G[i][j] where 0 ≤ i < N and 0 ≤ j < M, then the corresponding number is the sum of G[i][j] * 2^{M * i + j} over all i and j. So we can now consider sequence of grids {G_{n}} as a sequence of numbers. We call such numbers bitmasks.
To find the first repetition in this sequence we need some data structure that allows to check quickly after calculating each G_{n} whether we have already such number in the sequence and in what position. The most efficient in this situation will be Hash Table that allows to answer this question in O(1) time in average with high probability. It is implemented in both author’s and tester’s solutions. Alternatively you can use any fast associative array data structure in the language of your choice. For example in C++ you can use STL container map. It allows to answer this question in O(log n) where n is the index of the current element of the sequence.
The time limit was moderate so to calculate f(G) for some grid G represented as a number you can simply restore the grid by the number, make directly all calculations described in the problem statement and calculate the number for the new grid. There is only one thing you should additionally notice. The new grid should be created in the new variable since otherwise you can’t make appearing of new images for all cells simultaneously. All other bits of making move can be performed directly as described in the problem statement. In this way f(G) can be calculated in O(N * M * (N + M)) with quite large constant. Probably not all implementations of f(G) could pass the time limit since in our intended solution this step is made in O(N * M) time. But only a few tries to optimize the direct implementation will give you a solution that can pass the time limit. Well, in fact we would like all such solutions to fail but the slowness of Pyramid (archaic SPOJ server where all your solutions are run) not allow us to set higher constraints.
To get the O(N * M) algorithm for calculating f(G) you need to do just three things.

For each cell (i, j) and the number r from 0 to 5 precompute the bitmasks of cells which appear from the cell (i, j) if F(i, j) mod 6 = r.

For each cell (i, j) precompute the bitmask of the cells in the cross of the cell (i, j).

Precompute bitCount (the number of ones in binary) for numbers up to 2^{17}. This allows to find bitCount for numbers up to 2^{34} as a sum of two stored values for the first half of the number and for second half of the number.
Now the move can be made in a simple O(N * M) loop. See tester’s solution as a reference. Namely, F(i, j) is just bitCount(grid and
cross[i][j]). So to make move we need to iterate over all cells check whether we have image in this cell and xor
the resulting grid with corresponding bitmask of new cells which appear from this cell.
Another speed up that can be made is the precomputing residues modulo 6 for values up to 34. It seems ridiculous at the first sight but the modulo operation even modulo 6 considerably slower than array lookup. So this hack speeds up the solution in about 15%. It is used in author’s solution.
Now assume that we find values p and q such that G_{p} = G_{q} and 0 ≤ q < p. As was mentioned in the QUICK EXPLANATION section let’s store the sums A[0] + A[1] + … + A[n] for 0 ≤ n ≤ p in some array sum[], where we set for convenience A[k] = CNT(G_{k}).
Now assume that we have some value of T (possibly large) and we need to calculate the sum Sum(T) = A[0] + A[1] + … + A[T]. Clearly, if T ≤ p we can simply use the stored value sum[T]. If T > p we will use the periodicity of the sequence {A[n]} which follows from the periodicity of the sequence {G_{n}}. For this we introduce the period w = p − q and write T as T = r + w + … + w, where q ≤ r < p. Let there be L numbers w in this representation. The sum of the first r + 1 summands in Sum(T) is equal to the stored value sum[r]. The sum of each of the following groups of length w is the same due to the periodicity and equals to the sum A[q + 1] + … + A[p] = sum[p] − sum[q]. Hence
**Sum(T) = sum[r] + L * (sum[p] − sum[q])**.In fact, L equals to floor((T − q) / w) and r = T − L * w.
AUTHOR’S AND TESTER’S SOLUTIONS:
Author’s solution can be found here.
Tester’s solution can be found here.