UPD : got accepted https://www.codechef.com/viewsolution/17055899 : i just change the implementation from dp[i][j] to h[j] with same complexity! Now, more curious to know what’s actually happening?

The sos function is the slowest part of the solution, and it outweigths the other operations done in one bucket. Thus you should call sos function less times, and do the other calculations more, so change the size of the bucket from \sqrt{Q}=512 to something bigger, like 2-3000.

I didn’t go through your entire code, but it made my solution pass, and probably it will make yours too.

That was the problem that I was facing. This is what I did :

Instead of choosing the block size ‘b’ as sqrt(q), I tried to find a value that will minimise the complexity (this is what I do usually for choosing the best block size for problems, if the TL is strict). First, we will write down the expression for the time complexity in terms of the block size ‘b’.

Lets see what are all the operations that we will be doing. We have to use SOS DP for each of the blocks. The complexity for this part is (17 * q * n)/b. Then, we need to go through all the blocks for each of the elements, and see in which block it gets killed. The complexity of this part is n * (n/b). And, if a monster gets killed in some block, you need to go through each query in the block to find exactly where it gets killed. The complexity for this part is n * b. Overall the expression for the time complexity is :

(17 * q * n)/b + n * (n/b) + n * b. If you differentiate this expression with respect to ‘b’, and equate it to 0, you will find that b = sqrt(17 * q + n). With this as the block size, the number of operations will be around 2^28.

One thing to keep in mind is that you can never be sure of a solution getting TLE based on the number of operations that you have calculated. It also depends on the type of the operations. In case of this problem, I think it was fast because there were bitwise operations involved, they are fast. I wasn’t sure that my solution was going to get accepted, but luckily it did.

Another thing is that sometimes, the worst case that you calculate might never be encountered (reasons might be you calculating the complexity incorrectly or the test cases being weak). This happened to me when I was trying to solve this problem:https://www.codechef.com/JAN17/problems/DIGITSEP/, The complexity I calculated should have given me a TLE for sure, but I was shocked to find that it was the second fastest submission for the problem. So, when in doubt, just submit and see.

I think its because of cache hits. I remember reading somewhere that in case of 2D arrays, access time is fast if consecutive accesses to memory are made to elements within the same row (it seems that this is because they are in the same block of memory). Try changing your array from dp[(1 << 18) + 2][18] to dp[18][(1<<18)+2] and let me know if it gets accepted.

If possible please explain this a little more " access time is fast if consecutive accesses to memory are made to elements within the same row." in context to this solution.(as i am using dp[j][i-1] and dp[j][i] so, how access time differs? as in both cases we need to access both the rows!).

2D arrays are stored in memory in row major order. In your previous submission, your array was :dp[mask][i]. You loop over all the masks in the i-th and i-1th layer in a single iteration of SOS DP right ? when looping over masks, you go through all the rows of your 2D array, ie., for every consecutive access to memory, the locations are 17 positions apart. But if you change it to dp[i][mask], you will be accessing memory sequentially.

Although, you do make a valid point, even I’m not sure about why its fast even though we are using the i-1th row. I’m guessing that since the entire previous row was used in the previous iteration, most of its locations are in the cache.