Hello guys… Recently I was studying Mo's Algorithm from this blog and trying the problems stated at the end. I attempted the problem Powerful Array, and my solution barely passes the time limit, and I don't even know how, because the very same solution also got TLE on another submission!
The blog clearly states that Mo's Algorithm is the best approach for this problem, so it would be great if someone could point out where I am going wrong. Any better approaches or articles I can learn from are also appreciated!! Thanks
Here’s a link to my AC code : LINK
Same code getting TLE : LINK
@dishant_18 I resubmitted your solution after changing the data type of the input array from long long to int, and it passed in 4648 ms, which is an improvement. There are optimisations like this which you can use to speed up your code.
I didn't read through your code fully, so there might be further algorithmic optimizations possible in your logic.
Using a left shift to multiply by 2 would probably decrease the time further.
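For context, here is a sketch of what the per-element update for Powerful Array might look like with these two tweaks (the names `Mo`, `cnt`, and `answer` are my own, not from the submission above). The delta when the count of value v goes from c to c+1 is ((c+1)² − c²)·v = (2c+1)·v, which is exactly where the left-shift trick fits; the count array stays 32-bit int and only the running answer is 64-bit:

```cpp
#include <vector>

// Hypothetical sketch of Mo's add/remove for Powerful Array,
// where the "power" of a window is sum over values v of cnt[v]^2 * v.
struct Mo {
    std::vector<int> cnt;   // occurrence count per value; int is enough
    long long answer = 0;   // running power of the current window

    explicit Mo(int maxValue) : cnt(maxValue + 1, 0) {}

    // cnt[v] goes c -> c+1: answer changes by (2c + 1) * v,
    // with 2c computed as cnt[v] << 1.
    void add(int v) {
        answer += (long long)((cnt[v] << 1) + 1) * v;
        ++cnt[v];
    }

    // cnt[v] goes c -> c-1: answer changes by -(2c - 1) * v.
    void remove(int v) {
        --cnt[v];
        answer -= (long long)((cnt[v] << 1) + 1) * v;
    }
};
```

Whether the shift actually beats `2 * cnt[v]` after compiler optimization is debatable, but keeping the hot arrays as int genuinely helps cache behaviour.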
Also @dishant_18 you might try varying the block size.
One more optimization: when sorting the queries, sort the right endpoints in ascending or descending order depending on whether the left endpoint falls in an even or odd block. This saves a lot of travel for the right pointer. I can't say exactly where I found it, but I remember it was in some Codeforces comment.
UPDATE: Solution using the above optimizations takes a little over 1 sec: 33239378
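For anyone who wants to try it, the alternating sort can be implemented with a comparator along these lines (the `Query` struct and `sortQueries` are my own illustrative names, not from the linked submission):

```cpp
#include <algorithm>
#include <vector>

struct Query {
    int l, r, idx;  // idx remembers the original position of the query
};

// Sketch of the alternating sort: queries are grouped by the block of
// their left endpoint; even-numbered blocks sort r ascending, odd
// blocks sort r descending, so the right pointer zig-zags between
// blocks instead of resetting to the start for every block.
void sortQueries(std::vector<Query>& qs, int blockSize) {
    std::sort(qs.begin(), qs.end(),
              [blockSize](const Query& a, const Query& b) {
        int ba = a.l / blockSize, bb = b.l / blockSize;
        if (ba != bb) return ba < bb;   // primary key: block of l
        if (ba & 1) return a.r > b.r;   // odd block: r descending
        return a.r < b.r;               // even block: r ascending
    });
}
```

The comparator is a strict weak ordering, so it is safe to pass to `std::sort`.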
From the math you can work out that the block size should be on the order of \sqrt N, but that does not mean exactly \sqrt N is best in practice. The best block size is practically k \sqrt N for some constant k, usually not very far from 1.
But as @vijju123 said, tuning the block size is not practical for short contests. You would either have to make multiple submissions or locally generate trial test data.
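If you do want to tune it, one way (my own sketch, with `k` as the tunable constant) is to parameterize the block size and sweep `k` against locally generated tests:

```cpp
#include <cmath>

// Block size as k * sqrt(N). k near 1 is usually a good starting
// point, but the optimum depends on N, Q, and the test data, so in
// practice you tune k with locally generated inputs.
int blockSizeFor(int n, double k = 1.0) {
    int b = (int)(k * std::sqrt((double)n));
    return b > 0 ? b : 1;  // guard against a zero block size
}
```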
However, the sorting-order optimization always works. If you sort the right endpoints in ascending order for every block, then for the first block the right pointer moves from start to end, for the next block it jumps back to the start and sweeps to the end again, and so on for every block.
But if you alternate the order between ascending and descending, the pointer goes start to end for one block, end to start for the next, and start to end again for the one after that, so nearly half as many steps are required.
PS: I’m using the terms “start” and “end” loosely here, but hopefully you understand what I’m saying.
What if the test cases are such that the difference between L and R is huge for every query? Which performs better: a block size of 1000 or a block size of 100? (Hint: a greater block size means fewer iterations wasted jumping across blocks to reach the final R.)
And what if the test cases are such that the difference between L and R is not that big? A large block size can be inefficient here: if the difference is smaller than the block size, you are back to brute force, so a smaller block size helps.
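To make this trade-off concrete, the standard worst-case estimate is that the left pointer moves at most B steps per query (Q·B total) while the right pointer sweeps at most N per block (N²/B total), so the block size balances the two. A toy calculation (my own illustration, not a precise model of any particular test set):

```cpp
// Toy worst-case estimate of total pointer movement in Mo's algorithm:
// left pointer: at most b steps per query  -> q * b total
// right pointer: at most n steps per block -> (n / b) * n total
long long estimatedMoves(long long n, long long q, long long b) {
    return q * b + (n / b) * n;
}
```

For N = Q = 100000, a block size near \sqrt N (about 316) gives a smaller estimate than either 100 or 1000, which is exactly the tension described in the two questions above.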