Arrays/vectors and swap should be fast on modern GHz range processors with large caches, so if you are doing it right(single array, no reallocations and just using swap and do so sequentialy to benefit from the cache), you could fit the time for most cases with your (I assume) O(N*Q) approach, but probably not for the worst case of N=Q=10^5.

Optimizing your stdin/cin input could perhaps help you beat that limit too if you feel you are close (I tested on INTEST and naive/natural cin input is as much as 50x slower than optimized version). Feel free to use my IIStream<> if on C++14 (with proper attribution as local rules demand).

Maybe stating the obvious: With naive implementation you run into the problem,that array have O(N) insertion/range swap in middle and linked lists O(N) indexed element access. You must find some way around, if the above doesn’t work.

If I read it correctly, it is basicaly taking some subarray and placing it at end or beginning. It feels, there should be some trivial solution.

I have not done it yet, but I would probably try to build a tree on top of the A[N] array. (That is root would represent whole array. left child = some left subarray and, right child = some right range of the subarray of parent and so on for the grandchilds). And then at the end the leaves should be the array in correct (shifted) order. Building the tree based on the shifts should be mostly O(Q*log(N)) - I won’t spoil for now how precisely to do that. Printing the levaves to output O(N+Q).

Maybe show us what you got yourself. That would help us to point you to inefficiencies in your code. Also there are plenty of problems - take one you can solve and return to this one with fresh mind.

I have tried to decode the nicely obfuscated C++ sample you linked. I gave up, but I am confident it is some sort of implementation of a Treap. There seem to be some submitted/accepted solutions of that that are somewhat more readable.