SHUFFLE2 - Editorial




Author: Sergey Kulik
Tester: Suhash Venkatesh and Kevin Atienza
Editorialist: Kevin Atienza


Splay trees, treaps, permutations, Chinese remainder theorem


A deck of N cards is shuffled using overhand shuffle. Specifically, the shuffle consists of M cuts repeated Q times, where in the i th cut the cards from position L_i to R_i are moved to the beginning.

You have to answer two kinds of questions:

  • Given Q, what is the final order in which the cards appear in the end?
  • Given a final order, compute the smallest Q for that order to occur. (it’s guaranteed that this Q exists)


The first part of the solution is to compute the permutation after performing the M cuts once. Each cut [L_i,R_i], 1 \le L_i \le R_i \le N, can be decomposed into a series of prefix reversals:

  • Reverse [1,L_i-1]
  • Reverse [1,R_i]
  • Reverse [1,R_i-L_i+1]

Prefix reversals can be done efficiently using splay trees with lazy propagation, with running time O(N + M \log N).

The next step is to decompose the permutation into disjoint cycles.

Next, let’s answer the two questions:

  • Given Q, what is the final permutation? In other words, what is the Q th power of the given permutation? To do this, we simply raise each cycle to the power Q. Raising a cycle to any power requires only rotating the cycle a given number of times, which takes time linear in the size of the cycle. Therefore, this runs in O(N) time.

  • Given the final permutation, what is Q? We can reverse-engineer the approach to the previous question. Given any cycle of size S_i, we check what happens to it in the final permutation and derive which values of Q could have resulted in that outcome. You’ll find that all such Q satisfy a congruence relation Q \equiv k_i \pmod{S_i} for some value k_i. Each cycle gives one congruence relation, so we can use the Chinese remainder theorem to combine them into a single congruence Q \equiv k \pmod{S}, where 0 \le k < S and S is the order of the permutation. The minimum Q is simply Q = k, and this also runs in O(N) time.

Note that in the second question, the moduli are not necessarily pairwise coprime, so you may have to adjust your algorithm to combine two congruences.


A cut operation moves a sequence of contiguous cards from the middle of the deck to the beginning. More specifically, \text{cut}(L,R) moves the cards at positions [L,R] to the beginning.

At first, you might consider looking for special/useful properties of permutations made from cuts, because a single cut seems well-behaved and nice. However, we can show that any such properties won’t be very useful for us, because any permutation can be reached with a series of cuts. In particular, even restricting ourselves to cuts where L = R, we need at most N such cuts to turn any permutation into any other!
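The “at most N cuts” claim is easy to demonstrate with a short Python sketch (the helper names here are ours, not from the problem): process the target order from back to front, each time moving the needed card to the front with a \text{cut}(L,L). Cards moved earlier keep their relative order, so after at most N moves the deck matches the target.

```python
def cut_single(deck, L):
    """cut(L, L): move the card at 1-indexed position L to the front."""
    return [deck[L - 1]] + deck[:L - 1] + deck[L:]

def cuts_to_reach(deck, target):
    """Turn deck into target using at most N cuts with L = R."""
    deck = list(deck)
    cuts = []
    # Bring target's cards to the front, last card first; each move
    # preserves the relative order of the cards moved before it.
    for v in reversed(target):
        L = deck.index(v) + 1
        if L > 1:               # L == 1 would be a no-op cut
            cuts.append((L, L))
            deck = cut_single(deck, L)
    return deck, cuts

deck, cuts = cuts_to_reach(list("badc"), list("abcd"))
# deck is now ['a', 'b', 'c', 'd'], using len(cuts) <= 4 cuts
```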

This hints to us that we don’t have to study cut permutations at all. We just need to compute the final permutation obtained by doing all the M given cuts. But performing M cuts naïvely is time-consuming, so we need to be efficient about it.

Performing many cuts

We can actually reduce \text{cut}(L,R) into a series of three prefix reversals:

  • Reverse [1,L-1]
  • Reverse [1,R]
  • Reverse [1,R-L+1]

For example, suppose we want to perform \text{cut}(4,7) on abcdefghij. According to the above, \text{cut}(4,7) corresponds to the following sequence of prefix reversals: [1,3], [1,7], [1,4].

 a b c d e f g h i j
[1   3]
 c b a d e f g h i j
[1           7]
 g f e d a b c h i j
[1     4]
 d e f g a b c h i j

Sure enough, the 4 th to 7 th characters, defg, were moved to the beginning!
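The reduction is easy to check with a naive list-based sketch (fine for traces like the one above, but far too slow for the real constraints):

```python
def reverse_prefix(deck, x):
    """Reverse the first x cards."""
    return deck[:x][::-1] + deck[x:]

def cut(deck, L, R):
    """cut(L, R) as the three prefix reversals, 1-indexed as above."""
    deck = reverse_prefix(deck, L - 1)
    deck = reverse_prefix(deck, R)
    return reverse_prefix(deck, R - L + 1)

print("".join(cut(list("abcdefghij"), 4, 7)))  # defgabchij
```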

Thus, we need to perform 3M prefix reversals efficiently. This can be achieved for example using randomized treaps or splay trees. We’ll discuss how to do it with splay trees.

First, we create a balanced splay tree containing N nodes. The i th node in the inorder traversal of the tree initially contains the value of the i th card (in the initial permutation). Each node also stores the size of its subtree (the length of the interval it represents). In addition, each node has a special attribute called reverse, which is true if the interval represented by the node is reversed.

This reverse variable will be lazy, i.e. we will only actually perform the reversal when we pass through the node, and even then, we won’t reverse all subtrees, rather we will just swap subtrees and propagate the reverse value to the two subtrees. For this reason, the reverse attribute will be the key to making our prefix reversals efficient.

Let’s define a visitation of a node. To visit a node means to propagate its reverse attribute downward: if reverse is true, set it to false, swap its two subtrees, and flip their reverse attributes. We can now discuss how to reverse a prefix [1,x]. We’ll handle the cases x = 1 and x = N specially, since they are much easier.

  • If x = 1, nothing needs to be done.
  • If x = N, then just flip the reverse attribute of the root, and visit it.

Now, for the general case. First, splay on the $(x+1)$th node in the inorder traversal. Here are a couple things to keep in mind during the splay:

  • The i th node in the inorder traversal can be found efficiently using the size attributes: they determine which subtree to recurse into. If the left subtree contains i or more nodes, recurse into it; if it contains exactly i-1 nodes, the current node is the one we want; otherwise recurse into the right subtree (reducing i by the size of the left subtree plus one).
  • During the traversal, don’t forget to visit the node first.
  • Also, during the splay operation, don’t forget to update the size attributes during rotations.

After the splay operation, the $(x+1)$th node is now the root, which means its left child contains exactly x nodes. So reversing the first x elements just amounts to flipping the reverse attribute of the left child of the root!

After performing the 3M prefix reversals, we can perform an inorder traversal of the tree to extract the final permutation. Just keep in mind to visit each node during this traversal also.

The time complexity of each prefix reversal is amortized O(\log N), so the whole algorithm runs in O(N + M \log N). (O(N) for the construction and the extraction at the end.)
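The editorial uses splay trees, but the same lazy-reversal idea works with a randomized implicit treap, which also gives O(\log N) expected time per operation. Here is a hedged Python sketch (all names are ours); `push` is exactly the “visitation” described above.

```python
import random

class Node:
    __slots__ = ("val", "prio", "size", "rev", "left", "right")
    def __init__(self, val):
        self.val, self.prio = val, random.random()
        self.size, self.rev = 1, False
        self.left = self.right = None

def size(t):
    return t.size if t else 0

def update(t):
    t.size = 1 + size(t.left) + size(t.right)

def push(t):
    """'Visit' t: propagate the lazy reverse flag one level down."""
    if t and t.rev:
        t.rev = False
        t.left, t.right = t.right, t.left
        if t.left:  t.left.rev  ^= True
        if t.right: t.right.rev ^= True

def split(t, k):
    """Split t into (first k nodes, the rest) in inorder."""
    if not t:
        return None, None
    push(t)
    if size(t.left) >= k:
        a, b = split(t.left, k)
        t.left = b
        update(t)
        return a, t
    a, b = split(t.right, k - size(t.left) - 1)
    t.right = a
    update(t)
    return t, b

def merge(a, b):
    """Concatenate treaps a and b (all of a before all of b)."""
    if not a or not b:
        return a or b
    if a.prio > b.prio:
        push(a)
        a.right = merge(a.right, b)
        update(a)
        return a
    push(b)
    b.left = merge(a, b.left)
    update(b)
    return b

def reverse_prefix(t, x):
    """Reverse the first x elements: split, flip one lazy bit, merge."""
    a, b = split(t, x)
    if a:
        a.rev ^= True
    return merge(a, b)

def cut(t, L, R):
    """cut(L, R) as the three prefix reversals from the editorial."""
    t = reverse_prefix(t, L - 1)
    t = reverse_prefix(t, R)
    return reverse_prefix(t, R - L + 1)

def build(vals):
    t = None
    for v in vals:
        t = merge(t, Node(v))
    return t

def to_list(t, out=None):
    """Inorder traversal, visiting (pushing) each node on the way."""
    if out is None:
        out = []
    if t:
        push(t)
        to_list(t.left, out)
        out.append(t.val)
        to_list(t.right, out)
    return out

t = cut(build(list("abcdefghij")), 4, 7)
print("".join(to_list(t)))  # defgabchij
```

A splay-tree version follows the same push/update pattern; only the rebalancing differs.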

Questions about permutation

Now that we have the final permutation, we can now try answering the questions. Let’s tackle the first question:

Given Q, what is the final order in which the cards appear in the end?

Composing two permutations together isn’t hard in itself and can be done in O(N), but composing Q permutations will not do, because Q is too large.

We can employ some sort of fast exponentiation to compose a permutation to itself Q times. This runs in O(N \log Q) time which might pass the time limit. But we can instead describe a much faster solution that runs in O(N) time.
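For reference, the fast-exponentiation variant is itself short. A Python sketch, assuming 0-indexed permutations where p[i] is where element i goes after one round:

```python
def compose(p, q):
    # (p o q)[i] = p[q[i]]: apply q first, then p
    return [p[x] for x in q]

def perm_pow(p, Q):
    """Compute p applied Q times by binary exponentiation: O(N log Q)."""
    result = list(range(len(p)))  # identity permutation
    while Q:
        if Q & 1:
            result = compose(p, result)
        p = compose(p, p)         # square
        Q >>= 1
    return result
```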

Suppose we want to repeatedly apply some permutation \pi to the list [1,2,\ldots,N]. Let’s analyze what happens to the number i. After one application, i ends up at position \pi(i). After another, \pi^2(i). After another, \pi^3(i), and so on. The question is, what is the long-term behavior of the sequence

i, \pi(i), \pi^2(i), \pi^3(i), \pi^4(i), \ldots

? Clearly, this sequence must repeat at some point because it takes at most N distinct values. So let’s say \pi^a(i) = \pi^b(i) for some a < b. But since \pi is a permutation, it must be a bijection, so \pi^{a-1}(i) must be equal to \pi^{b-1}(i). In the same way, \pi^{a-2}(i) = \pi^{b-2}(i), and so on, until we find \pi^0(i) = \pi^{b-a}(i). But \pi^0(i) is just i. Therefore, we have just shown that i appears again in the sequence (specifically, as \pi^{b-a}(i)), and so this sequence must be periodic!

This is true for any value i, so every integer i belongs to a cycle of the permutation (the cycle you trace by following where i goes as you apply the permutation repeatedly). Thus, any permutation in fact consists of a bunch of disjoint cycles.

We can write a cycle of length k as a sequence of integers i_0, i_1, i_2, \ldots, i_{k-1}, where i_a goes to i_{a+1} for 0 \le a < k-1 and i_{k-1} goes back to i_0. The nice thing about disjoint cycles is that you can apply them in any order and the result is still the same. Thus, applying a permutation Q times means applying each of its cycles Q times.

But it’s easy to apply a cycle Q times! Applying a length-k cycle Q times moves i_a to i_{(a+Q)\bmod k}. Thus, we can now compute the final permutation in O(N) time, by just doing this for each cycle!
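Putting this together, a Python sketch of the O(N) method (0-indexed; p[i] is where element i goes after one application):

```python
def perm_pow_cycles(p, Q):
    """Apply permutation p a total of Q times in O(N), via its cycles."""
    n = len(p)
    result = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # collect the cycle containing `start`
        cycle = []
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = p[i]
        # rotating the cycle by Q applies p^Q to its elements
        k = len(cycle)
        for a in range(k):
            result[cycle[a]] = cycle[(a + Q) % k]
    return result
```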

Now, we proceed to the second question:

Given a final order, compute the smallest Q for that order to occur. (it’s guaranteed that this Q exists)

To answer this, we must reverse-engineer our last approach to the previous question. We want a Q such that applying the permutation Q times results in the given final order. First, we collect the cycles in the original permutation. Then, focus on one cycle, say of length k. Let i_0 be the first element. Determine its position in the final order, say i_a. Then for Q to be valid, the following must be true: Q \equiv a \pmod{k}. Doing this for all cycles gives us a bunch of congruences that Q must satisfy, and our goal now is to find the smallest Q satisfying all these congruences.

If the moduli are pairwise coprime, then we can repeatedly use the Chinese remainder theorem to convert our set of congruences into a single congruence, Q \equiv A \pmod{K}. Assuming 0 \le A < K, all valid Q must be of the form A + Kq for q \ge 0, so the smallest one is definitely Q = A.

The problem is that the moduli are not necessarily coprime! We can’t use the Chinese remainder theorem (yet). Let’s try to solve the case of two congruences:

\begin{aligned} Q &\equiv a_1 \pmod{k_1} \\ Q &\equiv a_2 \pmod{k_2} \end{aligned}

Let g = \text{gcd}(k_1,k_2), and k_1 = gm_1 and k_2 = gm_2. Reducing both congruences above modulo g, we get:

\begin{aligned} Q &\equiv a_1 \pmod{g} \\ Q &\equiv a_2 \pmod{g} \\ \Rightarrow a_1 &\equiv a_2 \pmod{g} \end{aligned}

If this is not true, then there is no solution Q. But the question guarantees the existence of Q, so this must be true. Let r be a_1 \bmod g, and let

a_1 = q_1g + r
a_2 = q_2g + r
Q = xg + r

Then we get:

\begin{aligned} Q &\equiv a_1 \pmod{k_1} & Q &\equiv a_2 \pmod{k_2} \\ xg + r &\equiv q_1g + r \pmod{gm_1} & xg + r &\equiv q_2g + r \pmod{gm_2} \\ xg &\equiv q_1g \pmod{gm_1} & xg &\equiv q_2g \pmod{gm_2} \\ x &\equiv q_1 \pmod{m_1} & x &\equiv q_2 \pmod{m_2} \end{aligned}

The last pair of congruences have coprime moduli, so the Chinese remainder theorem applies, and we get

x \equiv q \pmod{m_1m_2}

for some q. Finally, we need to get back to Q:

\begin{aligned} x &\equiv q \pmod{m_1m_2} \\ xg &\equiv qg \pmod{gm_1m_2} \\ xg + r &\equiv qg + r \pmod{gm_1m_2} \\ Q &\equiv qg + r \pmod{\text{lcm}(k_1,k_2)} \end{aligned}

Thus, we were able to combine two congruences with moduli k_1 and k_2 into a single congruence with modulus gm_1m_2 = \text{lcm}(k_1,k_2)!

By repeatedly applying this method to all our congruences, we end up with a single congruence of the form

Q \equiv A \pmod{K}

where K is the \text{lcm} of all cycle lengths. Finally, we get the answer to the question as A!
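The whole second question can be sketched in Python as follows (a hedged illustration, not the author's code; `pow(x, -1, m)` computes a modular inverse and needs Python 3.8+):

```python
from math import gcd

def combine(a1, k1, a2, k2):
    """Combine Q = a1 (mod k1) and Q = a2 (mod k2) into one congruence.
    The moduli need not be coprime; a solution is assumed to exist."""
    g = gcd(k1, k2)
    assert (a1 - a2) % g == 0          # guaranteed by the problem
    m2 = k2 // g
    lcm = k1 // g * k2
    # Solve k1*t = a2 - a1 (mod k2), i.e. (k1/g)*t = (a2-a1)/g (mod m2);
    # k1/g and m2 are coprime, so the inverse exists.
    t = (a2 - a1) // g * pow(k1 // g, -1, m2) % m2
    return (a1 + k1 * t) % lcm, lcm

def smallest_q(p, final):
    """Smallest Q with p^Q == final (0-indexed), assuming it exists."""
    n = len(p)
    seen = [False] * n
    a, k = 0, 1                        # start from Q = 0 (mod 1)
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = p[i]
        # p^Q sends cycle[0] to cycle[Q mod len(cycle)]
        r = cycle.index(final[start])
        a, k = combine(a, k, r, len(cycle))
        # (in the real problem, one can stop once k exceeds 10**12)
    return a
```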

There is another problem with this approach, and that is that K might exceed the 64-bit limit. In this case, we have a simple remedy: If at any point during the algorithm our modulus exceeds 10^{12}, then we can just stop, and extract the answer from the current congruence we have. Since the answer is guaranteed to be at most 10^{12}, then the current modulus is enough to determine it!

Time Complexity:

O(N + M \log N)


Sample solutions: to be uploaded soon.

What happened with the more detailed explanation?

Updated this one. I’ll update the others soon.