SEATSTR2 - Editorial

kevinsogo · March 13, 2016, 2:26pm

PROBLEM LINK:

Author: Sergey Nagin
Tester: Roman Furko
Translators: Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Medium

PREREQUISITES:

Combinatorics

PROBLEM:

Two strings X and Y, each of length n, are similar if they can be made equal by applying the following operation at most once in each of them.

Choose any two positions i, j (i can be equal to j) and swap the characters at those positions.

Given a string A of length n, find the number of ordered pairs (X,Y) of non-similar permutations of A, modulo 10^9 + 7.

QUICK EXPLANATION:

An equivalent definition for similar would be: X and Y are similar if you can turn X into Y by performing at most two swaps.

For every lowercase letter c, let A_c be the number of occurrences of c in A. Then the number of permutations of A is \frac{n!}{A_{\text{"a"}}!A_{\text{"b"}}!\cdots A_{\text{"z"}}!}. Let this be P.

We count the number of similar pairs of strings instead. The total number of pairs is P^2, so if S is the number of similar pairs, then the number of non-similar pairs is P^2 - S.

Let T be the number of strings similar to A. Then it can be shown that S = PT. Thus, all we need is to compute T.

Finally, to compute T, we will need to separately count four kinds of similar strings formed from A:

Strings formed from A with no swaps.
Strings formed from A with one swap.
Strings formed from A with two disjoint swaps.
Strings formed from A with two overlapping swaps.

After computing the A_c values, these counts can be computed in O(1) time each.

EXPLANATION:

String similarity and swap distance

Let’s define the swap distance between two strings X and Y as the minimum number of swaps needed to turn X into Y. We denote this by D(X,Y). Note the following facts:

D(X,X) = 0
D(X,Y) = D(Y,X)
D(X,Y) \le D(X,Z) + D(Z,Y)

We can redefine string similarity in terms of swap distance as follows: Two strings X and Y are similar if and only if there exists a string Z such that D(X,Z) \le 1 and D(Z,Y) \le 1.

But this also implies that D(X,Y) \le D(X,Z) + D(Z,Y) \le 1 + 1 = 2. (In other words, it only takes at most 2 swaps to turn X to Y, namely, first turn X to Z, then Z to Y). Thus, one necessary criterion for two strings being similar is that D(X,Y) \le 2. But is this a sufficient criterion? Actually, yes, because if D(X,Y) \le 2, then there must exist some string Z such that D(X,Z) \le 1 and D(Z,Y) \le 1, namely:

If D(X,Y) < 2, then Z = X (or Z = Y).
If D(X,Y) = 2, then Z is the intermediate string while turning X into Y with two swaps. This Z clearly is one swap distance away from X and Y.

Thus, we now have an equivalent definition for string similarity:

X and Y are similar if and only if D(X,Y) \le 2.

Thus, the problem is equivalent to counting the number of ordered pairs of non-similar permutations of A with swap distance > 2.

Observations

As usual, we’ll need a few more observations / insights to make our counting problem easier to solve.

The first observation is that it’s easier (or at least it seems to be easier) to count the number of pairs of similar permutations instead. This gives us our first observation.

Let P be the number of permutations of string A, and let S be the number of pairs of similar permutations of A. Then the number of pairs of non-similar permutations of A is P^2 - S.

To see this, note that P^2 is the number of pairs of permutations of A (because there are P choices for the first and second element of the pair), so if we subtract from it the number of similar pairs S, we get P^2 - S, the number of non-similar pairs.

Thus, we can focus on computing P and S instead. But P is easy to compute using multinomial coefficients:

**For every lowercase letter c, let A_c be the number of occurrences of c in A. Then

P = \frac{n!}{A_{\text{"a"}}!A_{\text{"b"}}!\cdots A_{\text{"z"}}!}.

**

(There is a possible source of confusion here, namely the usage of lowercase letters to denote variables and the actual characters themselves. To avoid ambiguity, we will always use quotes whenever we refer to the actual character. Without quotes, it always denotes a variable. Thus, the c in A_c is a variable, while the \text{"c"} in A_{\text{"c"}} is the actual character \text{"c"}.)

Thus, all that remains is to compute S.

Pairs of similar strings

We need to count the number of similar strings X and Y. Let’s fix the string X. Note that X has length n, and we know how many of each letter it has (from the A_c values). We are interested in finding the number of distinct strings Y that can be obtained by performing at most two swaps in X. We will need to distinguish four kinds of strings Y:

Strings formed from X using no swaps.
Strings formed from X using one swap.
Strings formed from X using two overlapping swaps.
Strings formed from X using two disjoint swaps.

In this classification, we say “swap” to mean “swap between two different letters” to avoid double counting. Since every kind of similar string is exactly one of the above kinds, we can simply count each kind separately and add the results together.

The first kind is actually very trivial, because there is only one string of that kind, namely X itself. So we’ll start on the second kind:

Strings formed from X using one swap.

How many strings Y can be formed from X using exactly one swap of different letters? Let’s count it. First, we need to select which letters to swap, say i and j. To avoid double counting, we say i < j. Next, we need to select which among the A_i and A_j occurrences of i and j, respectively, will be swapped. There are A_i\cdot A_j such choices. By summing across all (i,j) pairs, we now have our answer!

\sum_{\text{"a"} \le i < j \le \text{"z"}} A_iA_j

Strings formed from X using two overlapping swaps.

Here, by overlapping swaps, we mean, for example, swapping the i th and j th character, and then the i th and k th character after that. (Notice that two swaps happened in the i th position.) Note that we can safely ignore swaps where j = k because that has the same effect as doing no swaps at all.

Let’s understand what precisely happens during two overlapping swaps. Let’s say the i th position contains the character c_i, the j th position contains c_j, and the k th position contains c_k. After the first swap, the i th position contains c_j, the j th position contains c_i, and the k th position contains c_k. After the second swap, the i th position contains c_k, the j th position contains c_i, and the k th position contains c_j. Thus, it turns out that two overlapping swaps is equivalent to a single 3-cycle!

So it turns out we are actually counting the number of strings Y that can be obtained from X by doing a 3-cycle on three distinct characters. Let’s count it. First, we need to select the three distinct letters to 3-cycle, say i, j and k. To avoid double counting, we say i < j < k. Next, we need to select which among the A_i, A_j and A_k occurrences of i, j and k, respectively, will be permuted. There are A_iA_jA_k such choices. Finally, we need to figure out the direction of the three cycle. There are two possible directions: i \rightarrow j \rightarrow k \rightarrow i and i \rightarrow k \rightarrow j \rightarrow i. Thus, by summing across all (i,j,k) triples, we get our answer:

\sum_{\text{"a"} \le i < j < k \le \text{"z"}} A_iA_jA_k\cdot 2

Strings formed from X using two disjoint swaps.

Finally, we now try to count the number of strings that can be formed from X using two disjoint swaps. This is actually the most complicated type, because even though the positions of the swaps themselves are distinct, the letters they’re swapping might not be. For example, In the word aabc, swapping the first and third and then the second and fourth positions results in bcaa, and these two swaps are disjoint. However, the letter a was swapped in both swaps!

In fact, we need to classify this type into three subtypes. Let’s count them!

Type 1: The characters in both swaps are all distinct.

In this case, we need to select the four characters involved, say i, j, k and l. To avoid double counting, let’s say i < j < k < l. Next, we need to select which among the occurrences of i, j, k and l will be swapped. There are A_iA_jA_kA_l such choices. Finally, we need to choose which letter swaps with which. There are three possibilities: (i,j),(k,l), or (i,k),(j,l), or (i,l),(j,k). Thus, summing across all choices for (i,j,k,l), we get:

\sum_{\text{"a"} \le i < j < k < l \le \text{"z"}} A_iA_jA_kA_l\cdot 3

Type 2: The two swaps share one character in common.

In this case, we need to select the three characters involved, say i, j and k. This time, one of the characters will be different from the others, namely the character that will be shared in both swaps. Let’s say this character is i. Now, to avoid double counting, we need to let j < k. However, i's only restriction is that it must not be equal to j or k (because it is special). Finally, we need to select which among the occurrences of i, j and k will be swapped. There are A_i(A_i-1)A_jA_k such choices. Note the factor A_i(A_i-1), because there are A_i ways to choose the place that j will swap with i, and there are A_i-1 ways to choose the place that k will swap with i (which must be different from the previously chosen position). Thus, summing across all choices for (i,j,k), we get:

\sum_{\text{"a"} \le i \le \text{"z"}} \sum_{\substack{\text{"a"} \le j < k \le \text{"z"} \\\ i \not= j \\\ i \not= k}} A_i(A_i-1)A_jA_k

Type 3: The two swaps share two characters in common.

In this case, we need to select the two characters involved, say i and j. (No one is special this time.) To avoid double counting, let’s say i < j. Finally, we need to select which among the occurrences of i, j will be swapped. There are {A_i \choose 2}{A_j \choose 2} such choices (because we need to choose two positions for each letter). Note that this time, it doesn’t matter which occurrence swaps with which because the end result will be the same. Thus, summing across all choices for (i,j), we get:

\sum_{\text{"a"} \le i < j \le \text{"z"}} {A_i \choose 2}{A_j \choose 2}

Combining everything

By adding all sums above, we now have computed the number of strings that are similar to X! Let’s denote this number by T.

The cool thing about what we did is that we didn’t use any property specific to any X other than that it was a permutation of A. Therefore, the sum T is the same for every permutation X of A. Since there are P possible values of X (remember P from above?), it means that S, the number of ordered pairs of similar permutations of A, is simply P\times T!

We now all have the necessary things to compute P^2 - S, so we now have a working algorithm for the problem! Now, what about the time complexity? The first part is computing the $A_i$s which take O(n) time. Also, P itself takes O(n) time to calculate because of the factorials. Finally, S = PT, and all calculations for T depend only on the size of the array [A_{\text{"a"}}, \ldots, A_{\text{"z"}}]. But the size of this array is fixed at 26! Therefore, computing T takes O(1) time! The overall algorithm runs in O(n) time.

Don’t forget to reduce everything modulo 10^9 + 7!

The following is an implementation in Python. You can see here a technique of calculating factorials and inverse factorials modulo 10^9 + 7.

from collections import Counter

# compute factorials and inverse factorials
mod = 10**9 + 7
N = 10**5 + 11
fac = [1]*N
ifc = [1]*N
for i in xrange(2,N):
    ifc[i] = (mod - mod/i) * ifc[mod%i] % mod

for i in xrange(2,N):
    fac[i] = fac[i-1] *     i  % mod
    ifc[i] = ifc[i-1] * ifc[i] % mod

for cas in xrange(input()):
    # only the frequencies of letters matter
    A = Counter(raw_input().strip()).values()
    t = 0

    # no swap
    t += 1
    
    # 1 swap
    for i in xrange(len(A)):
        for j in xrange(i):
            t += A[i] * A[j]
    
    # 2 swaps, ab cd
    for i in xrange(len(A)):
        for j in xrange(i):
            for k in xrange(j):
                for l in xrange(k):
                    t += 3 * A[i] * A[j] * A[k] * A[l]
    
    # 2 swaps, ab ac
    for i in xrange(len(A)):
        for j in xrange(len(A)):
            if i != j:
                for k in xrange(j):
                    if i != k:
                        t += A[i] * (A[i]-1) * A[j] * A[k]
    
    # 2 swaps, ab ab
    for i in xrange(len(A)):
        for j in xrange(i):
            t += A[i] * (A[i]-1)/2 * A[j] * (A[j]-1)/2
    
    # 2 swaps, abc
    for i in xrange(len(A)):
        for j in xrange(i):
            for k in xrange(j):
                t += A[i] * A[j] * A[k] * 2

    p = fac[sum(A)]
    for i in xrange(len(A)):
        p = p * ifc[A[i]] % mod
    print (p * p - p * t) % mod

Note that Python integers are automatically arbitrary-precision, so we don’t need to worry about reducing t modulo 10^9 + 7 every time. However, in many other languages, you may need to do it to avoid overflow.

As a final note, the huge amount of nested loops in the code above can be optimized to a single loop using an extensive amount of cumulative sums and dynamic programming. See the editorialist’s solution for one way of doing it.

Time Complexity:

O(n)

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester
editorialist

ashok1113 · March 15, 2016, 3:17pm

I used the same procedure in Solution

s_verma · March 15, 2016, 3:51pm

I couldn’t figure out what is the problem in this submission… MySolution
Only got 75 points

likecs · March 15, 2016, 4:25pm

Here is my solution https://www.codechef.com/viewsolution/9601814. You may remove the comments to checkk the execution at each step.

Happy coding

yougaindra · March 15, 2016, 6:28pm

Since we need to find ordered pairs, shouldn’t number of possible pairs be P(P-1)/2 instead of P^2

glow · March 15, 2016, 9:56pm

Can you explain - “The two swaps share one character in common” part, with an example maybe ?

And total number of pairs§ should be (§(P-1))/2 as pointed by @yougaindra , right ?

Thanks.

kevinsogo · March 15, 2016, 10:17pm

In ordered pairs, (a,b) != (b,a), so they must be counted separately. Thus, P^2 is the correct count.

kevinsogo · March 15, 2016, 10:19pm

This example is from the editorial. Consider the word “aabc”. Suppose we swap the first and third positions, and then the second and fourth positions. The result is “bcaa”. These two swaps share one character in common: the character “a”. This is what I meant by “The two swaps share one character in common”.

glow · March 15, 2016, 10:28pm

Actually I meant the formula for that Case -> (Ai)(Ai−1)(Aj)(Ak) , is it like -> for ‘aabc’, ‘b’ or ‘c’ can be swapped with ‘a’ at position 0 or 1(0 based indexing) and at the same time ‘b’ or ‘c’ can be swapped with ‘a’ at 1 position . Am I correct ?

kevinsogo · March 15, 2016, 11:11pm

Yes, that is correct.

jragarwal · March 22, 2016, 9:53pm

can you explain in case of 2 disjoint swap’s second case of one common character why it has not been multiplied by 2 ?

ash_code · March 22, 2016, 10:19pm

@jragarwal in the case of two disjoint swap with one common element, lets say the characters we swap are a, b and c where a is the common one i.e. we swap (a, b) and (a, c). Also consider the frequencies of the characters a, b and c are fa, fb, fc respectively. Now, by swapping (a, b) we make fa*fb number of similar strings and for every such swap, we are number of a’s left is fa-1. So we can make *(fa-1)fc number of similar strings by swapping (a, c). So total number of similar strings in this case is *fafb(fa-1)fc. Hope this helps.

rashomon · March 22, 2016, 10:26pm

I complicated this question by trying to find the number of possible unordered pairs and then planning to multiply that by 2. Did not work

pablo2017 · March 29, 2017, 9:48am

I don’t understand why this problem is in easy level if the author says that it is medium level.

underdog_eagle · November 21, 2017, 12:34pm

Why the total number of pairs is P^2 ? It should be P * (P + 1) / 2. Right ?