PROBLEM LINK:
Author: Sergey Nagin
Tester: Kevin Atienza
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza
DIFFICULTY:
Medium
PREREQUISITES:
Dynamic programming, combinatorics, number theory
PROBLEM:
How many arrays A[1\ldots N] are there such that 1 \le A_i \le M and \gcd(A_i,A_j) = 1 for all i < j?
QUICK EXPLANATION:
Every number > 1 appears at most once in the sequence. Everything else is 1.
Let S be the set of values of the sequence that are > 1. Our strategy will be to enumerate all possible sets S, and then sum up all their contributions to the total answer.
The first step is to enumerate all possible S's by enumerating all subsets of \{2, 3, \ldots, 100\} that are pairwise coprime. Actually, we only need the sizes, so this can be done very efficiently with memoization and some optimizations.
The next step is to add the contribution of each set S. We only need its size, s. There are \frac{N!}{(N-s)!} ways to arrange these values in the sequence, and everything else will be 1. So the answer is the sum of \frac{N!}{(N-s)!} over all enumerated sets S.
EXPLANATION:
For M \le 10
For any two numbers a and b, \gcd(a,b) = 1 if and only if a and b have no common prime factors. In other words, if we define P(n) as the set of prime factors of n, then \gcd(a,b) = 1 if and only if P(a)\cap P(b) = \emptyset.
Since there are only four primes \le 10, namely \{2,3,5,7\}, we can formulate a dynamic programming solution that has approximately 2^4 \cdot N states. If S is a subset of \{2,3,5,7\}, define f(n,S) as the number of sequences such that:
- There are n elements.
- Each element is in [1,M].
- \gcd(A_i,A_j) = 1 for all i < j.
- The only allowed prime factors are in S.
Clearly the answer that we want is f(N,\{2,3,5,7\}). Also, we can implement a subset of \{2,3,5,7\} as a bitmask of 4 bits, which is simply an integer from 0 to 15.
To find a recurrence for f(n,S), we enumerate all possible values v of the n-th element, from 1 to M. For a value v to be a valid last element, its set of prime factors, P(v), must be a subset of S. (Using bitmasks, this can be implemented using bitwise operators.) Afterwards, the set of allowed primes for the remaining n-1 elements is S - P(v), where “-” denotes set difference. Thus, we have the following recurrence:

f(n,S) = \sum_{1 \le v \le M,\ P(v) \subseteq S} f(n-1, S - P(v))
For optimization, the values P(v) can be computed beforehand. Also, for the base case, we have f(0,S) = 1. Thus, we can now compute f(n,S) using dynamic programming! See the following pseudocode:
for n = 0..N:
    for S subset of {2,3,5,7}:
        if n == 0:
            f[n][S] = 1
        else:
            f[n][S] = 0
            for v = 1..M:
                if P(v) is a subset of S:
                    f[n][S] += f[n-1][S - P(v)]
print f[N][{2,3,5,7}]
Representing these sets as bitmasks and using bitwise operators for set operations, we get the following more detailed pseudocode:
// precompute P
P[1..M]
for v = 1..M:
    if v % 2 == 0: P[v] |= 1 << 0
    if v % 3 == 0: P[v] |= 1 << 1
    if v % 5 == 0: P[v] |= 1 << 2
    if v % 7 == 0: P[v] |= 1 << 3

// compute f
f[0..N][0..15]
for n = 0..N:
    for S = 0..15:
        if n == 0:
            f[n][S] = 1
        else:
            f[n][S] = 0
            for v = 1..M:
                if (S & P[v]) == P[v]:
                    f[n][S] += f[n-1][S ^ P[v]]
print f[N][15]
It’s much more obvious now how many operations this takes: roughly 2^4 \cdot N \cdot M, which is fast enough for the first subtask!
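For concreteness, here is a runnable Python version of this DP (a sketch: the modulus MOD = 10^9 + 7 and the function name are our assumptions; substitute whatever the problem statement specifies). It uses a rolling array over n, since f(n, \cdot) only depends on f(n-1, \cdot):

MOD = 10**9 + 7  # assumption; use the modulus from the problem statement

def count_sequences_small(N, M):
    # P[v] = bitmask of the prime factors of v among {2, 3, 5, 7}
    P = [0] * (M + 1)
    for bit, p in enumerate((2, 3, 5, 7)):
        for v in range(p, M + 1, p):
            P[v] |= 1 << bit
    # f[S] holds f(n, S) for the current n; base case f(0, S) = 1
    f = [1] * 16
    for _ in range(N):
        g = [0] * 16
        for S in range(16):
            total = 0
            for v in range(1, M + 1):
                if S & P[v] == P[v]:      # P(v) is a subset of S
                    total += f[S ^ P[v]]  # here S ^ P[v] equals S - P(v)
            g[S] = total % MOD
        f = g
    return f[15]

# sanity check: count_sequences_small(2, 3) == 7
# (all 9 pairs over [1,3] except (2,2) and (3,3))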
For M > 10
Unfortunately, for the remaining subtasks, M can reach up to 100, and there are 25 primes up to 100. This means that the algorithm above will have 2^{25} \cdot N states. For subtask two, you might still be able to optimize a lot and squeeze into the time limit (though it isn’t easy), but for subtask three, there’s no way this will pass. Instead, we need to find better solutions.
Solution 1
Let’s call a number big if it is greater than 1.
The first important insight is that every big number will appear at most once in our sequence. Otherwise, if x > 1 appears more than once, say at positions i and j, then \gcd(A_i,A_j) = x \not= 1, which violates our requirements. This automatically means that there are at most 99 big numbers in our sequence, and for large N, this means that most elements will be equal to 1!
Thus, we have two remaining tasks:
- Enumerate all sets of big numbers that are pairwise coprime, and
- For each set of big numbers, find the number of valid sequences of length N having exactly these numbers as its big numbers.
The second one is actually easy: If the set of big numbers has size s, then there are exactly {N \choose s} ways to choose the places where these big numbers will go, and then s! ways to choose which big number goes where. Everything else will be 1; thus, there are {N \choose s}s! = \frac{N!}{(N-s)!} valid sequences!
To compute N! and (N-s)!^{-1}, we need to precompute all factorials and inverse factorials up to 100000 beforehand, modulo the problem’s prime modulus p. We can use the following simple recurrences: n! = n \cdot (n-1)! with 0! = 1, and (n-1)!^{-1} = n \cdot (n!)^{-1}, where the topmost inverse factorial is obtained from Fermat’s little theorem, a^{-1} \equiv a^{p-2} \pmod{p}.
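In Python, this precomputation might look as follows (a sketch, again assuming the modulus MOD = 10^9 + 7):

MOD = 10**9 + 7  # assumption; use the modulus from the problem statement
MAXN = 100000

fact = [1] * (MAXN + 1)
for i in range(1, MAXN + 1):
    fact[i] = fact[i - 1] * i % MOD

inv_fact = [1] * (MAXN + 1)
inv_fact[MAXN] = pow(fact[MAXN], MOD - 2, MOD)  # Fermat's little theorem
for i in range(MAXN, 0, -1):
    inv_fact[i - 1] = inv_fact[i] * i % MOD

def arrangements(n, s):
    # n! / (n-s)! mod MOD: ways to place s distinct big numbers into n slots
    return fact[n] * inv_fact[n - s] % MOD if s <= n else 0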
All that remains is enumerating all sets of big numbers that are pairwise coprime. Let’s call such sets big sets. To begin with, let’s try a simple recursive enumeration. Let’s create a routine called all_big_sets(m, S), where m is the maximum allowed value in the big set, and S is the set of disallowed primes. The initial call is simply all_big_sets(M, {}), where {} denotes the empty set.
A straightforward implementation is the following:
def all_big_sets(m, S):
    if m == 1:
        yield {}
    else:
        // if m is not in the set
        for big_set in all_big_sets(m-1, S):
            yield big_set
        // if m is in the set
        if P(m) and S don't intersect:
            for big_set in all_big_sets(m-1, union(P(m), S)):
                yield union({m}, big_set)
This yields all big sets. In most languages, you can just collect all the yielded sets in a giant list, like in the following:
def all_big_sets(m, S):
    big_sets = []
    if m == 1:
        big_sets.add({})
    else:
        // if m is not in the set
        for big_set in all_big_sets(m-1, S):
            big_sets.add(big_set)
        // if m is in the set
        if P(m) and S don't intersect:
            for big_set in all_big_sets(m-1, union(P(m), S)):
                big_sets.add(union({m}, big_set))
    return big_sets
Unfortunately, this doesn’t work as it is, because there are just too many big sets. So here, we will describe a few optimizations that will make things faster.
First, notice that we can do a simple optimization: instead of the sets themselves, we can just enumerate the sizes of the big sets, because the sizes are all we need.
def all_big_set_sizes(m, S):
    sizes = []
    if m == 1:
        sizes.add(0)
    else:
        // if m is not in the set
        for size in all_big_set_sizes(m-1, S):
            sizes.add(size)
        // if m is in the set
        if P(m) and S don't intersect:
            for size in all_big_set_sizes(m-1, union(P(m), S)):
                sizes.add(size + 1)
    return sizes
A second optimization is to collect the sizes as a list of frequency counts for each size; this list is small, because each big set has size at most m:
def all_big_set_sizes(m, S):
    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]
        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i = 0..m-1:
                size_counts[i + 1] += counts[i]
    return size_counts
But still, this isn’t enough; even though the output list is now small, we still call all_big_set_sizes too many times. Well, that’s easy to fix. We just memoize all_big_set_sizes!
memo = {}
def all_big_set_sizes(m, S):
    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]
    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]
        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i = 0..m-1:
                size_counts[i + 1] += counts[i]
    memo[(m, S)] = size_counts
    return size_counts
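For illustration, here is a direct Python transcription of the above (our own; sets are represented as frozensets so that they can be used as dictionary keys). As explained next, this version is still too slow by itself:

def prime_factors(n):
    # the prime factors of n, as a frozenset so it is hashable
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return frozenset(factors)

memo = {}

def all_big_set_sizes(m, S):
    if (m, S) in memo:
        return memo[(m, S)]
    size_counts = [0] * (m + 1)
    if m == 1:
        size_counts[0] = 1
    else:
        # case: m is not in the big set
        for i, c in enumerate(all_big_set_sizes(m - 1, S)):
            size_counts[i] += c
        # case: m is in the big set
        if not (prime_factors(m) & S):
            for i, c in enumerate(all_big_set_sizes(m - 1, prime_factors(m) | S)):
                size_counts[i + 1] += c
    memo[(m, S)] = size_counts
    return size_counts

# initial call: all_big_set_sizes(M, frozenset())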
But this is still not enough, because there are still too many distinct calls to all_big_set_sizes! (For instance, there are up to 2^{25} distinct values for the parameter S alone, because it is a set of primes \le 100.)
At this point, there are two major optimizations that will allow all_big_set_sizes to run quickly. You can implement either one of them (or even both) to pass.
Optimization 1: Primes > 50
This optimization uses the fact that primes > 50 never appear together with other prime factors: if p > 50, then 2p already exceeds 100. Thus, we can try to consider these primes separately.
The first step is to create a version of all_big_set_sizes that only considers primes up to 50:
memo = {}
def _all_big_set_sizes(m, S):
    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]
    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = _all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]
        // if m is in the set
        // Note: max_prime(m) is the largest prime factor of m
        if max_prime(m) <= 50 and P(m) and S don't intersect:
            counts = _all_big_set_sizes(m-1, union(P(m), S))
            for i = 0..m-1:
                size_counts[i + 1] += counts[i]
    memo[(m, S)] = size_counts
    return size_counts
The added check here is max_prime(m) <= 50, which prevents adding primes > 50. Next, we implement the real all_big_set_sizes, which takes these primes into account:
def all_big_set_sizes(m):
    P = (number of primes in the range [51,m])
    size_counts = [0]*(m+1)
    counts = _all_big_set_sizes(m, {})
    for i = 0..m:
        if counts[i] != 0:
            for j = 0..P:
                size_counts[i+j] += counts[i] * CHOOSE(P, j)
    return size_counts
Here, we first call the _all_big_set_sizes above, and then account for the primes > 50 using some combinatorics: j is the number of primes > 50 that we include, and CHOOSE(P, j) is the number of ways to select these j primes out of the P possible ones. CHOOSE itself can be computed using factorials and inverse factorials, which we need anyway for the second task above.
Using this optimization, all_big_set_sizes now runs fast enough to pass the time limit!
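Here is a Python sketch of Optimization 1 under our own conventions (not the tester’s code): S is encoded as a bitmask over the 15 primes \le 50, and CHOOSE is replaced by Python’s built-in math.comb:

from functools import lru_cache
from math import comb

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def pmask(v):
    # bitmask of the prime factors of v that are <= 50
    m = 0
    for i, p in enumerate(SMALL_PRIMES):
        if v % p == 0:
            m |= 1 << i
    return m

def max_prime(v):
    # largest prime factor of v (for v >= 2)
    best, d = 1, 2
    while v > 1:
        while v % d == 0:
            v //= d
            best = d
        d += 1
    return best

@lru_cache(maxsize=None)
def _all_big_set_sizes(m, S):
    if m == 1:
        return (1,)
    sizes = [0] * (m + 1)
    for i, c in enumerate(_all_big_set_sizes(m - 1, S)):  # m not in the set
        sizes[i] += c
    if max_prime(m) <= 50 and pmask(m) & S == 0:          # m in the set
        for i, c in enumerate(_all_big_set_sizes(m - 1, S | pmask(m))):
            sizes[i + 1] += c
    return tuple(sizes)

def all_big_set_sizes(m):
    large = sum(1 for p in (53, 59, 61, 67, 71, 73, 79, 83, 89, 97) if p <= m)
    counts = _all_big_set_sizes(m, 0)
    size_counts = [0] * (m + 1)
    for i, c in enumerate(counts):
        if c:
            for j in range(large + 1):  # include j of the primes > 50
                size_counts[i + j] += c * comb(large, j)
    return size_counts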
Optimization 2: Limiting the values of S
A different optimization involves looking at the parameter S more carefully. The purpose of this parameter is to exclude primes that have already been used. However, we are iterating m in decreasing order, which means that the largest prime that can still appear is the largest prime \le m. Thus, if there are primes in S that are greater than m, we can safely remove them, and the answer will stay the same! This reduces the number of distinct all_big_set_sizes calls drastically, and in fact is fast enough to get Python accepted in under 0.5 seconds!
memo = {}
def all_big_set_sizes(m, S):
    // cleanse S of primes > m
    Let X = (set of all primes <= m)
    S = intersection(S, X)
    // check if it's in the memo
    if memo has (m, S) as a key:
        return memo[(m, S)]
    size_counts[0..m] // initially all zero
    if m == 1:
        size_counts[0]++
    else:
        // if m is not in the set
        counts = all_big_set_sizes(m-1, S)
        for i = 0..m-1:
            size_counts[i] += counts[i]
        // if m is in the set
        if P(m) and S don't intersect:
            counts = all_big_set_sizes(m-1, union(P(m), S))
            for i = 0..m-1:
                size_counts[i + 1] += counts[i]
    memo[(m, S)] = size_counts
    return size_counts
The change here is in the first few lines, where S is intersected with the set of primes \le m to reduce its number of elements.
This solution is implemented in one of the tester’s solutions below. (Sets are implemented as bitmasks.)
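For reference, here is our own end-to-end Python sketch of this approach (not the tester’s code): sets are encoded as bitmasks over the 25 primes \le 100, the cleansing happens in a small wrapper so the memo only ever sees cleansed keys, and the answer is assembled with the \frac{N!}{(N-s)!} formula from before (MOD = 10^9 + 7 is again an assumption):

from functools import lru_cache

MOD = 10**9 + 7  # assumption; use the modulus from the problem statement
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

# PMASK[v] = bitmask of the prime factors of v
# UPTO[m]  = bitmask of all primes <= m
PMASK = [0] * 101
for i, p in enumerate(PRIMES):
    for v in range(p, 101, p):
        PMASK[v] |= 1 << i
UPTO = [0] * 101
for m in range(2, 101):
    UPTO[m] = UPTO[m - 1] | (PMASK[m] if m in PRIMES else 0)

def all_big_set_sizes(m, S):
    return _rec(m, S & UPTO[m])  # cleanse S before the memo sees it

@lru_cache(maxsize=None)
def _rec(m, S):
    if m == 1:
        return (1,)
    sizes = [0] * (m + 1)
    # case: m is not in the big set
    for i, c in enumerate(all_big_set_sizes(m - 1, S)):
        sizes[i] += c
    # case: m is in the big set
    if PMASK[m] & S == 0:
        for i, c in enumerate(all_big_set_sizes(m - 1, S | PMASK[m])):
            sizes[i + 1] += c
    return tuple(sizes)

def solve(N, M):
    # factorials and inverse factorials up to N, as in the earlier step
    fact = [1] * (N + 1)
    for i in range(1, N + 1):
        fact[i] = fact[i - 1] * i % MOD
    inv_fact = [1] * (N + 1)
    inv_fact[N] = pow(fact[N], MOD - 2, MOD)
    for i in range(N, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % MOD
    ans = 0
    for s, c in enumerate(all_big_set_sizes(M, 0)):
        if c and s <= N:
            # c big sets of size s, each placed in N!/(N-s)! ways
            ans = (ans + c % MOD * fact[N] % MOD * inv_fact[N - s]) % MOD
    return ans

# sanity check: solve(2, 3) == 7, matching the small-M DP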
Solution 2
A different solution involves treating all primes > 10 specially, not just the primes > 50. The insight that we will use heavily is that each number \le 100 has at most one prime factor > 10.
We will need our f(n,S) function from earlier. Recall that f(n,S) is the number of valid sequences of length n where the only allowed prime factors are in S. This time, we will only precompute this table for S that is a subset of \{2, 3, 5, 7\}. (Obviously this isn’t enough because M > 10, but we will still need these values.)
The key insight here is that for every value 1 \le v < 10, the primes in the range \left(\frac{100}{v+1}, \frac{100}{v}\right] are all “functionally” identical. This is a generalization of the fact that we used earlier: that primes > 50 can be treated specially because they all appear alone. Similarly:
- Every prime p in the range (100/3,100/2] = [34,50] can only appear as p or 2p.
- Every prime p in the range (100/4,100/3] = [26,33] can only appear as p, 2p or 3p.
- etc.
Thus, really, all we care about is how many primes we take from each of the ranges (100/2, 100/1], (100/3, 100/2], (100/4, 100/3], etc. Afterwards, we use some combinatorics to figure out their positions and which ones appear as p, 2p, 3p, etc. Finally, once these primes are taken care of, we can fall back to our precomputed f(n,S), because all primes > 10 have already been accounted for!
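To make the grouping concrete, here is a tiny Python snippet (ours) that buckets the primes > 10 by v = \lfloor 100/p \rfloor, i.e., by how many multiples p, 2p, \ldots, vp fit up to 100:

from collections import Counter

M = 100
primes = [p for p in range(11, M + 1)
          if all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))]
# buckets[v] = number of primes p > 10 with exactly v multiples <= M,
# i.e., primes in the range (M/(v+1), M/v]
buckets = Counter(M // p for p in primes)
# e.g. buckets[1] == 10 (the primes in (50, 100]),
#      buckets[2] == 4  (namely 37, 41, 43, 47), ...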
Lots of details have been skipped, but if you want to find out more, this solution is implemented in one of the tester’s solutions below, so you can check it out.