INVBINCF - Editorial

anton_lunyov · April 15, 2013, 4:24pm

PROBLEM LINK:

Practice
Contest

Author: Anton Lunyov
Tester: Hiroto Sekido
Editorialist: Anton Lunyov

DIFFICULTY:

HARD

PREREQUISITES:

Binomial coefficients modulo prime powers, 128-bit arithmetic

PROBLEM:

For the given n and R such that 0 ≤ R < 2ⁿ you need to find the smallest K such that C(2ⁿ − 1, K) mod 2ⁿ = R,
where C(N, K) = N! / K! / (N − K)!, or report that such K does not exist. Note, that n could be up to 120.

EXPLANATION:

At first note that the problem requires to code 128-bit arithmetic. This can be made by representing each number as a pair of 64-bit values (lowest 64 bits and highest 64 bits) and then some routine job should be made to code all necessary operations. The only non-trivial one is multiplication. It can be made using only six 64-bit multiplications (refer to the tester’s solution). Actually only five is enough (refer to the author’s solution).

The first thing to notice is:
C(2ⁿ − 1, K) = fact2(2ⁿ − 1) / fact2(K) / fact2(2ⁿ − 1 − K) * C(2ⁿ⁻¹ − 1, K div 2), (1)
where fact2(N) is the product of all odd numbers up to N. (If some one want to see the proof ask me in comment I and will provide it when I will have a free time.)

It immediately follows from here that C(2ⁿ − 1, K) is always odd. So if R is even then the answer is −1.
We will prove that for each odd R in the range {0, 1, 2, …, 2ⁿ − 1},
there exists unique K in the range {0, 1, 2, …, 2ⁿ⁻¹ − 1} such that C(2ⁿ − 1, K) mod 2ⁿ = R.

At first we infer the following congruence from (1):
C(2ⁿ − 1, 2 * K + X) ≡ (−1)^{K + X} * C(2ⁿ⁻¹ − 1, K) (mod 2ⁿ), where X is 0 or 1.
(Again if some one want to see the proof ask me in comment I and will provide it when I will have a free time.)

Also we will need the following property:
C(2ⁿ − 1, 2 * K) + C(2ⁿ − 1, 2 * K + 1) ≡ 2ⁿ (mod 2ⁿ⁺¹), for n ≥ 2 and 0 ≤ S < 2ⁿ⁻¹.

Then after some non-trivial analysis we get the following simple iterative routine that will find the answer:

INVBINCF(n, R)
   K = (R mod 4) div 2
   for j = 3 to n do
      par = K mod 2
      if C(2^(j-1) - 1, K) mod 2^j != R mod 2^j
         K = K xor 1
      K = 2 * K + par
   return K

The proof in Russian is contained here (pages 57-58) and contains a lot of formulas. If you really want to see the proof in English, ask me in comment and maybe some day I will translate it

Thus, the problem boils down to effective calculation of C(2ⁿ − 1, K) modulo powers of two.
And here we ask for help of British mathematician Andrew Granville, the number theory expert
We refer to Theorem 2 of his article about binomial coefficients modulo prime powers (see end of the page 3).
For p = 2 it sounds as follows:
fact2(2 * u) ≡ sgn * Product[ fact2(2 * j)^{b(r, j, u)} : 1 ≤ j ≤ r] (mod 2^{2 * r + 1}), for r ≥ 2, (2)
where
b(r, j, u) = u / j * Product[ (u² − i²) / (j² − i²) : 1 ≤ i ≤ r, i ≠ j],
and sgn is 1 or −1 and can be found by comparing residues of LHS and RHS modulo 4.
Clearly b(r, j, u) could be calculated modulo 2^{2 * r − 1} in view of the property:
a^{2^{n − 2}} = 1 (mod 2ⁿ) for n ≥ 3 and odd a.
The proof of (2) is based on 2-adic exponent and logarithm and is like magic for me.

But straightforward implementation of C(2ⁿ − 1, K) using formulas (1) and (2), together with INVBINCF routine above leads to quite slow solution.

Note, that actually we can maintain value C(2^j − 1, K) mod 2ⁿ in the loop of INVBINCF and recalculate it using formula (1). Namely, if K = K xor 1 is performed we actually replace K by adjacent integer and can update C(2^j − 1, K) straightforwardly using one multiplication and one division.
While when we do K = 2 * K + par we should transform C(2^j − 1, K) to C(2^{j + 1} − 1, 2 * K + par) and this exactly can be made by formula (1) by calculating three values of fact2(X) by formula (2). Also we can maintain numerator and denominator of C(2^j − 1, K) separately and get rid of divisions by rewriting check in INVBINCF in obvious way. Refer to routine invbin in author’s solution for clear implementation of this.

So INVBINCF is ready to be used but we need to figure it out how to calculate fact2(X) mod 2ⁿ efficiently.

At first we should calculate b(j, r, u) mod 2ⁿ⁻² efficiently. Here main complication is that denominator could have even factors and we can’t directly apply technique of inverse residues. To handle this we will use extended representation of integers in the form 2^a * b, where b is odd. So we could create a data structure that deals with this representation. Involving negative a and inverse residue technique for odd part of the number, we can handle non-integer numbers as well. Thus, to calculate b(r, j, u) we use the following plan:

Before processing test data we precalculate values pseudo_fact[r][j] = 1 / j * Product[ 1 / (j² − i²) : 1 ≤ i ≤ r, i ≠ j] in extended format. They equal to inverse of denominators of b(r, j, u).
Inside fact2 routine we calculate values pref[k] = u * Product[ (u² − i²) : 1 ≤ i ≤ k] and suf[k] = Product[ (u² − i²) : k ≤ i ≤ r] in two simple O® loops (again in extended format).
Then b(r, j, u) = pref[j - 1] * suf[j + 1] * pseudo_fact[r][j] and after this, we can transform it to usual integer representation.

Thus, the complexity of precalc is O(N³) (we calculate O(N²) values pseudo_fact[r][j] and each step requires finding inverse residue, which can be made in O(N) time) and calculation of all values b(r, j, u) for the given u is only O(N). Actually, saving inverses for numbers up to N in O(N²) time we can speed up the precalc part to O(N²), but even O(N³) precalc is many times faster then some tricky precalc below.

Now having values b(r, j, u) calculated we can find fact2(u) using r PowerMod operations, where PowerMod[a, b, n] returns a^b mod 2ⁿ. But even using exponentiation by squaring this will lead to O(N²) complexity for the fact2 routine and thus to O(N³) complexity for the INVBINCF routine, which is still too slow.

To speed up this part we can do the following. Note that we need to calculate powers of small fact2-values in fact2 routine. So we can do some clever precalc for them. Namely, if we want to set up fast calculation of powers of some number A we can precalc the following its powers in advance: powA[k][j] = A^{j * 2^{H * k}} for all 0 ≤ k < N/H and 0 ≤ j < 2^H, where H is some constant (divisor of N = 120 will be most convenient). Then, when we want to calculate A^B for some B < 2^N, we represent B in 2^H-ary system as
B = B₀ + B₁ * 2^H + … + B_K * 2^{H * K}
and calculate A^B as the product of known powers powA[0][B0] * powA[1][B1] * … * powA[K][BK] in O(N / H) time.

Thus precalculating such powers for each fact2(2 * j) (1 ≤ j ≤ 60) with constant H = 15, will speed up our solution in H = 15 times to O(N * N * N / H). But we can do even better. We can find prime factorization of fact2(u) and set up powers precalc only for odd primes up to 120 (there are only 29 of them). Then after calculation of b(r, j, u) we can do some simple O(N) loop to calculate degrees in factorization of fact2(u) and then calculate the answer as the product of prime powers in at most 29 * 8 multiplications. This trick also speeds up powers precalc in two times.

Overall complexity of the solution assuming constant time for each 128-bit operation is
O(N³ + π(N) * 2^H * N/H + T * N * (N + π(N) * N/H)),
where N = 120, H = 15 and π(N) is the number of primes ≤ N. Here:

term N³ corresponds to precalc of pseudo_fact[r][j];
term π(N) * 2^H * N/H corresponds to precalc of prime powers;
term T * N * (N + π(N) * N/H) means that for each of T tests we calculate O(N) values fatc2(u) in O(N + π(N) * N/H) time each.

You may also ask “Why did I annoy all by crappy 128-bit arithmetic?”. This is because I almost invented more elementary solution with O(2^N/3) precalc and poly(N) routine for each test. So I decided that staying under 64-bit arithmetic could be dangerous. Actually I spent the whole day to code the working 128-bit version having working 64-bit version. So this annoyed me too

ALTERNATIVE SOLUTION:

@winger solution is very interesting and differs from described one.
It would be nice if he will describe his solution.

Solution by @mugurelionut is based on more elementary math than Granville formula. You can read about it here.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:

e-olimp - 322 - Binomial Coefficients 5
The current problem is extreme version of this one.
I’ve set this problem (for n ≤ 40) at Kharkiv Winter Programming School 2009 and editorial in Russian is available here (see page 57). So those who knew about this problem could have some advantage against others, but it seems that all who solved the problem was unaware of this link

The good problems to practice your skills on direct problems on calculating reduced factorials modulo prime powers:
Mathalon - 141 - Factorial trailing digits2
Mathalon - 144 - Binomial Coefficient 1 (this one is mine :P)

argos · April 15, 2013, 5:21pm

I used the same mathematics based on Granville formula adapted for 2^n. But my solution need to be speed up to 100 times. So I decided, that you invented something cool for this this problem, but I was wrong =(

amitupadhyay · April 15, 2013, 5:47pm

though I was not able to solve the problem with some analysis I found out some facts

For R=even no. The answer is always -1
R=1 then k=0
R=2ⁿ-1 then k=1
R=2^n-1+1 then k=2
R=2^n-1-1 then k=3
R=2^n-2+1 then k=4
R=3.2^n-2-1 then k=5
R=3.2^n-2+1 then k=6
R=2^n-2-1 then k=7
R=2^n-3+1 then k=10
R=7.2^n-3-1 then k=11
R=3.2^n-3+1 then k=12

@anton_lunyov correct me if I am wrong.

I was unable to form a series or any generalised formula. but my analysis gave me above relations. May be they help in making the program

mugurelionut · April 15, 2013, 6:02pm

Can you estimate the time complexity of the official solution ? (per test case, as a function of n)
Editorialist reply: Done. Check the editorial.

I am asking because at some point I had an O(n^3) per test case solution and was too slow (I mean O(n^3) 128-bit operations). I had to come up with extra ideas in order to cut the time complexity down to O(n^2 * log(n)) per test case (note that I don’t mean base 2 logarithm). Of course, the time complexity per test case is not all that matters - preprocessing should be counted, too (my solution had O(n^3) preprocessing).

During the contest (while searching the web in the hope of finding useful mathematics to use for solving the problem) I did come across the e-olimp problem you mentioned (with n <= 40). I didn’t know there was any editorial available and I noticed that Anton was the only one with an accepted solution at that problem on e-olimp Still… I though that Russian or Ukrainian competitors would have an advantage because of that problem… it seems that I was wrong.
During the contest I did come across Granville’s paper (among many other papers). But I was looking for simpler equations/formulas so I automatically skipped Theorem 2, which looked rather complicated to me (so I thought it wouldn’t be useful) In the end, I used only elementary mathematics in my solution (and I had to prove everything myself from scratch, except for how to compute the modular multiplicative inverse of an odd number modulo 2^n). I think @scli used the property from Granville’s paper (I browsed his solution earlier and I noticed a i * i - j * j somewhere : I didn’t understand it then, but now it’s starting to make sense)

amitupadhyay · April 15, 2013, 6:22pm

@mugurelionut Can you explain me what you actually did in your solution i am not able to get it

mugurelionut · April 15, 2013, 7:36pm

@amitupadhyay: I will explain my solution a bit later, when I find some time. I can tell you now what the most important values that I computed were: 1) PowerSum(i,j) = the sum of 1^j + 2^j + 3^j + … + (2^i)^j (i.e. the sum of all the numbers from 1 to 2^i, each at the j^th power) (i,j<=120) 2) SumOfSubsetProducts(i,j)=the sum of the products of the numbers of each j-element subset of the numbers 1,…,2^i, and 3) SPODD(i,j)=the sum of the products of the numbers of all the (2^i - j)-element subsets of the odd numbers 1, 3, …, 2^i - 1.

anton_lunyov · April 15, 2013, 8:54pm

Check it out the updated part about power calculation. Maybe you change your mind. Actually phrase “you invented something cool for this problem, but I was wrong” is a bit offensive for me, since it seems to indicate that I can’t figure it out anything cool

anton_lunyov · April 15, 2013, 8:59pm

My old solution to e-olimp problem deals with similar sums but I can’t do better than O(2^(n/2)) precalc + O(n) for each test. It will be really interesting to see how this approach could be developed to get polynomial solution.

amitupadhyay · April 15, 2013, 9:52pm

@mugurelionut thanks for the reply I think I will figure it out now

argos · April 16, 2013, 3:49am

It is not shame upon you, it is shame upon me. Idea was standart (except Granvill, but I read his paper) and I could do it, but I decided stop optimizing and tried find some another approach. In any case thank you, I knew more interesting math staff during this challenge.

anton_lunyov · April 17, 2013, 1:58pm

I didn’t check the patterns but unlikely this could lead to a complete solution.

mugurelionut · April 18, 2013, 5:07am

I will try to explain the main ideas in my solution for this problem. Actually, my solution is quite similar to the general idea described in the editorial - the major difference consists of the way I compute F2(X) = the product of all the odd numbers <= X.

Let’s consider the binary representation of X and let’s traverse its bits from the most significant one to the least significant one. Thus, let’s assume that X = 2^p(1) + 2^p(2) + … + 2^p(q) (where p(1) > p(2) > … > p(q)).

The first bit is handled in a special manner. We want to compute the product of all the odd numbers 1 * 3 * … * (2^p(1) - 1). Actually, this value will be precomputed and I will denote it by SPODD(p(1),0) (I will explain later what SPODD(i,j) means).

Let’s assume now that we reached the bit p(r >= 2) and let’s denote by Y = 2^p(1) + 2^p(2) + … + 2^p(r-1) (i.e. the sum of all the bits of X which are more significant than p®). We now want to compute the product of all the odd numbers in the range (Y + 1) … (Y + 2^p®). If p® is 1 or 0 then the product consists of just one number: (Y+1).

Otherwise (if p® >= 2) then the product will be (Y + 1) * (Y + 3) * … * (Y + 2^p® - 1) (there are 2^(p®-1) odd numbers in the product). This product can be expressed as:

Y^0 * (the product of all the odd numbers 1 * 3 * … * (2^p® - 1)) +
Y^1 * (the sum of all the products u(1) * … * u(2^(p® - 1) - 1), where {u(1), …, u(2^(p® - 1) - 1)} goes over all the subsets of the odd numbers {1,3,…,2^p® - 1} having 2^(p® - 1) - 1 elements) +
… +
Y^j * (the sum of all the products u(1) * … * u(2^(p® - 1) - j), where {u(1), …, u(2^(p® - 1) - j} goes over all the subsets of the odd numbers {1, 3, …, 2^p® - 1} having 2^(p® - 1) - j elements) +
… (the sum contains 2^(p® - 1) + 1 terms overall).

Now I can introduce the values SPODD(i,j) = the sum of all the products u(1) * … * u(2^(i - 1) - j), where {u(1), …, u(2^(i - 1) - j} goes over all the subsets of the odd numbers {1, 3, …, 2^i - 1} having 2^(i - 1) - j elements. Thus, SPODD(i,j) means that we consider all the subsets of odd numbers < 2^i having 2^(i-1) - j elements. Obviously, by the definition, SPODD(i,0) = 1 * 3 * … * (2^i - 1).

Our sum of terms mentioned earlier (Y^0 * … + Y^1 * … + …) can be expressed now as Y^0 * SPODD(p®, 0) + Y^1 * SPODD(p®, 1) + … + Y^j * SPODD(p®, j) + …

Let’s notice now that we only ever need at most 120 terms from this sum, when computing it modulo 2^120. Note that Y is an even number. Thus, only the terms corresponding to Y^0, Y^1, …, Y^119 may have a non-zero remainder when divided by 2^120 (all the other terms are 0 modulo 2^120). Actually, we can do even better. When we reached the bit p® (r>=2), Y is an even number which is a multiple of at least 2^(p® + 1) (because Y=2^p(1) + … + 2^p(r-1) and p(r-1) > p®). Thus, only the terms corresponding to Y^0, Y^1, …, Y^k may be non-zero (mod 2^120), where k = 119 / p(r-1).

When we add things up, we notice that, in the worst case, we will need around 119/119 + … + 119/3 = O(n * log(n)) 120-bit operations (n = 120) for computing F2(X) (I stopped at 119/3 because we only use this method for p®>=2 and, thus, p®+1>=3 ; for p®=1 or 0 I explained earlier that the result is easy to compute).

Since we evaluate O(n) different F2(X) values per test case, this adds up to an O(n^2 * log(n)) time complexity per test case.

For now I left out a very important part - how to precompute all the numbers SPODD(i,j) (i,j <= 120) in O(120^3) time. I will explain this at another time, as the explanation is more complicated than anything I wrote here.

amitupadhyay · April 18, 2013, 6:33am

@mugurelionut thanks got it completely nw