Can someone explain how to solve MULDIG and APRPS from JULY17?
For MULDIG, see the other thread. APRPS for 100 points I would like to know myself, but the 60-point subtask is basically just http://www.mathpages.com/home/kmath111/kmath111.htm
Start with a polynomial for your first root, then adapt the formula at the bottom of that page to add your extra roots. However, even with Karatsuba multiplication this was too slow for 100 points.
Instead of Karatsuba, FFT can be used to multiply polynomials.
I’m aware of that, but does it work here together with the modular arithmetic? I also thought about the number-theoretic transform, but that doesn’t seem to work here either. Likely there is a clever DP trick that exploits all the symmetries of the problem.
APRPS can be solved using the following observation: let p(x) be the minimal polynomial of \sqrt{p_1} + ... + \sqrt{p_n}. We can prove using Galois theory that the degree of this extension over the rationals is 2^n regardless of the primes, so the minimal polynomial must have degree 2^n. Moreover, we can see that p(x-\sqrt{p_{n+1}})\cdot p(x+\sqrt{p_{n+1}}) clearly has \sqrt{p_1} + ... + \sqrt{p_{n+1}} as a root and has degree 2^{n+1}, so it must be the minimal polynomial if it has integer coefficients. When you work out the product, you can see that all the odd powers of \sqrt{p_{n+1}} disappear and that the remaining coefficients are integers.
This gives us an inductive way of calculating the polynomial. Let m = 2^n. The algorithm I found needs O(m^2) time to calculate binomial coefficients, and the recursion requires O(m) multiplications of polynomials of degree O(m). When the polynomial multiplication is done naively, this gives an O(m^3) algorithm, which is enough to pass the n=10 testcase even without any optimization. One optimization you can use if your implementation is slow is writing p(x) = q(x^2) and keeping only the even powers, since the polynomial can be proven to contain only even terms. If we were to use Karatsuba for multiplication, this could be improved to O(m^{1+\log_2(3)})\approx O(m^{2.585}) = O(2^{2.585n}). I suspect this would still not pass the full testcases, but I didn’t implement it. Asymptotically, this approach would yield an O(m^2 \log(m)) = O(n\cdot 2^{2n}) algorithm using the discrete Fourier transform for multiplication in a ring with a large enough primitive root of unity.
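As a sanity check, the recursion p_{n+1}(x) = p_n(x+\sqrt{q})\,p_n(x-\sqrt{q}) = A(x)^2 - q\,B(x)^2, where p_n(x+\sqrt{q}) = A(x) + \sqrt{q}\,B(x), can be sketched in Python with exact integer arithmetic and naive multiplication (function names are my own, not from any submitted solution):

```python
from math import comb

def shift_by_sqrt(p, q):
    """Split p(x + sqrt(q)) as A(x) + sqrt(q)*B(x) via the binomial theorem.
    p is a coefficient list with p[i] the coefficient of x^i."""
    n = len(p) - 1
    A, B = [0] * (n + 1), [0] * (n + 1)
    for i, a in enumerate(p):
        for j in range(i + 1):
            k = i - j  # power of sqrt(q) in this term
            term = comb(i, j) * a * q ** (k // 2)
            (A if k % 2 == 0 else B)[j] += term
    return A, B

def poly_mul(u, v):
    """Schoolbook polynomial multiplication."""
    out = [0] * (len(u) + len(v) - 1)
    for i, x in enumerate(u):
        for j, y in enumerate(v):
            out[i + j] += x * y
    return out

def minimal_poly(primes):
    """Coefficients of the minimal polynomial of sum(sqrt(q) for q in primes)."""
    p = [0, 1]  # start from x, the minimal polynomial of the empty sum
    for q in primes:
        A, B = shift_by_sqrt(p, q)
        A2, B2 = poly_mul(A, A), poly_mul(B, B)
        p = [a - q * b for a, b in zip(A2, B2)]  # A(x)^2 - q*B(x)^2
    return p
```

For example, `minimal_poly([2, 3])` gives `[1, 0, -10, 0, 1]`, i.e. x^4 - 10x^2 + 1, the minimal polynomial of \sqrt 2 + \sqrt 3.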
From Galois theory, we know that the answer is

f(x)=\prod_{e\in\{0,1\}^n}\left(x+\sum_{k=1}^{n}(-1)^{e_k}\sqrt{p_k}\right),

which has degree 2^n.
As @ureino said, we can do a naive iteration to solve the subtask with N \le 10.
For larger N, we can observe that we are working mod (10^9+7). Some p_i may have a square root in this finite field. Since MOD = 10^9+7 satisfies MOD \equiv 3 \pmod 4, we know that (a^{(MOD-3)/4+1})^2=a^{(MOD+1)/2}=\left(\frac{a}{MOD}\right)a \bmod MOD. Here \left(\frac{a}{MOD}\right) denotes the Legendre symbol.
Then for each p_i, if its square root exists in \mathbb{F}_{MOD}, it must be p_i^{(MOD-3)/4+1} (up to sign).
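A minimal sketch of this square-root test in Python (function name is my own), relying only on MOD \equiv 3 \pmod 4:

```python
MOD = 10**9 + 7  # MOD % 4 == 3

def sqrt_mod(a):
    """Return r with r*r % MOD == a % MOD, or None if a is a non-residue.
    Since MOD % 4 == 3, the only candidate root is a^((MOD+1)/4) mod MOD."""
    r = pow(a, (MOD + 1) // 4, MOD)
    return r if r * r % MOD == a % MOD else None
```

If the returned candidate fails the check, a has no square root in \mathbb{F}_{MOD} and goes into the set B below.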
Let R be the set of square roots of those p_i that have a square root in \mathbb{F}_{MOD}, and let B be the set of those p_i that do not.
Say there are r elements in R and b elements in B.
Then the answer is

\prod_{d\in\{0,1\}^r}\prod_{e\in\{0,1\}^b}\left(x+\sum_{r_k\in R}(-1)^{d_k}r_k+\sum_{p_k\in B}(-1)^{e_k}\sqrt{p_k}\right).
Let g(x)=\prod_{e\in\{0,1\}^b}\left(x+\sum_{p_k \in B}(-1)^{e_k} \sqrt{p_k}\right), which is a degree-2^b polynomial.
Then the answer is

\prod_{d\in\{0,1\}^r} g\left(x+\sum_{r_k\in R}(-1)^{d_k}r_k\right).
It takes O(2^{n-b}\cdot 4^{b}) in total to find the coefficients of each polynomial g(x+\sum_{r_k \in R}(-1)^{d_k} r_k).
Then we can use FFT to multiply all of these polynomials. Finding the coefficients dominates the complexity, so we still have O(2^{n-b}\cdot 4^{b}).
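The per-shift coefficient computation is a Taylor shift; a naive O(\deg^2) sketch in Python (my own function name, not the author's code):

```python
from math import comb

MOD = 10**9 + 7

def taylor_shift(g, c):
    """Return the coefficients of g(x + c) mod MOD, naively in O(deg^2).
    g[i] is the coefficient of x^i."""
    out = [0] * len(g)
    for i, a in enumerate(g):
        for j in range(i + 1):
            out[j] = (out[j] + a * comb(i, j) * pow(c, i - j, MOD)) % MOD
    return out
```

Applied to a polynomial of degree 2^b, one shift costs 4^b, and there are 2^{n-b} shifts, matching the stated bound.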
You can check my code for more detail.
Remark: if we just use naive iteration, the complexity is O(4^n).
My algorithm is a sort of generalization of what @phben wrote (since, to my understanding, their solution’s complexity appears to be O(4^n) if none of the numbers have square roots modulo 10^9+7, and I assumed the tests are stronger than to accept that).
My approach was: what if we defined square roots for all possible inputs? One number that is not a square modulo 10^9+7 is 5. So let’s define an “imaginary unit” i such that i^2=5 and see if we can create a field that still behaves as we would expect. It takes some proving, but long story short: yes, we can. Every element of this field has the form z = a + bi with 0 \le a, b < 10^9 + 7, and we can prove that every element z = a + 0i of the original field has a square root in this field. We can then proceed as suggested by @phben, except we don’t have to deal with any numbers that lack square roots, because now all numbers have them. The complexity thus always drops to O(M(2^n)), where M is the complexity of the polynomial multiplication algorithm used. After some optimizations, my solution passed with Karatsuba’s algorithm, so O(3^n) in total.
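Arithmetic in this quadratic extension \mathbb{F}_p[i] with i^2 = 5 can be sketched as follows (the class and names are illustrative, not @sir_ementaler’s actual code):

```python
MOD = 10**9 + 7
NONRESIDUE = 5  # 5 has no square root mod 10^9+7, so x^2 - 5 is irreducible

class Ext:
    """Element a + b*i of F_p[i], where i^2 = NONRESIDUE."""
    __slots__ = ("a", "b")

    def __init__(self, a, b=0):
        self.a, self.b = a % MOD, b % MOD

    def __add__(self, o):
        return Ext(self.a + o.a, self.b + o.b)

    def __sub__(self, o):
        return Ext(self.a - o.a, self.b - o.b)

    def __mul__(self, o):
        # (a + b*i)(c + d*i) = (ac + 5bd) + (ad + bc)*i
        return Ext(self.a * o.a + NONRESIDUE * self.b * o.b,
                   self.a * o.b + self.b * o.a)

    def __eq__(self, o):
        return self.a == o.a and self.b == o.b
```

With this, `Ext(0, 1)` squares to `Ext(5)`, and every base-field element acquires a square root in the extension, so the algorithm never has to split the primes into two sets.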
@sir_ementaler, that’s a brilliant idea! Instead of finding square roots in \mathbb{F}_p, one can find square roots of all numbers of \mathbb{F}_p in a degree-two field extension. Do you know if we can also do something similar for higher roots, say cube roots?
(Actually I wanted to post this as a comment on your answer, but for some reason I couldn’t.)
[EDIT: I think I get it now. We can take a primitive root, say x, and consider the degree-n field extension \mathbb{F}_p[\delta] where \delta^n=x.]
Here’s my solution in PYPY: https://www.codechef.com/viewsolution/14760517
Now it has been improved to https://www.codechef.com/viewsolution/14785115 by optimizing the use of array space.
Also, this version of my solution https://www.codechef.com/viewsolution/14770430 can solve the general case where the a_i are not assumed to be prime and repetitions are allowed. It can find the minimal polynomial mod 10^9+7 of irrational numbers such as 1+\sqrt 2 + \sqrt2 + \sqrt 3 + \sqrt{12} + \sqrt{18} + \sqrt{19}, etc.
I actually learned a lot from phben’s PYPY code: https://www.codechef.com/viewsolution/14544774, also from the official editorial: https://discuss.codechef.com/questions/106139/aprpseditorial
What I found when I wrote my solution:

Precomputing factorials and inverse factorials greatly reduces running time. (These are also in phben’s code.)

Naive iteration is enough to receive 60 points.

For the full 100 points, implement FFT in two places:
i. when expanding f(x+\sqrt t) = A(x) + \sqrt t B(x), and
ii. when multiplying the polynomials in A(x)^2 - t B(x)^2.

There is a threshold degree below which naive multiplication is faster than FFT. After many tries, I think it is around 2^5.
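That hybrid can be sketched as follows (the threshold and names are illustrative; `fft_mul` stands for whichever FFT-based routine you use):

```python
NAIVE_THRESHOLD = 1 << 5  # around 2^5, per the observation above

def hybrid_mul(u, v, fft_mul=None):
    """Multiply coefficient lists: schoolbook multiplication below the
    threshold, otherwise delegate to a supplied FFT-based routine."""
    if fft_mul is None or min(len(u), len(v)) <= NAIVE_THRESHOLD:
        out = [0] * (len(u) + len(v) - 1)
        for i, a in enumerate(u):
            for j, b in enumerate(v):
                out[i + j] += a * b
        return out
    return fft_mul(u, v)
```

The crossover exists because FFT’s constant factor (twiddle computation, bit-reversal, conversions) dominates at small degrees.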
A hard part for me was step i of point 3 above. It took me a long time to figure out that the coefficients of A(x) and B(x) can be obtained from a certain circular convolution with a reversed polynomial. Here’s an explanation of this part.
Let f(x) = \sum_{i=0}^n a_i x^i. We are to find A(x), B(x) in f(x+\sqrt t)=A(x)+ \sqrt t B(x). By the binomial theorem, we have

f(x+\sqrt t)=\sum_{i=0}^n a_i\sum_{j=0}^i \binom{i}{j}x^j(\sqrt t)^{i-j}.

We reverse the order of summation to obtain a formula for the coefficient of x^j:

[x^j]\,f(x+\sqrt t)=\sum_{i=j}^n \binom{i}{j}a_i(\sqrt t)^{i-j}.

Let

c_j = [x^j]\,f(x+\sqrt t).

Using that \binom i j = \frac{i!}{(i-j)!\,j!}, we have

j!\,c_j=\sum_{i=j}^n \left(i!\,a_i\right)\cdot\frac{(\sqrt t)^{i-j}}{(i-j)!}.

This is the circular convolution coefficient formula from multiplying the reversed polynomial

\sum_{i=0}^n (n-i)!\,a_{n-i}\,x^i

and

\sum_{k=0}^n \frac{(\sqrt t)^k}{k!}\,x^k,

where the coefficient of x^{n-j} in the product equals j!\,c_j. The even powers of \sqrt t give A(x) and the odd powers give \sqrt t\,B(x).