I was astonished when I saw this problem, since it basically asks us to calculate the convolution of the given polynomial with a constrained K. I know the fast Fourier transform can multiply two polynomials in O(n log n), but there we use the nth roots of unity rather than some arbitrary K, which halves the input size at each level of the recurrence. If we are constrained to a particular K, how can we sample N points in better than O(N^2)? Also, what does it mean to “design” a DFT that works for a particular modulus?

A better solution, running in O(n (log n)^2), is given here: https://www.student.cs.uwaterloo.ca/~cs487/handouts/script07.pdf

I tried implementing this, but it got TLE, probably because of the high constant factor of the multiplications in NTT/FFT. It took around 12 seconds in the worst case -_-

The same approach is described in the Cormen book too, but it seems hard to code.

Look at all possible remainders of x^2 modulo the given prime. How many different remainders exist?

Do the same with x^4, x^8 etc. How many remainders exist?
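The two hints above can be checked experimentally. Below is a minimal sketch, assuming the prime in question is 786433 (the modulus quoted elsewhere in this thread); the function name `count_power_residues` is made up for illustration.

```python
def count_power_residues(p, k):
    """Count the distinct values of x**k mod p over every x in 0..p-1."""
    return len({pow(x, k, p) for x in range(p)})


if __name__ == "__main__":
    P = 786433  # assumed modulus; 786433 = 3 * 2**18 + 1
    for k in (2, 4, 8, 16):
        print(k, count_power_residues(P, k))
```

Each doubling of the exponent should shrink the count noticeably, which is what makes the "almost brute force" idea plausible.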

My approach uses this and it’s almost brute force.

Please look here: http://e-maxx.ru/algo/fft_multiply (if you don’t speak Russian, like me, use Google Translate).

I have discussed my approach on this page:

https://discuss.codechef.com/questions/82993/workchef-and-polyeval-problems-in-july-16?page=1#83020

I did not use FFT / NTT in this problem. I used an insight similar to @xellos0's. I recursively decomposed the polynomial into four polynomials in terms of x^4 and used an unordered_map to save the result of each decomposition. I continued decomposing until only one term (the constant term) was left. The number of possible remainders shrinks drastically each time the map x -> x^4 mod 786433 is applied, which gives an opportunity for memoization / DP. I don’t know how to prove this mathematically, but I wrote a program that counts the number of possible remainders, and it does hold for powers of two. Notice that the decomposition produces a tree in which every node has four children, so I used 4-ary heap-like indexing. https://www.codechef.com/viewsolution/10762921

You can decompose a polynomial into four polynomials in this way:

A(x) = A_{4k}(x^4) + x A_{4k+1}(x^4) + x^2 A_{4k+2}(x^4) + x^3 A_{4k + 3}(x^4)

In the formula above, A_{4k} is the polynomial built from the coefficients whose indices are multiples of 4, A_{4k+1} is built from the coefficients whose indices are a multiple of 4 plus 1, and so on.
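To make the decomposition concrete, here is a small sketch that splits a coefficient list into the four parts and checks the identity numerically. The helper names (`eval_poly`, `split4`, `eval_via_split`) are hypothetical, and the modulus is the one quoted in the post above.

```python
P = 786433  # modulus quoted in the post above


def eval_poly(coeffs, x, p=P):
    """Horner evaluation of sum(coeffs[i] * x**i) mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def split4(coeffs):
    """Return [A_0, A_1, A_2, A_3]; A_r keeps the coefficients a_{4k+r}."""
    return [coeffs[r::4] for r in range(4)]


def eval_via_split(coeffs, x, p=P):
    """Evaluate A(x) as the sum over r of x**r * A_r(x**4), all mod p."""
    y = pow(x, 4, p)
    parts = split4(coeffs)
    return sum(pow(x, r, p) * eval_poly(part, y, p)
               for r, part in enumerate(parts)) % p
```

For example, `split4(list(range(10)))` gives `[[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]`, and `eval_via_split` agrees with `eval_poly` for any x.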

We can also decompose a polynomial into 8 polynomials in terms of x^8. I think I would have gotten a faster running time with a higher power such as 8, because the number of remainders shrinks even faster.
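The recursion itself can be sketched as follows, with the branching factor `m` as a parameter so that 4 and 8 can be compared. This is only an illustration of the idea, not the linked submission: it uses Python dicts in place of unordered_map and plain recursion in place of heap-like indexing, and the name `eval_on_points` is made up.

```python
def eval_on_points(coeffs, points, p, m=4):
    """Return {x: A(x) mod p} for every x in points, splitting A into m parts."""
    points = set(points)
    if len(coeffs) <= 1:
        # Only the constant term is left: same value at every point.
        c = coeffs[0] % p if coeffs else 0
        return {x: c for x in points}
    # The parts are polynomials in x**m, so they are only needed
    # at the distinct values of x**m mod p.
    powered = {x: pow(x, m, p) for x in points}
    targets = set(powered.values())
    sub = [eval_on_points(coeffs[r::m], targets, p, m) for r in range(m)]
    result = {}
    for x in points:
        y = powered[x]
        acc = 0
        for r in reversed(range(m)):  # Horner in x over the m parts
            acc = (acc * x + sub[r][y]) % p
        result[x] = acc
    return result
```

Calling `eval_on_points(coeffs, range(786433), 786433, m=8)` instead of `m=4` would be one way to test the guess about higher powers; no timing claim is made here.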

I hope the logic is sufficiently understandable; let me know if anything in my explanation is unclear.

No prime, every x up to the modulo.