SFUNC - Editorial

vaka · December 29, 2013, 3:38pm

PROBLEM LINKS

Contest

Practice

Author: Constantine Sokol

Tester: Roman Rubanenko

Editorialist: Praveen Reddy Vaka

Note: All computations in this editorial are modulo M unless explicitly specified.

DIFFICULTY:

easy-medium

PREREQUISITES:

Basic Number Theory, Modular Arithmetic, state transitions using matrices, Fast Modular Exponentiation, inclusion-exclusion principle, Modular Multiplication using just addition (for subtask 4)

HINT

Consider the (K+2) length column vectors

V = [1, n, n², …, n^K, 1+2^K+3^K+…+n^K] ^T

V’ = [1,(n+1),(n+1)²,…,(n+1)^K, 1+2^K+3^K+…+(n+1)^K]^T

T stands for transpose

Can you find a matrix M of dimensions (K+2)x(K+2) such that V’ = M*V

Using this matrix M we can find 1+2^K+3^K+…+N^K in O(K³ * log N) time.

But this is not what we want we just want the summation of Kth powers of numbers which are co prime to N.

Using prime factorization of N and by using inclusion exclusion principle along with the matrix multiplication we established earlier we can find the desired quantity.

EXPLANATION

Subtask 1 and 2:
A simple algorithm will pass these two sub tasks.

set sum = 0

for i = 1 to N

if gcd(i,N) == 1 set sum = (sum + (i^K)%M)%M

output sum

For subtask2 if you store all your values in 32-bit integers you are likely to run into overflow issues and hence will output wrong values. To be on safe side store all your values in 64-bit integers.

Subtask 3 (solves subtask1 and subtask2 also)

Let us try to figure what is the matrix M described in the hint.
Let the rth row of matrix M be R. The the rth entry of V’ is just the dot product of R and V, i.e.,
V’[r] = R[0]*V[0] + R[1]*V[1] + … + R[K+1]*V[K+1]

for r < K+1 this is simply

(n+1)^r = R[0]*1+R[1]*n+R[2]*n^r+…+R[r]*n^r +…+R[K]*n^K + R[K+1]*(1+2^K+…+n^K)

From the binomial theorem the left hand side is

(1+n)^r = C(r,0)*1 + C(r,1)*n + C(r,2)*n² + … + C(r,r)*n^r

So by simply comparing the coefficients we have R[i] = C(r,i) for i<=r and R[i] = 0 for i > r

for r = K+1 we have

1+2^K+…+n^K+(n+1)^K = R[0]*1+R[1]*n+R[2]*n²+…+R[r]*n^r +…+R[K]*n^K + R[K+1]*(1+2^K+…+n^K)

Setting R[K+1] = 1 we have (n+1)^K= R[0]*1+R[1]*n+R[2]*n²+…+R[r]*n^r +…+R[K]*n^K
which is pretty much same as the sum for the previous row. Hence we can take all these coefficients from the previous row. So we have completed our matrix M now.

Let us have a look at our matrix for K=4

1 0 0 0 0 0
1 1 0 0 0 0
1 2 1 0 0 0
1 3 3 1 0 0
1 4 6 4 1 0
1 4 6 4 1 1

Note that the first K+1 rows are just the pascal triangle. This matrix can be computed in O(K²) time.

Our initial vector V will be [1, 0, 0, …, 0]^T

If we want 1^K+2^K+…+N^K all we need is to find the last entry of the vector V’ = (M^N)*V

Since V is all 0’s except for a 1 at the first position the value is simply the first entry in the last row of (M^N)

You can compute (M^N) in O(K³ * log N) using fast exponentiation. If you are not aware of these techniques have a look at http://en.wikipedia.org/wiki/Exponentiation_by_squaring and http://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method.

This is job almost half done.

let us define duperFunction(N, K) = sum (i^K) over all i <= N and gcd(i,N) != 1

We have superFunction(N, K) = (1^K + 2^K + …… + N^K) - duperFunction(N,K)

Now let us look what duperFunction(N,K) looks like
if N is a prime then duperFunction(N,K) = N^K as every other number < N will have gcd of 1 with N.

if N is a product of two primes p1, p2 then all numbers which are divisible by p1 or p2 or both will have gcd greater than 1 with N.

All the numbers which are <= N and divisible by p1 are p1, 2*p1, 3*p1,…floor(N/p1)*p1
All the numbers which are <= N and divisible by p2 are p2, 2*p2, 3*p2,…floor(N/p2)*p2

But if we raise all these to the Kth power and add them up this will not give us the desired value of duperFunction because few numbers like p1*p2 will be present in both the lists. We can use inclusion exclusion principle here to find our duperFunction.

duperFunction(N,K) = f(p1) + f(p2) - f(p1*p2)

where f(X) = X^K + (2X)^K + … + (floor(N/X)X)^K
= X^K(1 + 2^K + … + (floor(N/X))^K)
We can compute X^K using fast exponentiation and we already know how to compute the second part of the product. Hence we can compute the duperFunction and hence the superFunction.

Since the inclusion exclusion principle extends to sets of arbitrary size we can simply compute the set of all prime factors of N and use inclusion exclusion principle on this set in a similar fashion we did for two primes.
Check http://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle if you have not too familiar with inclusion exclusion principle
if the set of primes is p1,p2,p3,…,ps then

duperFunction (N, K) = f(p1) + f(p2) + …+f(ps) (single terms)

- f(p1*p2) - f(p1*p3) - …. (pair wise products of primes)

f(p1*p2*p3) …… (triplet wise products)

.

+(-1)^(s+1) f(p1*p2*…*ps)

You could compute above summation by simply iterating over all subsets of { p1,p2,p3,…,ps}. Since N is atmax 10¹² we will not have more than 11 prime factors and hence you will need to evaluate the function f for max 2¹¹ values.

You can compute the prime factors of N in O(sqrt(N)) time.

So the time complexity of the solution will be O(sqrt(N) + 2^s * K³ * log N)

Since the value of M is 10⁹ at max if are careful enough using 64-bit integer values you will never overflow.

Subtask4: (solves all the subtasks)

M is at max 10¹⁸. So even if you use 64-bit integers while multiplying you will definitely overflow. So you have to perform modular multiplication using only addition as 2x10¹⁸ still falls in 64-bit integer range. Look at the solutions to see how you implement such a multiplication. You need to use this or a similar function whenever you multiply two numbers for this subtask.

Though the constraint on K is little bit higher it looks like the previous solution solves this subtask as well. But it will most likely not for the following reason. Since you no longer use primitive multiplication the overhead of our custom multiplication is pretty high as we perform it as many as O(2^s * K³ * log N) times. So we will optimize the solution a bit further

When we compute M we also compute iteratively M², M⁴, M⁸, …., M^(2⁴⁰) and keep them stored. Since 2⁴⁰ > 10¹² we need powers of two only till here.

Recall our vector matrix definition once again. V’ = (M^N) * V and V’[K+1] gives the value for (1^K+…+N^K);

If you can decompose M^N into product of the form M^(2ⁱ) (which we have precomputed) then due to associativity of matrix multiplication we can just multiply the vector with one of the matrices and get a new vector and keep multiplying this vector with the rest of the matrices to get a new vector each time to ultimately get our V’.
A small example if you want to compute (M¹³)*V you can compute it in the following way
(M¹³)*V = (M⁸) * (M⁴) * (M¹)*V = M⁸ * (M⁴ * (M*V))
Since there can logN such decompositions and matrix vector multiplication can be done in O(K²), we can perform this task in O(2^s * K² * log N).

The overall time complexity becomes O(K³ * log N + sqrt(N) + 2¹¹ * K² * log N). The first K³ * logN factor is for pre-computing.

Note: You could actually compute (1^K + 2^K + 3^K + … + N^K) using the formulas given at http://mathworld.wolfram.com/PowerSum.html

But since our M is not a prime number and the formula involves division it will be pretty tedious to compute these formulas modulo M. But for K<=3 you can easily use these formulas. So you can solve subtask3 without actually using our Matrix Multiplication method.

SOLUTIONS

Setter’s Solution: SFUNC.cpp

Tester’s Solution: SFUNC.cpp

Editorialist’s Solution: SFUNC.java

Note: Editorialist’s Solution follows the exact flow of editorial.
Setter’s and Tester’s solutions are also based on the same concepts so if you understand the editorial it should not be tough to read through their solutions

qwerty1 · December 29, 2013, 5:10pm

awesome editorial!!!