SFUNC - Editorial




Author: Constantine Sokol

Tester: Roman Rubanenko

Editorialist: Praveen Reddy Vaka

Note: All computations in this editorial are modulo M unless explicitly specified.




Basic Number Theory, Modular Arithmetic, state transitions using matrices, Fast Modular Exponentiation, inclusion-exclusion principle, Modular Multiplication using just addition (for subtask 4)


Consider the (K+2) length column vectors

V = [1, n, n2, …, nK, 1+2K+3K+…+nK] T

V’ = [1,(n+1),(n+1)2,…,(n+1)K, 1+2K+3K+…+(n+1)K]T

T stands for transpose

Can you find a matrix M of dimensions (K+2)x(K+2) such that V’ = M*V

Using this matrix M we can find 1+2K+3K+…+NK in O(K3 * log N) time.

But this is not what we want we just want the summation of Kth powers of numbers which are co prime to N.

Using prime factorization of N and by using inclusion exclusion principle along with the matrix multiplication we established earlier we can find the desired quantity.


Subtask 1 and 2:
A simple algorithm will pass these two sub tasks.

set sum = 0

for i = 1 to N

  • if gcd(i,N) == 1 set sum = (sum + (iK)%M)%M

output sum

For subtask2 if you store all your values in 32-bit integers you are likely to run into overflow issues and hence will output wrong values. To be on safe side store all your values in 64-bit integers.

Subtask 3 (solves subtask1 and subtask2 also)

Let us try to figure what is the matrix M described in the hint.
Let the rth row of matrix M be R. The the rth entry of V’ is just the dot product of R and V, i.e.,
V’[r] = R[0]*V[0] + R[1]*V[1] + … + R[K+1]*V[K+1]

for r < K+1 this is simply

(n+1)r = R[0]*1+R[1]*n+R[2]*nr+…+R[r]*nr +…+R[K]*nK + R[K+1]*(1+2K+…+nK)

From the binomial theorem the left hand side is

(1+n)r = C(r,0)*1 + C(r,1)*n + C(r,2)*n2 + … + C(r,r)*nr

So by simply comparing the coefficients we have R[i] = C(r,i) for i<=r and R[i] = 0 for i > r

for r = K+1 we have

1+2K+…+nK+(n+1)K = R[0]*1+R[1]*n+R[2]*n2+…+R[r]*nr +…+R[K]*nK + R[K+1]*(1+2K+…+nK)

Setting R[K+1] = 1 we have (n+1)K= R[0]*1+R[1]*n+R[2]*n2+…+R[r]*nr +…+R[K]*nK
which is pretty much same as the sum for the previous row. Hence we can take all these coefficients from the previous row. So we have completed our matrix M now.

Let us have a look at our matrix for K=4

1 0 0 0 0 0
1 1 0 0 0 0
1 2 1 0 0 0
1 3 3 1 0 0
1 4 6 4 1 0
1 4 6 4 1 1

Note that the first K+1 rows are just the pascal triangle. This matrix can be computed in O(K2) time.

Our initial vector V will be [1, 0, 0, …, 0]T

If we want 1K+2K+…+NK all we need is to find the last entry of the vector V’ = (MN)*V

Since V is all 0’s except for a 1 at the first position the value is simply the first entry in the last row of (MN)

You can compute (MN) in O(K3 * log N) using fast exponentiation. If you are not aware of these techniques have a look at http://en.wikipedia.org/wiki/Exponentiation_by_squaring and http://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method.

This is job almost half done.

let us define duperFunction(N, K) = sum (iK) over all i <= N and gcd(i,N) != 1

We have superFunction(N, K) = (1K + 2K + …… + NK) - duperFunction(N,K)

Now let us look what duperFunction(N,K) looks like
if N is a prime then duperFunction(N,K) = NK as every other number < N will have gcd of 1 with N.

if N is a product of two primes p1, p2 then all numbers which are divisible by p1 or p2 or both will have gcd greater than 1 with N.

All the numbers which are <= N and divisible by p1 are p1, 2*p1, 3*p1,…floor(N/p1)*p1
All the numbers which are <= N and divisible by p2 are p2, 2*p2, 3*p2,…floor(N/p2)*p2

But if we raise all these to the Kth power and add them up this will not give us the desired value of duperFunction because few numbers like p1*p2 will be present in both the lists. We can use inclusion exclusion principle here to find our duperFunction.

duperFunction(N,K) = f(p1) + f(p2) - f(p1*p2)

where f(X) = XK + (2X)K + … + (floor(N/X)X)K
= XK
(1 + 2K + … + (floor(N/X))K)
We can compute XK using fast exponentiation and we already know how to compute the second part of the product. Hence we can compute the duperFunction and hence the superFunction.

Since the inclusion exclusion principle extends to sets of arbitrary size we can simply compute the set of all prime factors of N and use inclusion exclusion principle on this set in a similar fashion we did for two primes.
Check http://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle if you have not too familiar with inclusion exclusion principle
if the set of primes is p1,p2,p3,…,ps then

duperFunction (N, K) = f(p1) + f(p2) + …+f(ps) (single terms)

- f(p1*p2) - f(p1*p3) - …. (pair wise products of primes)

  • f(p1*p2*p3) …… (triplet wise products)




+(-1)(s+1) f(p1*p2*…*ps)

You could compute above summation by simply iterating over all subsets of { p1,p2,p3,…,ps}. Since N is atmax 1012 we will not have more than 11 prime factors and hence you will need to evaluate the function f for max 211 values.

You can compute the prime factors of N in O(sqrt(N)) time.

So the time complexity of the solution will be O(sqrt(N) + 2s * K3 * log N)

Since the value of M is 109 at max if are careful enough using 64-bit integer values you will never overflow.

Subtask4: (solves all the subtasks)

M is at max 1018. So even if you use 64-bit integers while multiplying you will definitely overflow. So you have to perform modular multiplication using only addition as 2x1018 still falls in 64-bit integer range. Look at the solutions to see how you implement such a multiplication. You need to use this or a similar function whenever you multiply two numbers for this subtask.

Though the constraint on K is little bit higher it looks like the previous solution solves this subtask as well. But it will most likely not for the following reason. Since you no longer use primitive multiplication the overhead of our custom multiplication is pretty high as we perform it as many as O(2s * K3 * log N) times. So we will optimize the solution a bit further

When we compute M we also compute iteratively M2, M4, M8, …., M(240) and keep them stored. Since 240 > 1012 we need powers of two only till here.

Recall our vector matrix definition once again. V’ = (MN) * V and V’[K+1] gives the value for (1K+…+NK);

If you can decompose MN into product of the form M(2i) (which we have precomputed) then due to associativity of matrix multiplication we can just multiply the vector with one of the matrices and get a new vector and keep multiplying this vector with the rest of the matrices to get a new vector each time to ultimately get our V’.
A small example if you want to compute (M13)*V you can compute it in the following way
(M13)*V = (M8) * (M4) * (M1)*V = M8 * (M4 * (M*V))
Since there can logN such decompositions and matrix vector multiplication can be done in O(K2), we can perform this task in O(2s * K2 * log N).

The overall time complexity becomes O(K3 * log N + sqrt(N) + 211 * K2 * log N). The first K3 * logN factor is for pre-computing.

Note: You could actually compute (1K + 2K + 3K + … + NK) using the formulas given at http://mathworld.wolfram.com/PowerSum.html

But since our M is not a prime number and the formula involves division it will be pretty tedious to compute these formulas modulo M. But for K<=3 you can easily use these formulas. So you can solve subtask3 without actually using our Matrix Multiplication method.


Setter’s Solution: SFUNC.cpp

Tester’s Solution: SFUNC.cpp

Editorialist’s Solution: SFUNC.java

Note: Editorialist’s Solution follows the exact flow of editorial.
Setter’s and Tester’s solutions are also based on the same concepts so if you understand the editorial it should not be tough to read through their solutions


awesome editorial!!!

1 Like