Problem
Given 2 \leq n \leq 10^{500} and the value of the Euler totient function \varphi(n), print the prime factorization of n.
Small solution
In the smallest case you can easily succeed by just factoring n directly. The number is well within the range of 64-bit integers, and something like Pollard's rho algorithm should work fine.
Large solution
For large cases you'll be facing two main issues. How do I even handle these numbers? How does knowing the totient help in the factorization of n?
A simple answer to the first question is to use a language with built-in bigints such as Python, or painstakingly implement a decently high-performance bigint library yourself. I would recommend the former.
The second question comes down to a neat number theory theorem and some cleverness. Euler's theorem states that if n and a are coprime then

a^{\varphi(n)} \equiv 1 \pmod{n}.
Why is this useful? Let's say we pick a random a. If a^{\varphi(n)} \not\equiv 1 \pmod{n}, then a and n cannot be coprime, so \gcd(a,n) is a non-trivial factor of n — but that's very unlikely. What if the equivalence holds? Can we do something clever?
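Euler's theorem is easy to sanity-check numerically. A tiny sketch with made-up values (n = 15, so \varphi(15) = 2 \cdot 4 = 8), covering both branches:

```python
import math

# Sanity check of Euler's theorem with made-up values:
# n = 15 = 3 * 5, so phi(15) = phi(3) * phi(5) = 2 * 4 = 8.
n, phi = 15, 8

for a in range(2, n):
    if math.gcd(a, n) == 1:
        # Coprime base: Euler's theorem forces a^phi ≡ 1 (mod n).
        assert pow(a, phi, n) == 1
    else:
        # Otherwise gcd(a, n) is already a non-trivial factor.
        assert 1 < math.gcd(a, n) < n
```

Python's three-argument `pow` does modular exponentiation by squaring, which is what makes all the power computations below feasible at this scale.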
Cleverness
The end goal here is to find p, q \not\equiv 0 \pmod{n} such that pq \equiv 0 \pmod{n}; then \gcd(p,n) and \gcd(q,n) are non-trivial factors of n. Specifically we'll be looking for x \not\equiv \pm 1 \pmod{n} such that x^2 \equiv 1 \pmod{n}, since that can be rewritten as (x-1)(x+1) \equiv 0 \pmod{n}.
Rewrite \varphi(n) = 2^K t such that 2 does not divide t. We know that the equivalence holds for a^{2^K t}, but consider the smallest k \leq K such that it still holds. For n > 2 we have that \varphi(n) is even, so k \geq 1, and hence (letting q = 2^{k-1} t) we can write this as

(a^q - 1)(a^q + 1) = a^{2^k t} - 1 \equiv 0 \pmod{n}.
We're very close. We know that the equivalence did not hold for a^q (since we picked the smallest k that worked), so the first factor a^q - 1 is not divisible by n. We can't guarantee that the second factor a^q + 1 is not divisible by n, but that's easily checked and quite likely. If everything checks out, \gcd(a^q - 1, n) and \gcd(a^q + 1, n) will be non-trivial factors of n.
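Here is a sketch of that step in Python — my own code and names (`split_with_phi` is hypothetical), assuming n is odd and not a prime power; the iteration limit is arbitrary:

```python
import math
import random

def split_with_phi(n, phi, iters=64):
    """Try to split n into two non-trivial factors using phi(n).
    Assumes n is odd and not a prime power. Returns (p, q) with
    p * q == n, or None if every base failed."""
    # Write phi = 2^K * t with t odd.
    t, K = phi, 0
    while t % 2 == 0:
        t //= 2
        K += 1
    for _ in range(iters):
        a = random.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:
            # Lucky: the base itself shares a factor with n.
            return g, n // g
        x = pow(a, t, n)
        if x == 1 or x == n - 1:
            continue  # this base never passes through a useful root
        # Square until we hit 1; the previous value x is then a
        # square root of 1 mod n other than 1 (and hopefully not -1).
        for _ in range(K):
            y = pow(x, 2, n)
            if y == 1:
                p = math.gcd(x - 1, n)
                if 1 < p < n:
                    return p, n // p
                break
            if y == n - 1:
                break  # bad luck: we only hit the trivial root -1
            x = y
    return None
```

For example, `split_with_phi(15, 8)` returns a pair multiplying to 15 (either `(3, 5)` or `(5, 3)` depending on the base drawn).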
So extracting factors of n boils down to trying new bases a until we either find a factor or decide that we probably can't reduce things further. Fortunately, we're quite likely to end up in the good case.
Some gotchas
The above does not work for even n or perfect powers (for an odd prime power p^k the only square roots of 1 modulo p^k are \pm 1, so no useful x exists). This is easily solved by extracting all twos at the beginning and checking whether we have a perfect power before trying to find an additional factor (this can be done by iterating over possible exponents and binary searching for the base).
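The perfect-power check can be sketched like this — a plain version of my own, trying each candidate exponent and binary-searching for the base, without any initial-estimate tricks:

```python
def is_perfect_power(n):
    """Return (a, k) with a**k == n and k >= 2, or None.
    For each candidate exponent k we binary search for the base;
    with big integers each power computation costs extra log factors."""
    for k in range(2, n.bit_length() + 1):
        # The base of an n = a^k satisfies a < 2^(bit_length/k + 1).
        lo, hi = 2, 1 << (n.bit_length() // k + 1)
        while lo <= hi:
            mid = (lo + hi) // 2
            p = mid ** k
            if p == n:
                return mid, k
            if p < n:
                lo = mid + 1
            else:
                hi = mid - 1
    return None
```

Note that the smallest exponent is found first, so a number like 3^{10} is reported as (3^5)^2.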
Summary
Extract any twos from n immediately.
def factor(n):
    if is_power(n):
        # number is a^k
        return k lots of factor(a)
    for some number of iterations:
        pick a base and try finding a factor as explained earlier
        if found p, q such that p * q == n:
            return factor(p) + factor(q)
    else:
        # Probably prime
        return {}

factor(n)
Time complexity
The time complexity is not at all trivial to derive, and I won't even try to do it properly. I will, however, give some insight into where the costs arise and why the limits are as they are.
(Please inform me if I’ve done something horrendous here and I’ll try to amend it, I threw this part together late at night.)
The computational cost for my implementation can be separated into a few parts.
factor
In the rough pseudo-code the factor method will be called at most O(\log n) times (proportional to the number of prime factors). A call factor(k) has two parts.
power
Finding the power (if one exists) was done using \log(k) binary searches, each of which computes powers internally. I kept the cost of the binary searches down with a decent initial estimate, but even ignoring that we're still dealing with something like \log(\cdot)^3 (one factor for the number of binary searches, one for exponentiation by squaring, one for multiplication of numbers at this scale).
This is where the limit in the problem statement matters: costs involving small polynomials in \log(\cdot) must be feasible to compute.
looking for a factor
Looking for a factor is in itself not that expensive: roughly O((\text{number of factors of } 2 \text{ in } \varphi(n)) \times \log(\cdot)^2). Again, the square is due to calculating powers and dealing with large numbers.
This cost is, however, incurred repeatedly until either a factor is found or a user-defined iteration limit L is hit. If this limit is too low you risk wrong answers, because you fail to decompose a number that actually was composite. If it is set too high you'll TLE because you're checking too much.
The actual cost of the whole process depends on how likely you are to find a good base, the number of factors in the number, and probably more costs I've forgotten about.
Assuming that we don't have to do many iterations before finding a factor, the complexity is something like O(L \times (\text{number of factors of } 2 \text{ in } \varphi(n)) \times \log(\cdot)^2).
Overall
We're probably looking at something like O(T L \log(\cdot)^3) or O(T L \log(\cdot)^4). The log part is a mix of different logs, some of which are considerably smaller than n. As mentioned, a detailed analysis is rather hard.