PROBLEM LINK:
Author: Misha Chorniy
Tester: Kevin Atienza
Editorialists: Suhash Venkatesh and Pushkar Mishra
DIFFICULTY:
Hard
PREREQUISITES:
Fast Fourier Transform (FFT)
PROBLEM:
Given N distinct points in 3 dimensional space and four integers A, B, C and D, you are required to compute the following sum:
Also, you are given Q different values (A, B, C, D) and are required to compute the above sum Q times.
EXPLANATION:
Subtask 1
There are 2 subtasks for this problem. For the first subtask, it is given that Q\times N^2 ≤ 77777777. This hints at using an \mathcal{O}(N^2) solution per query. It is easy to see that the required sum contains exactly N\times (N-1) terms, and we can just simulate the calculation. Here is a code snippet for the same.
Pseudo Code:
func naive(*X, A, B, C, D) { ans = 0; for (int i = 1; i <= N; ++i) { for(int j = 1; j <= N; ++j) { if (i == j) continue; dx = X[i] - X[j]; dy = Y[i] - Y[j]; dz = Z[i] - Z[j]; num = abs(A \* dx + B \* dy + C \* dz + D); den = sqrt(dx \* dx \* dx \* dx + dy \* dy \* dy \* dy + dz \* dz \* dz \* dz); } } return ans / N / (N-1); }
Subtask 2
For the second subtask N ≤ 777777. Clearly, an \mathcal{O}(N^2) solution per query will time out. Let’s take a closer look at the other constraints. It is given that 1 ≤ X_{i}, Y_{i}, Z_{i} ≤ 77 . This implies that (X_{i} - X_{j}) can only lie in the range [-76, 76], and can only take 153 values. The same is true for (Y_{i} - Y_{j}) and (Z_{i} - Z_{j}). Hence, the triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) can take at most 153^3 \approx 3.5 \times 10^6 different values. If we can efficiently count the number of times a given triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) occurs, then the problem can be solved easily using the following code snippet.
Pseudo Code:
V = 77, ans = 0; for (dx = -(V-1) to (V-1)) { for (dy = -(V-1) to (V-1)) { for (dz = -(V-1) to (V-1)) { if (dx == 0 and dy == 0 and dz == 0) continue; num = cnt[(dx, dy, dz)] * abs(A \* dx + B \* dy + C \* dz + D); den = sqrt(dx \* dx \* dx \* dx + dy \* dy \* dy \* dy + dz \* dz \* dz \* dz); ans += num / den; } } } ans = ans / N / (N-1);
If we can pre-compute and store the values of
cnt[(dx, dy, dz)]
before hand, then the above code snippet runs in \mathcal{O}(V^3) time per query, where V = 77. Hence, the overall time complexity for Q queries is \mathcal{O}(Q \times V^3), which will run in time.
Now, the only sub-problem remaining is to count the number of times each triplet (X_{i} - X_{j}, Y_{i} - Y_{j}, Z_{i} - Z_{j}) occurs. To solve this problem, let us define a few quantities:
- Let X = max(X_{i}) + 1, Y = max(Y_{i}) + 1, Z = max(Z_{i}) + 1.
- Let M = X + 2XY + 4XYZ.
- Let L be the smallest power of 2, such that L > M. For this problem, we can note that L = 2^{21} works for all cases.
- Let T = (L-1) - M.
- Let us define 2 polynomials A and B, each of degree L-1. For every input point (X_{i}, Y_{i}, Z_{i}), we update the coefficients of the 2 polynomials as follows (Note that all coefficients are initially 0):
Let us define a polynomial C of degree (2L-1) as the product of the 2 polynomials A and B. Let us consider any coefficient of C. We can see that it includes some (X_{i} + 2XY_{i} + 4XYZ_{i}) from A and some ((L-1) - (X_{j} + 2XY_{j} + 4XYZ_{j})) from B. In other words, for some index k in C, we have:
We can now extract the values of dx, dy and dz using the following code.
Pseudo Code:
for (k = 0 to 2\*L-1) { if (C[k] == 0) continue; dx = (k - T) % (2\*X) - X; dy = (k - T) % (4\*X\*Y) / (2\*X) - Y; dz = (k - T) / (4\*X\*Y) - Z; if (dx == 0 and dy == 0 and dz == 0) continue; cnt[(dx, dy, dz)] = C[k]; }
The above code runs in \mathcal{O}(L). Now, the only thing remaining is the multiplication of the two polynomials A and B to get C. This is a well known problem and can be solved by using Fast Fourier Transform (FFT). You can read more about the method here. The complexity for this step is \mathcal{O}(L \times \log{L}). From the above notations, L \approx V^3 where V = max(X_{i}, Y_{i}, Z_{i}). We can write the run time complexity of this step as \mathcal{O}(V^3 + V^3 \times \log{V^3}). Hence, the overall time complexity of this problem is \mathcal{O}(Q \times V^3 + V^3 + V^3 \times \log{V^3}), where V = max(X_{i}, Y_{i}, Z_{i}).
Further Optimization
We have already designed a solution which should theoretically pass all the test cases within the given time limit. But the constants are very high, and the time limit is set such that solutions without further optimizations will fail.
For multiplication of 2 polynomials, normally we would need to use FFT 3 times:
- We perform a FFT to calculate the DFT of the polynomial A(x).
- We perform another FFT to calculate the DFT of the polynomial B(x).
- Finally, we need to perform an inverse FFT to calculate the product C(x).
We have already seen above that the polynomial C is of degree (2L-1). Also, from the definition of A and B, it is trivial to note that A_{i} = B_{(L-1)-i}. Let us take a closer look at B.
Let T[i] denote the value of A(x) evaluated at x = \omega_{2L}^{i}. Here, \omega_{N} denotes the N^{\text{th}} complex root of unity. The first FFT calculates the values of T[i], for (0 ≤ i ≤ 2L-1). Now, we can use the fact that B_{x} = x^{L-1} A(1/x).
For any \omega_{2L}^{i}, we have B(\omega_{2L}^{i}) = {\omega_{2L}^{(L-1)i}} A(\omega_{2L}^{-i}) which can be written as {\omega_{2L}^{((L-1)\times i) \bmod 2L}} A(\omega_{2L}^{2L-i}), since \omega_{2L}^{2L} = 1.
Therefore, we have B(\omega_{2L}^{i}) = {\omega_{2L}^{((L-1)\times i) \bmod 2L}} T[2L-i]. Hence, the DFT of B can be computed in \mathcal{O}(L) and there is no need to perform the second FFT. The final FFT to calculate C is however needed. Hence, we can compute the result with only 2 FFTs instead of 3. The tester’s solution uses this method to compute the solution. You can take a look at it for more details.
Alternative Solution
There is another method to reduce the number of FFTs from 3 to 2. Let us define 2 new polynomials as follows:
- P(x) = A(x) + i B(x)
- Q(x) = A(x) - i B(x)
Here, i denotes \sqrt{-1}. Let F_{p}[k] denote the value of P(x) evaluated at x = \omega_{2L}^{k} and let F_{q}[k] denote the value of Q(x) evaluated at x = \omega_{2L}^{k}. Again, \omega_{N} denotes the N^{\text{th}} complex root of unity. Let’s try to expand both these quantities.
Hence, we make one FFT call to compute the DFT of P. We have seen that the DFT of Q can be computed in \mathcal{O}(L). Using this information, it is trivial to calculate the DFT of A and B.
- FFT(A[k]) = (F_{p}[k] + F_{q}[k]) / 2
- FFT(B[k]) = i (F_{q}[k] - F_{p}[k]) / 2 (Why?)
A second FFT to calculate C is again needed. Hence, we can compute the result with only 2 FFTs instead of 3 with this method too. The beauty of this method is that it is generic and can be used for the multiplication of any 2 polynomials. The setter’s solution uses this method to compute the solution. You can take a look at it for more details.
P.S. Some submissions using 3 FFTs managed to pass, in spite of the stringent time limit.
COMPLEXITY:
\mathcal{O}(Q \times V^3 + V^3 \times \log{V^3}), where V = max(X_{i}, Y_{i}, Z_{i}).