REALSET - Editorial

shangjingbo · December 16, 2013, 6:56pm

PROBLEM LINK:

Practice
Contest

Author: Iaroslav Tverdokhlib
Tester: Mahbub
Editorialist: Jingbo Shang

DIFFICULTY:

Hard

PREREQUISITES:

Circulant Matrix

PROBLEM:

Given a Circulant Matrix, determine whether it is full rank (i.e. non-zero determinant, or the null space is trivial).

EXPLANATION:

First of all, we need to know the matrix stated in the problem is a very special type of matrix called Circulant Matrix. There are a lot of properties about the determinant and rank of Circulant Matrix can be found in Wikipedia.

Let’s summarize the properties looks relevant to this problem:

alt text

where, f(x) = A₀ + A₁ * x + A₂ * x^2 + … A_n * x^n.

To avoid the precision problems, we can choose some prime and one of its primitive root instead of the root of unity (it may have chances (really rare, but exists) to fail, but we can choose some different primes to check, e.g. 3 different primes). Also, it will better to try Number Theoretic Transform instead of double-valued FFT in the following steps.

One straight-forward way is to calculate the determinant directly (of course, using FFT or NTT, both setter and tester solved the problem in this way). But a lot of contestants failed due to the precision problem when checking whether a double value is non-zero.

Another way is the check whether “d = 0” i.e. gcd(x^n - 1, f(x)) is some constant. Here we take the later one to solve this problem. Another related knowledge is about Cyclotomic Polynomial.

With this property, we know that x^n-1 can be factorized into some cyclotomic polynomials Phi_d(x) for every divisor d of n. Actually, for all n between 1 and 30000, the number of divisors, denoted as D, is at most 96 (when n = 27720). Therefore, we can enumerate all possible divisors d and check the cyclotomic polynomial Phi_d(x) can divide f(x).

Then, the only thing we need to do is the polynomial remainder operation. If we adopt brute-force to do this, it will take O((N-d) * d) time. Directly applying this method (with some breaks) may get TLE. But there is a interesting view to combine this method with the double-valued FFT (like this method can work well when N is a prime). This solution accepted with a further check using this method.

The fastest and easiest solution (explanation provided by Anton Lunyov)

Let’s understand how the fastest and shortest solution to this problem works.
As pointed above we need to check that our polynomial is divisible by Phi_d(x) for some divisor d of n. Instead of direct calculation of Phi_d(x) we can do the following trick. Let d has prime divisors p1, …, pk. Now the crucial observation is the following:

Some polynomial P(x) is divisible by Phi_d(x) if and only if P(x) * (x^d/p1 - 1) * … * (x^d/pk - 1) is divisible by x^d - 1.

It follows from explicit formula for cyclotomic polynomial (see second formula there). Indeed the rational function
(x^d/p1 - 1) * … * (x^d/pk - 1) / x^d - 1
equals to Q_d(x) / Phi_d(x) for some polynomial Q_d(x) that is coprime to Phi_d(x) which yields the result.

Now all is simple. Polynomial computations modulo x^d - 1 are straightforward: if P(x) = a₀ + a₁ * x + … + a_m * x^m then P(x) mod (x^d - 1) = (a₀ + a_d + a_2d + … ) + (a₁ + a_d+1 + a_2d+1 + … ) * x + … + (a_d-1 + a_2d-1 + … ) * x^d-1 (for clarity see lines 49-50 here). Multiplication by x^k - 1 could be made by simple loop and requires only additions and subtractions. And it could be done modulo x^d - 1 at the same time (see lines 53-62 here).

So we could easily find the remainder of polynomial division P(x) * (x^d/p1 - 1) * … * (x^d/pk - 1) by x^d - 1. If it’s zero for some d then the answer is YES, otherwise NO.

The complexity of each step is O(n + d * nu(d)), where nu(d) is the number of prime divisors of d. Here we have n because we need first to adjust P(x) modulo x^d - 1. Thus, overall complexity is (n * tau(n) + n * log n), where tau(n) is the number of divisors of n. Here n * log n is some estimate of the sum of d * nu(d) over divisors o n (I believe it is so :)). Hence we get super fast and simple solution that requires only additions and subtractions of polynomial coefficients.

Any different ideas and solutions are welcome.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.

jaskaran_1 · December 16, 2013, 7:33pm

I did the same thing.Find the fft of the n real inputs
ie c0,c1,c2,…,c(n-1) as mentioned in the circulant determinant link.
.Now check for each output

bool flag=0;
for(int i=0;i<N;i++) 
 {if(floor(abs(out[i]))==0)
  {printf("YES\n");
   flag=1;
   break;}
  }
if(!flag)
 printf("NO\n");

Here’s my code

http://www.codechef.com/viewsolution/3090877

Is my fft implementation incorrect?(Radix2 Cooley Tukey)

shangjingbo · December 16, 2013, 8:36pm

Maybe it is sucked by the precision (exceeding or some losses). The general idea looks fine. You can have a look at accepted solutions and find the differences.

jaskaran_1 · December 16, 2013, 8:45pm

@shanjingbo
Could you give me the failing test case and provide a detailed FFT implementation?
How to implement it to maximize precision.What about using Number Theoretic Transform? Read it somewhere that there won’t be precision error there since its modulo prime.
Could you provide a C++ implementation for that.

shangjingbo · December 16, 2013, 9:03pm

The key point is to use discrete FFT, and using Chinese Remainder Theorem (CRT). I have no codes now. And I think providing cases is not a good idea. Sorry.

mugurelionut · December 16, 2013, 9:19pm

I initially implemented FFT on real numbers (double, then long double), but I always had precision problems (actually, to be honest, since I was too lazy to implement an FFT algorithm working for any value of N, e.g. N being prime, I used a freely available implementation of Bluestein’s FFT algorithm). But I had to implement something very different in order to have my solution accepted. If you think of the values A(i) as coefficients of a polynomial (A(0) + A(1) * x + A(2) * x^2 + …), then the answer is YES only if this polynomial is divisible by a cyclotomic polynomial Phi_D(X) for some divisor D of N. My implementation of this idea is actually straight-forward: I computed all the cyclotomic polynomials for divisors of N and I checked divisibility with each of them. The important thing here is that cyclotomic polynomials have integer coefficients, so there is no need for using real numbers (thus, any possible precision errors are avoided).

xellos0 · December 16, 2013, 9:24pm

I also implemented Bluestein’s algorithm, also using long double and also had trouble with precision (running time 19 seconds, WA). This problem was probably aimed at failing any common FFT.

mugurelionut · December 16, 2013, 9:26pm

@jaskaran_1: I looked at your implementation and your algorithm is not correct. You can only use the Radix2 Cooley Tukey FFT algorithm when N is a power of 2. When N is odd you split the array into two slices of unequal sizes. You probably make them equal by adding an extra zero coefficient in the smaller slice, but that is incorrect and the end result is not the FFT for the initial array of N values. There are only a few algorithms which work for any value of N (N being prime is usually a tough case): Bluestein’s algorithm is one of them.

jaskaran_1 · December 16, 2013, 9:28pm

if the polynomial is divisible by a cyclotomic polynomial Phi_D(X) for some divisor D of N.Could you explain that?

mugurelionut · December 16, 2013, 9:55pm

@jaskaran_1: See Wikipedia for the definition and properties of cyclotomic polynomials: http://en.wikipedia.org/wiki/Cyclotomic_polynomial

cyberax · December 16, 2013, 10:56pm

I don’t mean to be rude, but this is not what can be expected for an editorial for a HARD problem. There are many concepts to be explained, many difficulties around (how you come to fast discrete fourier transform, how you use the Bluestein to handle cases where N is not a power of 2, how you handle floating point numbers precision, how you handle overflow in recursion, and so on), and you only provide a wikipedia link… I could have done the same. Telling that we can read other contestants code is not a solution, it’s a lazy way of not answering the question. I’m kind of disappointed.

lg5293 · December 16, 2013, 11:10pm

This was a frustrating problem. I think the main issue was I had a lot of precision issues, but it was frustrating because I wasn’t sure how to modify it to not use doubles (and some accepted solutions even used doubles). During the contest, I thought the fact that the solutions had to be integral made my algorithm wrong, so I wasn’t able to come up with an easy fix, which lead to me ultimately giving up on this problem. It was definitely a cool problem.

I just have some quick followup questions though. Does the fact that B has to be integral matter? How difficult is it to actually reconstruct a sequence B that works (if one exists)?

jaskaran_1 · December 16, 2013, 11:31pm

@mugurelionut Won’t it work like mergesort.I mean there also the array isn’t exactly halved.Why isn’t it correct?
Could you give me an example where it fails?

jaskaran_1 · December 16, 2013, 11:33pm

@mugurelionut how about using Number Theoretic Transform?

xellos0 · December 16, 2013, 11:46pm

Nope. If there’s a solution in complex numbers, there’s one in integers as well. Proof (a bit short, there’s not much space in comments :D):

a solution in complex numbers is a pair of solutions in reals that means we can take just the real parts of b-s
if there is a solution (in reals now), we can obtain it by running GEM on the above mentioned circulant matrix, but GEM can be performed in rationals only, which gives us a solution in rational numbers
if vector B is a solution, kB is also a solution; choose k to be the product of all denominators, and kB is a solution in integers

xellos0 · December 16, 2013, 11:48pm

I think the editorial is in the process of being written, so you shouldn’t flame just because you’re impatient…

I mean, I hope it is so…

lg5293 · December 17, 2013, 12:08am

Cool thanks I was only able to get step 1, but wasn’t able to convince myself of step 2 during the contest. It’s much clearer now.

xellos0 · December 17, 2013, 12:54am

jaskaran: If you actually looked more into how/why it works this way, you’d notice that Cooley-Tukey makes use of identities involving the N-th root of unity, and those are only nice if we split the FFT’d array into arrays of equal sizes.

You’re basically padding the array to next larger power of 2, which changes its roots of unity. The most trivial example: {1,1,0} has no solution, but your answer is YES.

Is it really that hard to check small cases by hand when you have days for this?

kevinsogo · December 17, 2013, 1:39am

I’d like to share my journey with this problem (apologies but it’s very long).

During day 1, I started by trying to solve some easy ones first, and eventually found this one with no accepted solutions yet. Since the contest had just begun, I tried not to think too much: the examples given were [1,1,1] and [2,-4,1,1], both of which involve the vector [1,1,1…] of all ones, so I thought the answer must somehow always involve such a vector. I tried to find a few more cases (such as [1,2,3,1,2,3] being annihilated by [1,0,0,-1,0,0]), and came up with my first (naive) solution. Here it is written in pseudo-python (link to my submission here):

def is_beautiful(a[0..n-1]):
	for every divisor d of n:
		w = ((array of length d))
		for i from 0 to d - 1:
			w[i] = a[i] + a[i + d] + a[i + 2*d] + ...
		if check(w):
			return True
	return False

# this is a subroutine
def check(a[0..n-1]):
    for every divisor d of n:
    	# check if sums of subarrays of distance d are all zero
        all_zero = True
        for i from 0 to d - 1:
            s = a[i] + a[i + d] + a[i + 2*d] + ...
            if s is not 0:
            	all_zero = False
            	break
    	if all_zero:
    	    return True
    for every proper divisor d of n:
    	# check if subarrays of distance d have all equal elements
		all_eq = True
		for i from d to n - 1:
			if v[i] != v[i - d]:
				all_eq = False
				break
    	if all_eq:
    	    return True
	return False

This is an O(n log² n) algorithm. It somehow passed the initial dataset of the problem, even though it’s wrong (the first counter-example I discovered is [1,-1,1,0,0,0], which can be annihilated by [1,1,0,-1,-1,0]). Luckily the dataset was fixed immediately (and the solution then got WA).

After actually thinking about the problem, it became clear that we needed to determine the singularity of a certain special matrix, each row of which is a single right-rotation of the previous row. After a brief search, I discovered that such a matrix is actually called a circulant matrix, and the Wikipedia page conveniently provided a few ways to determine singularity.

It was clear that if the matrix A is nonsingular, then the only solution to Ax = 0 is the vector x = 0, and if A is singular, then Ax = 0 has a nonzero solution x whose elements are rational, and hence an integral solution (using a proper scaling factor). That there exists a nonzero x whose elements are rational can be confirmed by doing Gaussian elimination with total pivoting on Ax = 0 until we fail to find a pivot (this will happen because A is singular), in which case we can assign the remaining variables 1, and solve for the others. It is clear that all intermediate and final values are rational.

The first property I considered was the ‘Rank’. Wikipedia says that the rank is n - D, where D is the degree of gcd(a(x), xⁿ-1) (and a(x) = a₀ + a₁·x + a₂·x² + … + a_n-1·x^n-1). Now, we only needed to check whether D is zero or not, i.e., gcd(a(x), xⁿ-1) is a constant or not. I actually found a theoretically good one to calculate polynomial gcd, running in O(n log² n), based on the concept of “Half-gcd”. This relies on using FFT for polynomial multiplication AND division (see this link for a description on how to reduce polynomial division and modulo to multiplication). However, after playing around with it a bit and implementing it I discovered that there are far too many multiplications and divisions for this approach to pass the time limit (it started getting slow even at n around 10000), so I scratched this approach.

Next, I factorized xⁿ-1 into prime polynomials, which are simply the cyclotomic polynomials Φ_n(x) (Φ_n(x) is the monic polynomial whose roots are the primitive nth roots of unity). xⁿ-1 is simply the product of Φ_d(x) for every divisor d of n. Since the cyclotomic polynomials are prime, D is nonzero if and only if one of those Φ_d(x) divides a(x).

This means I only had to do polynomial modulo operation for every Φ_d(x). Among all n in [1,30000], 27720 has the largest number of divisors, at 96. This means that this solution runs in about 96·O(n log n) time. I was very excited with this solution because I already knew how to code all the parts, unlike the HGCD approach, so I spent some time coding it, but after submitting, the verdict was TLE. Clearly the constant is still too high, and this approach only works well if divisor_count(n) isn’t too large.

By the way, whenever I implement FFT, I always do it modulo a large prime p which has a primitive 2^kth root of unity for a large enough k (say p=2013265921 where 440564289 is a primitive 2²⁷th root of unity). This allows me to use 440564289 as my root of unity instead of a complex irrational, and thus using integers entirely and avoiding precision errors.

Of course, this means that the coefficients will be reduced mod p. However, for this problem, I simply assume that if A(x) divides B(x) for some polynomials A and B in Z_p[x], then A(x) also divides B(x) in Z[x]. This has a tiny chance of failing which I assumed would not happen on the judge’s data.

So I left the ‘Rank’ approach and considered the ‘determinant’ next. We needed to know if the determinant is zero or not. According to the Wikipedia page again, the determinant of a circulant matrix is the product of the evaluation of a(x) for all nth roots of unity. In other words, we needed to find the DFT of the sequence a₀, a₁, …, a_n-1, and check whether any of the values of the resulting transform is zero. The problem is that the normal FFT algorithm only works for power-of-2 sizes, so we have to find a way to calculate the DFT of an exact size.

An issue that came up is that we needed to have a primitive nth root of unity in our modulus, so p=2013265921 wouldn’t work most of the time. My solution is to simply write a list of large primes such that for every n in the range [1…30000], there is a prime p in the list where p ≡ 1 (mod n). To meet the 50000 bytes program limit I had to compress the list and decompress it at the start of execution.

The first thing I thought of is that the normal FFT algorithm can be extended. In FFT, we split into two parts of size n/2 and then combine in O(n) time, but it is possible to split into p parts, each of size n/p, for any prime p dividing n, and combine in time O(p·n). If we continue to do this for all prime factors of n (considering multiplicity), we will eventually be able to compute the DFT!

The only problem is that p can be large. If n is composite, then we can always find a p ≤ √n ≤ 173, and O(p·n) will be waitable. But if n is prime, then O(p·n) = O(n²) which is clearly too slow. So I had to think more of how to solve that part, but for now this approach works well if n doesn’t have large enough prime factors (I tried implementing this first and submitting, and the judge indeed gave the expected TLE).

At this point I took a dinner break, and while walking to the store I thought of a cool idea. Why not combine my two previous approaches? I figured that if n has a large prime factor, it cannot have that many divisors, so my first algorithm would work well! I became very excited with this idea and I tried implementing it immediately (after eating), having only needed to do a few copy-pasta. Sure enough, the solution didn’t get TLE! Unfortunately, it got WA.

Of course it turned out later that it only got WA because there were some issues with the judge data. After rejudging, this solution got accepted (link here). However I didn’t know that it was correct at the time, so I suspected that working modulo primes was bad after all.

At this point I felt that solving DFT problem for prime sizes was the best approach, so after a bit of research I found two algorithms, Rader’s algorithm and Bluestein’s algorithm. Both solutions are amazingly simple and clever, and work in O(n log n). Bluestein’s algorithm has a bonus that it works for any size, not just prime sizes, but for some random reason I decided to use Rader’s algorithm, perhaps because I already have code for finding a generator for a given prime p.

There were a few details to consider, but I was able to finally implement it (link here). However the judge still gave WA (because of judge data issues). At this point I concluded that there must be something wrong with the data, or the data is too clever that it somehow finds a bad input to my essentially random list of primes (out of desperation I even tried to submit a double-precision version of FFT, which of course got WA)! Since I felt the former to be more likely, I simply gave up and waited until perhaps things got resolved. Sure enough, I later found @mugurelionut being able to solve it, followed by @djdolls, with no WAs in between! Since I found this rather unlikely, I tried to submit my latest solution and sure enough it got AC. I felt a lot of relief afterwards, knowing that I can finally move on with my life!

It was indeed very frustrating to get many WAs due to the issues with the judge data, but I think there has been some positive side effect on me because I was able to learn a lot of new things. So thank you @KADR for proposing the problem, and I hope you handle the input/output files correctly next time!

jaskaran_1 · December 17, 2013, 1:57am

@xelios I had given the incorrect link. I’ve edited it.Could you check it now?It works for 1 1 0.