FAULT - Editorial

anton_lunyov · April 15, 2013, 3:00pm

PROBLEM LINK:

Practice
Contest

Author: David Stolp
Tester: Hiroto Sekido
Editorialist: Anton Lunyov

DIFFICULTY:

CHALLENGE

PREREQUISITES:

Bitwise operations

PROBLEM:

The best described in the problem statement.

EXPLANATION:

The simplest solution is to output all servers. It will bring you only 13.249427 points, but you can even don’t read distribution of chunks by servers

The second simplest solution that still allows you to not read distribution of chunks by servers is to output first S − N + 1 servers. Thus, we left only N − 1 servers. It is clear that it is impossible to recover all N chunks having only N − 1 their combinations. Such solution brings you exactly 10 points and actually scoring function provides a hint to such solution

The third simplest solution is to iterate over all chunks, find the one that belongs to the least number of servers and output servers of this chunk. This will bring a better score 6.378516. This works because after deleting these servers, the chosen chunk is not contained at any other server, so we can’t recover it.

Now we consider the idea that allows to reach the high score.
The input can be represented as S × N system of linear equations over the field Z₂. The variables are chunks and the right hand side is composed of servers. If A is the matrix of this system, then A[i][j] = 1 if and only if chunk j is contained on server i (1 ≤ i ≤ S, 1 ≤ j ≤ N).

Claim. The chunks could be always uniquely recovered if and only if all N columns of this matrix are linearly independent over Z₂.
Indeed, if columns are linearly independent, then using facts from basic linear algebra we can left only N rows of this matrix such that obtained N × N matrix is non-degenerate. Hence we can recover chunks by servers using inverse matrix. Now assume that columns are linearly dependent and their rank is K < N. Then we can left exactly K linearly independent rows in the system. It is well-known that such system has not-unique solution for every possible vector in right-hand-side.

Thus, we need to delete as few rows as possible such that some column becomes a linear combination of other columns modulo 2. This, in turn, equivalent to performing series of elementary column operations such that some column becomes zero (in the matrix with deleted rows). Elementary column operation is simply a bitwise xor of columns if we consider them as binary numbers.

Instead of deleting some rows and then trying to reach zero column, we can do at first some elementary column operations and then for every column the set of rows we should delete to make this column zero is simply all rows having 1 in this column.

To achieve better performance we could represent columns as collection of 32bit unsigned variables and do bitwise xor 32 times faster (see bitset data structure in author’s solution).

Now the simplest we can do is to perform random bitwise xors while time permits, check whether the column we change have less ones than the answer and update the answer if necessary. To count the number of ones at GNU C++ we can use fast built-in function __builtin_popcount. Otherwise we can precalculate popcount for numbers up to 2¹⁶ and calculate popcount of 32bit number as a sum of popcount for its lower 16 bits and popcount for its higher 16 bits. Such solution could bring about 4.89 points.

Another method to use this idea is the following.
At each step we select random 1 in the matrix. Let it be in the intersection of the i-th row and the j-th column. Then we do the step like in Gaussian elimination. For each column k other than j, if the i-th element in the k-th column is 1 we replace k-th column by its bitwise xor with the j-th column. Also we check whether the updated k-th column has less ones than the answer and update the answer if necessary. We do these steps while time permits. Such solution could bring you about 3.75 points (note that the winner score is about 3.60).

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here but it get WA
The corrected version is here (the score of this submissions is 3.745888).

Tester’s solution can be found here.

RELATED PROBLEMS:

IGNITE
CUCUMBER

bugkiller · April 15, 2013, 4:02pm

The top solution by the facebook guy winger made my head spin with angular velocity as an irregular function of the number of servers and chunks present in the problem in the given domain.

akashverma_123 · April 15, 2013, 4:33pm

Author’s solution to this problem is giving WA
why??

amitbaranwal53 · April 15, 2013, 4:58pm

I performed something like last step along with bitwise but never get better than 3.941693.
can someone please tell me what can i do further in order to get ~3.75 ?
http://www.codechef.com/viewsolution/2049770

tijoforyou · April 15, 2013, 5:01pm

My approach was the one marked as “the third simplest solution” in the editorial. My final score (out of 1) is 0.511. Interestingly, I am the only one with that score!!

Surely, everyone must have moved on!!

mugurelionut · April 15, 2013, 5:35pm

To me the best thing to take away from @winger’s solution is how he split the possible test cases according to a hash computed over the input values. Then, during the contest, he tried to find the best rand seed for these hash values (i.e the rand seed for which his algorithm produced the best result). This is obviously better than using the same rand seed for all the test cases (that’s what I did) or choosing some random seed (e.g. a time-dependent seed), in which case you have no control over your solution’s performance.

argos · April 15, 2013, 5:42pm

But this approach is time consuming.

bugkiller · April 15, 2013, 6:42pm

@argos >> doesnt matter if it yields you a better score.

bugkiller · April 15, 2013, 6:43pm

@mugurelionut >> thanks for the rand seed explanation. I was not getting that.

kavishrox · April 16, 2013, 12:12pm

The test cases generated for the problem were very fair and just(better algos got better points) . I would like to see the algorithm with which these test cases were generated just for learning and out of curiosity.

anton_lunyov · April 16, 2013, 1:12pm

Here you go:
http://pastebin.com/BhERPABY

nirajs · April 16, 2013, 4:29pm

thnks …

mugurelionut · April 16, 2013, 5:30pm

@argos: It’s only time consuming in the sense that you have to make many submissions in order to get the rand seed right. However, since there are only 40 test cases (and let’s say that each of them will get a different hash value) and you make, say, 10 submissions for each test cases (with 10 different rand seeds from which you pick the best one), you will get a good score in about 400 submissions. Of course, you will have to make some extra submissions in order to find the hash values of the test cases (e.g. even if your hash takes values from 0 to 999, only at most 40 of them are used).

anton_lunyov · April 17, 2013, 1:55pm

Thanks for pointing this out. It is fixed.

rahul_nexus · April 18, 2013, 5:45pm

Can somebody elaborate on the 4th solution given in the editorial which contains the use of Gaussian elimination and the use of elementary transformations.

mangatmodi · April 24, 2013, 6:03pm

I didn’t get the gaussian elimination method. Here servers and chunks are denoted along rows and columns respectively, then to convert 1 into 0 they Xor two columns. But isn’t it conceptually wrong to Xor 2 different chunks? Remember that column operations are invalid in Gauss Elimination for solving linear equation.

Here they have assumed servers to be the linear combinations of chunks, which may have 1 or 0 as coefficient, But isn’t it chunks which are linear combinations of servers?

anton_lunyov · April 24, 2013, 8:17pm

@rahul_nexus @mangatmodi
I’ve updated the editorial.
I hope it becomes more clear now.

mangatmodi · April 24, 2013, 11:49pm

@anton
Thanks It is much clear now. I was confused because of column operation, as it is illogical to xor two different chunks. However I got it now. I just remembered that column rank = row rank. So, basically we are avoiding row operations as it wouldn’t give ans, but decreasing column rank by column operations followed by some rows deletion.