LMATRIX3 - Editorial

PROBLEM LINK:

Practice
Contest

Author: Tiancheng Lou
Tester: Shiplu Hawlader and Mahbubul Hasan

DIFFICULTY:

HARD

EXPLANATION (as written by ACRush):

Step 1 :

Reduce to: given a list of numbers from {1, 2, …, p-1}, split the list into as many sub-lists as possible, such that the sum within each sub-list is a multiple of p.

Proof :
Suppose the array is Ai, where 1 <= i <= n. Denote A0 = A(n+1) = 0, and Bj = Aj - A(j-1) (mod p) for 1 <= j <= n+1.

*) Ai are all zero if and only if Bi are all zero.
Given a list of operations, we construct a graph on n+1 nodes: for each operation (s, t, k), we add one edge between nodes s and t+1. Then:
*) the sum of Bi within each connected component is a multiple of p.
*) for each component consisting of c nodes, we can use c-1 operations to zero it.
*) the answer is n+1 - (the maximum number of components whose sums are multiples of p).
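The step-1 reduction can be illustrated with a short sketch (hypothetical helper names, not code from any referenced solution): `diffs` builds the difference array B, and `apply_op` applies one operation, showing that an operation (s, t, k) changes exactly the two entries B[s] and B[t+1].

```python
def diffs(A, p):
    """B[j] = A[j] - A[j-1] (mod p) for j = 1..n+1, with A[0] = A[n+1] = 0."""
    ext = [0] + list(A) + [0]
    return [(ext[j] - ext[j - 1]) % p for j in range(1, len(ext))]

def apply_op(A, s, t, k, p):
    """Add k (mod p) to A[s..t], 1-indexed inclusive."""
    return [(a + k) % p if s <= i + 1 <= t else a for i, a in enumerate(A)]

A, p = [3, 3, 7], 10
B = diffs(A, p)               # [3, 0, 4, 3]; the sum is 10, a multiple of p
A2 = apply_op(A, 1, 2, 7, p)  # operation (s=1, t=2, k=7) -> [0, 0, 7]
B2 = diffs(A2, p)             # [0, 0, 7, 3]: only B[1] and B[3] changed
```

Note that the sum of B over any set of positions changes by 0 mod p under every operation, which is why each connected component's sum must be a multiple of p.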

Step 2 :

Using p = 10 as an example: we always take the pairs 1+9, 2+8, 3+7, 4+6, 5+5 whenever possible, since there must be an optimal solution with the same number of moves but more 2-element sub-lists.
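A minimal sketch of step 2, under the assumption that the leftover digits are kept as a count array `cnt[x]` (a hypothetical representation): greedily remove complementary pairs (x, p-x), which form 2-element zero-sum sub-lists, plus pairs (p/2, p/2) when p is even.

```python
def take_pairs(cnt, p):
    """Return (number of pairs removed, leftover counts).
    cnt[x] = multiplicity of digit x, for 0 < x < p."""
    cnt = list(cnt)
    pairs = 0
    for x in range(1, (p + 1) // 2):
        m = min(cnt[x], cnt[p - x])   # pair x with p-x
        cnt[x] -= m
        cnt[p - x] -= m
        pairs += m
    if p % 2 == 0:
        m = cnt[p // 2] // 2          # pair p/2 with itself
        cnt[p // 2] -= 2 * m
        pairs += m
    return pairs, cnt

# e.g. for p = 10 with digits {1, 1, 5, 5, 5, 9}:
# one (1, 9) pair and one (5, 5) pair are removed, leaving {1, 5}
```

After this step, at most one of x and p-x survives for each complementary pair, which is why so few distinct digit values remain for the later steps.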

Step 3 :

If p is even, and the number of copies of p/2 is odd, try all possible subsets containing p/2 (i.e. 5 for p = 10).
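A sketch of step 3, assuming the intended condition is that p is even and an odd number of copies of p/2 remain after pairing: the lone p/2 must be completed to a zero-sum subset by other leftover digits, and trying every completion is cheap. The helper name and count-array layout are my own, not from any referenced solution.

```python
from itertools import product

def completions_with_half(cnt, p):
    """Sub-multisets of the other leftover digits that, together with one
    copy of p//2, sum to a multiple of p. cnt[d] = multiplicity of d."""
    half = p // 2
    digits = [d for d in range(1, p) if d != half and cnt[d] > 0]
    out = []
    for pick in product(*(range(cnt[d] + 1) for d in digits)):
        s = half + sum(d * k for d, k in zip(digits, pick))
        if s % p == 0:
            out.append({d: k for d, k in zip(digits, pick) if k})
    return out

cnt = [0] * 10
cnt[1], cnt[4], cnt[5] = 1, 1, 1
# with p = 10, the lone 5 can only be completed by {1: 1, 4: 1} (5+1+4 = 10)
```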

Step 4 :

There are only 4 digits left. IMPORTANT: the number of possible subsets is very limited. Even for p = 10, the number of subsets is ~30. Then we need to formalize the problem as an Integer Linear Programming (ILP) problem.
There are many different ways to handle the ILP problem.
*) mugurelionut@ and kutengine@ used simplex for the ILP, which required int128 and a “branch and bound” algorithm. Though I’m not sure their solutions are 100% correct, it is hard to generate corner cases against them.
*) aawisong@ and the standard algorithm: precompute all simplex points, and use Gaussian elimination to solve the rest.
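The claim above that the number of possible subsets is very limited can be checked by brute force. The sketch below (my own enumeration, not ACRush's code) lists the "prime" zero-sum multisets over the example digits 1..4 for p = 10: multisets whose sum is a multiple of 10 but where no nonempty proper sub-multiset has that property.

```python
from itertools import product

P = 10
DIGITS = (1, 2, 3, 4)  # the leftover digits in the p = 10 example

def total(counts):
    return sum(d * c for d, c in zip(DIGITS, counts))

def has_zero_proper_subset(counts):
    """True if some nonempty proper sub-multiset sums to a multiple of P."""
    for sub in product(*(range(c + 1) for c in counts)):
        if sub != counts and any(sub) and total(sub) % P == 0:
            return True
    return False

def prime_subsets(max_sum=30):
    primes = []
    for counts in product(*(range(max_sum // d + 1) for d in DIGITS)):
        t = total(counts)
        if 0 < t <= max_sum and t % P == 0 and not has_zero_proper_subset(counts):
            primes.append(counts)  # counts = (#1s, #2s, #3s, #4s)
    return primes

primes = prime_subsets()
# e.g. (0, 0, 0, 5) is {4,4,4,4,4}: sum 20, but no sub-multiset sums to 10
```

Running this yields a few dozen count vectors, in line with the figure quoted above; every multiset summing to exactly 10 is automatically prime, plus a handful of sum-20 exceptions such as {4,4,4,4,4}.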

AUTHOR’S AND TESTER’S SOLUTIONS:

To be updated soon.

5 Likes

My solution also started with the Step 1 presented in the editorial. Let’s define the difference between adjacent elements as A[j]-A[j-1] (modulo P) for 1<=j<=n+1.

One operation which adds k to an interval of the sequence modifies exactly two differences between adjacent numbers: one increases by k and the other decreases by k. Now we can conceptually split our set of differences into those which are only decreased and those which are only increased (it never makes sense to decrease the difference between the same two adjacent positions at one operation and then increase it at another - but it is possible for some difference value d to be decreased at some positions where it occurs and increased at others).

After this conceptual split, we need to write each difference d which is decreased as a linear equation: d = c(P-1) * 1 + c(P-2) * 2 + … + c(1) * (P-1), meaning that we use c(1) differences equal to 1, c(2) differences equal to 2, …, c(P-1) differences equal to P-1, which will all be “increased” up to P (i.e. turned to zero). Each such “increase” operation also decreases the difference d by the same amount and, in the end, the difference d also becomes zero.

The point is that for the differences which are decreased we do not need to perform extra operations, so we would like to maximize their number (the more such differences we have, the more operations we save).

For each possible difference d we may have multiple equations like the one described above - each equation is defined by its coefficients c(1), …, c(P-1). Each such equation will be considered as a variable taking values >= 0 (the value of the “variable” is how many times we use the equation for increasing/decreasing differences). There can be many such equations, but we can restrict ourselves to only “basic” equations - i.e. equations for which it is impossible to obtain a zero sum (modulo P) by decreasing some of the coefficients. For P=10 there are 433 such “basic” equations (including the obvious ones, e.g. 1 = c(9) * 1, 2 = c(8) * 2, etc.).

Now we have a linear program with P-1 inequalities and a number of variables equal to the number of “basic” equations (at most 433 for P=10) and we want to maximize the sum of the values associated to the variables under the constraints that the variables are non-negative integers and that all the inequalities are satisfied. Note that maximizing the sum of these variables means maximizing the number of operations we save (i.e. we don’t need to perform).

If we ignore the integrality constraints of the linear program, the Simplex algorithm is the tool of choice. But then the optimal value may be non-integer, as may the values of the variables. There are techniques for solving ILPs (integer linear programs) by repeatedly solving “normal” (i.e. real-valued) linear programs, but they ended up being too slow (although I did try to implement some of them).

In the end, what helped me solve the problem was the following observation. Let’s assume that the maximum sum of values obtained by the real linear program is something like 4134.678. Then it is obvious that the solution of the ILP is at most equal to the integer part of the real solution (i.e. 4134 in this example). What I noticed is that in all the cases I tried I could always eventually reach this optimal solution. So my solution simply computes the solution of the linear program and then takes its integer part - there were some problems here, too, due to precision issues. I initially used decimal128 as a high precision real type, but this was too slow (it only passed 12 test cases before TLE-ing). In the end I was able to use only long doubles within the Simplex algorithm and use decimal128 only to compute the high precision result in the end.

10 Likes

In my approach, steps 1-3 are exactly the same as described in the editorial. For the last step, I tried to solve the ILP using LP and branch and bound, but my implementation was too slow and probably buggy. I finally used a rule-based method, which seems to work well, but I don’t have any formal proof of its correctness.

Let us say that P = 10, and after all reductions we ended up with a bunch of 1s, 2s, 3s and 4s which we want to partition into the maximum number of subsets such that each subset has a zero sum modulo P. There are at most 30 “prime” subsets with zero sum (a prime subset is one which has no proper subset of zero sum). The idea is to partition the 1s, 2s, 3s and 4s into these subsets randomly, and then optimize the partition using a set of rules. A typical rule could look like this:

(1, 3, 4, 4, 4, 4) + (1, 2, 2, 2, 3) --> (2, 4, 4) + (1, 2, 3, 4) + (1, 2, 3, 4)

This means that if our partition uses both (1, 3, 4, 4, 4, 4) and (1, 2, 2, 2, 3), then we can replace them by three subsets: (2, 4, 4) and 2 copies of (1, 2, 3, 4), which gains us an extra subset in the partition. A rule is a “prime” rule if it cannot be split into two rules. For example, the following is not a prime rule:

(1, 3, 4, 4, 4, 4) + (1, 2, 2, 2, 3) + (2, 3, 3, 3, 3, 3, 3) + (2, 2, 2, 2, 2) --> (2, 4, 4) + 2 (1, 2, 3, 4) + 3 (2, 2, 3, 3)

as it can be split into
(1, 3, 4, 4, 4, 4) + (1, 2, 2, 2, 3) --> (2, 4, 4) + (1, 2, 3, 4) + (1, 2, 3, 4), and
(2, 3, 3, 3, 3, 3, 3) + (2, 2, 2, 2, 2) --> 3 (2, 2, 3, 3)

We find all “prime” rules (there are approximately 150 of them for P = 10 and digits 1, 2, 3, 4), and apply them on the partition whenever applicable. The reduction system is terminating, but not confluent, so we cannot guarantee a unique normal form. However, since the reduction system contains all prime rules, we can prove that all normal forms have the same size, i.e., are optimal with respect to partition size.
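The rewriting described above can be sketched as follows (a hypothetical data layout, not the poster's implementation): a partition is a multiset of parts, each part a sorted tuple of digits, and a rule replaces a multiset of parts by a larger one. Only the single example rule quoted earlier is included here.

```python
from collections import Counter

RULES = [
    (((1, 3, 4, 4, 4, 4), (1, 2, 2, 2, 3)),        # LHS: 2 parts
     ((2, 4, 4), (1, 2, 3, 4), (1, 2, 3, 4))),     # RHS: 3 parts
]

def apply_rules(partition, rules):
    """Rewrite `partition` until no rule fires; returns the normal form
    as a sorted list of parts."""
    parts = Counter(partition)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            need = Counter(lhs)
            if all(parts[p] >= c for p, c in need.items()):
                parts.subtract(need)        # remove the LHS parts
                parts.update(Counter(rhs))  # add the RHS parts
                changed = True
    return sorted(parts.elements())
```

The loop terminates because every rule application increases the number of parts while preserving the underlying multiset of digits, and the part count is bounded by the number of digits.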

So now, the question is how to find the set of prime rules. First we compute optimal partitions of sets where the numbers of 1s, 2s, 3s and 4s are bounded by some constant (say M). This is basically a bounded knapsack problem and can be solved using dynamic programming in O(kM^4) time, where k is the number of “prime” subsets. We consider each of these optimal partitions and try to extend it by adding an extra prime subset; if the new partition is not an optimal one, then we get a new rule. For example, the following is an optimal partition of (1, 1, 1, 3, 4, 4, 4, 4, 4, 4).

(4, 4, 4, 4, 4) + (1, 1, 1, 3, 4)

Now, if we add a prime subset (1, 1, 1, 3, 4) into it, it no longer remains an optimal partition, as the resulting set (1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4, 4, 4, 4) can be partitioned into 4 subsets, unlike the 3 subsets achieved by the above extension. This gives us the following rule:

(4, 4, 4, 4, 4) + 2 (1, 1, 1, 3, 4) --> 3(1, 1, 4, 4) + (3, 3, 4)
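The optimal-partition computation used to discover rules can be reproduced by brute force. The memoized search below is a simplified sketch, not the poster's O(kM^4) bounded-knapsack DP, but it confirms the two example partition sizes: states are count vectors (#1s, #2s, #3s, #4s), and a transition removes one zero-sum part.

```python
from functools import lru_cache
from itertools import product

P = 10
DIGITS = (1, 2, 3, 4)

@lru_cache(maxsize=None)
def best(counts):
    """Max number of zero-sum (mod P) parts in a full partition of the
    multiset given by `counts`; None if no full partition exists."""
    if not any(counts):
        return 0
    result = None
    # try every nonempty sub-multiset with sum divisible by P as one part
    for part in product(*(range(c + 1) for c in counts)):
        s = sum(d * c for d, c in zip(DIGITS, part))
        if s == 0 or s % P != 0:
            continue
        rest = best(tuple(c - p for c, p in zip(counts, part)))
        if rest is not None and (result is None or rest + 1 > result):
            result = rest + 1
    return result

# (1,1,1,3,4,4,4,4,4,4) is counts (3, 0, 1, 6): best == 2, i.e. the
# partition (4,4,4,4,4) + (1,1,1,3,4) is optimal.
# After adding one more (1,1,1,3,4), counts (6, 0, 2, 7): best == 4,
# matching the right-hand side of the rule above.
```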

Ideally, one would expect that if we increase the value of M, we will get more and more rules. However, surprisingly, the set of rules does not grow once M exceeds 12. This is the part for which I have NO explanation. I tried increasing M up to 50 and it did not find anything new, so I was pretty much convinced that my set of rules is complete. If anybody can prove a decent bound on the value of M, I would be very happy to discuss it, as it seems like a nice way to handle semi-bounded knapsack problems (the capacity of the knapsacks is bounded, but the number of allowed copies is not).

I first tried to pre-compute all the rules and put them in the source file, but the 50000-byte limit did not allow me to do that. Unfortunately, computing all optimal partitions of bounded sets was very time-consuming. In the end, I found that even if I consider only the lexicographically smallest optimal partition of each set when generating the rules, we get almost all of them (except a few, which I included in my source file).

8 Likes

My guess is that if, for the last step (the ILP), you had taken the integer part of the real LP solution, you would still have got AC (assuming there were no precision issues).

Yes, it probably would have worked if I had handled the precision issues properly. Funny that, during the contest, I was almost sure you were using a rule-based approach, and from the large number of submissions you made I was convinced that there was a tricky test case requiring a rule which is exposed only for a very large value of M.
