My solution is as follows: At every step I consider all the 6 permutations of the 3 pieces (or fewer if some of the 3 pieces are identical) and use beam search in order to find the best sequence of moves. I can use different beam widths for each piece (e.g. a larger width for the 1st piece and a smaller width for the last one). The scoring function for evaluating the boards considers several features:
-
number of “corners” (i.e. points in between grid cells which have an odd number of occupied cells around them, or two diagonally-opposite occupied cells) - this is somewhat similar to @anta0’s 2x2 patterns, except that I add all the corners together and assign the same weight to all of them
-
an approximation of the number of 3x3 pieces which I can place on the board: for every row R I consider all the maximal Ks such that a 3xK empty rectangle exists on the board, having R as its top row: for each such K I count K/3 such pieces (they may overlap with pieces counted on other rows, but that’s fine)
-
the number of 5x1 and 1x5 pieces which I can place on the board
-
the number of occupied cells on the board
I combine these features linearly (each has different weights). Plus, if there are too few 5x1 or 1x5 pieces or too few 3x3 pieces, I apply some penalties (this essentially increases the weight of the first such pieces). And the same for the number of occupied cells. Up to a threshold I use a weight, above it I use a higher weight. However, I chose the weights manually. I wanted to choose them automatically, but I was worried that they wouldn’t necessarily lead to the best solutions on the official test cases because of the small number of test cases and of the unknown probability distribution of the pieces.
When placing the next piece I consider only valid positions which also “touch” the edges of the board or other pieces.
This solution is not bad by itself, but a simple addition gave it a significant boost (on average). For each permutation of pieces I also simulate the addition of a 4th piece, of type 3x3. So I essentially pick the sequence of moves which leads to the best score after placing, in addition, also a 3x3 piece after them (if placing such a piece is possible).
Some issues that I had with this solution:
-
even with large beam widths, it’s not always capable of doing all the 50000 moves (it gets stuck at various points along the way) - e.g. with some beam widths of (10, 5, 3, 1) (which are already too large to execute all the moves in time), I could get 20% of the test cases running to 50000 moves, but only ~23000 moves (until stuck) on average
-
with large beam widths it is too slow so it will not do 50000 moves because of the strict time limit
-
with very small beam widths (<= 2) the solution is very bad (it gets stuck quickly, so it doesn’t execute too many moves)
-
with “reasonable” beam widths the scores have a huge variance (from executing the full 50000 moves to executing only 1000-2000)
The best trade-off was to use beam widths of 3, with heavy code optimizations (e.g. bit operations; some compiler-friendly optimizations - e.g. if you have an “if (A) { … } else { … }” and the “else” branch is more likely to be executed, then change this to “if (!A) { … } else { … }”; etc.) + pruning (cutting the scoring as soon as possible if the current state is worse than the worst state in the beam) + computing the features form scratch only on the parts of the board which were modified by the move (usually a much smaller area than the whole board). At best, on one of the 20% test cases, my code could score around 635K points before reaching the time limit. Given that with the full 50000 moves I usually scored around 670K-675K points on my local tests, this means that I got pretty close to using up all the moves on this test case.
The extra 2 days of contest helped me because I had more time to optimize my code for speed (so my improvement in the last day was mostly faster code, which meant many more points on some of the 20% test data). But the lead I had on the 20% test data did not hold over the remaining 80% of the test data. Given the huge variance of my scores, I was somewhat expecting that. Still, I find it funny that my score and @anta0’s score ended up being so close (approx. 0.1% difference, i.e. around 5000-6000 points in absolute value). As a fun fact, in what ended up being my best scoring solution, I used a time limit of 4.86 sec. In other solutions I used a time limit of 4.90 sec - just this small difference was giving me ~10000 additional points on just the 20% test data (so maybe even more on the full test data), which is more than the difference between mine and @anta0’s score However, the solutions that I submitted with the higher time limit also contained other changes, which essentially made them get the 10000 additional points on the 20% test data, but lose more than that on the rest of the data.
Finally, I would like to say that I don’t think this was a good challenge problem, for three main reasons:
-
I believe all the solutions had very large score variance - essentially, if one gets stuck for some reason, it cannot continue at all, losing all the remaining moves (which sometimes can be a huge number, sometimes not) ; it would have been better to just penalize the score somehow when a piece cannot be placed and allow the solution to continue with the next move after taking the penalty (or the solution could decide if to stop or to take the penalty and continue)
-
The probability distribution of the pieces was unknown. This means we cannot generate relevant local test cases and we cannot estimate the behavior on the official test cases - everything becomes very random in this case.
-
For every probability distribution, one would need many test cases in order to properly test a solution. For instance, even if only the uniform distribution were used, 20 test cases would be too few to properly evaluate the average behavior on this distribution (because of the large score variance issue I mentioned). But when you use multiple such distributions, this essentially means that each distribution is tested on very few test cases, making the results even more random.