My score during the contest (the winning score) was approx. 9461.81, but I optimized my solution even further afterwards and reached a score of approx. 9480.06 (I was keeping this last optimization idea in case I needed it during the contest, but in the end it wasn't necessary to implement it then). I will explain just the fully optimized solution (the changes from my contest solution are quite small).
First of all, I should mention that my solution is fully deterministic (no randomization involved) and the score differences come from the fact that the test data is randomly generated at each run. The only candidate strings I used in my solution are composed of the letters a, b and c - imagine them as base 3 numbers, where a=0, b=1 and c=2. Actually, I only use strings of length 12 corresponding to consecutive base 3 numbers (the most significant digit is in the first position; the least significant digit in the last position). However, I do not consider these base 3 numbers starting from 0, but rather starting from a fixed offset (I used offsets like 29, 77, 25, etc. for various classes of test cases in my solution). At some point during the contest I also tried random strings and other bases and offsets, but the score differences were quite large - so I can say that a good part of my score is due to the candidate strings I used, but I have no idea why these strings lead to significantly better scores than others (and these better scores are consistent, almost independent of the actual test data, meaning that there might be some intrinsic property of these strings which makes them work).
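To make the string construction concrete, here is a minimal sketch (not my contest code) of how such a candidate string can be built from a number; the function name and the exact way numbers are assigned to servers/questions are my assumptions.

```cpp
#include <string>

// Sketch: build a length-12 candidate string from a number, with 'a'=0,
// 'b'=1, 'c'=2 and the most significant digit first.
std::string candidateString(long long value) {
    std::string s(12, 'a');
    for (int pos = 11; pos >= 0; --pos) {  // least significant digit goes last
        s[pos] = (char)('a' + value % 3);
        value /= 3;
    }
    return s;
}
```

For example, with offset 29, candidateString(29) returns "aaaaaaaabaac" (29 = 1002 in base 3), and "incrementing a server's base 3 number by 1" simply means using candidateString(value + 1).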
Ok... now let's get to the actual algorithm. First of all, we want to find the honeypots in order to ignore them afterwards. I started by asking a base question (a question is a full query, meaning that a string is asked for each of the N servers and an answer is read back) - let its answer be Sbase.
Then I grouped the servers into N-H (almost) equally-sized blocks (I also tried other numbers of blocks, but this seemed to work better). Then I considered one block at a time. Let's assume that the current block consists of G servers. I ask a question by modifying the strings of only the servers in the block (i.e. incrementing their base 3 numbers by 1 compared to the base question) and read the answer S back. If S=Sbase then I assumed that all the servers in the block are honeypot servers. If S != Sbase then I considered each server in the block independently. Let the servers be Se(1), Se(2), ..., Se(G). Let's assume that we reached the server Se(i). I ask a question consisting of the base strings, except that the string of Se(i) is set to the base 3 number from the initial question for the block. Let S2 be the answer. If S2=Sbase then I assumed Se(i) is a honeypot server and moved on. If S2!=Sbase then I stored this "guess" somewhere and marked Se(i) as not being a honeypot server. Moreover, we now know that the server Se(i) contributes a value of S2-Sbase to the answer S, so we decrement S by (S2-Sbase). If the new updated value of S is Sbase then we can stop and all the servers Se(i+1), ..., Se(G) are honeypot servers. If we reach the server Se(G) then we do not need to ask a question for it - the current value of S would be the answer to the asked question, so we have a "guess" for Se(G).
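As a rough illustration of this block procedure, here is a sketch in C++; askQuestion (sends one full query and returns its numeric answer) and nextString (a server's base string with its base 3 number incremented by 1) are hypothetical helpers, not the actual contest code.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers (assumed, not from the original code):
long long askQuestion(const std::vector<std::string>& strings); // one full query -> answer
std::string nextString(int server);  // server's base string with its base 3 number + 1

// Sketch of the per-block honeypot detection described above.
void processBlock(const std::vector<int>& block,
                  const std::vector<std::string>& baseStrings,
                  long long Sbase,
                  std::vector<bool>& isHoneypot,
                  std::vector<long long>& guessDelta) {  // stores S2 - Sbase per server
    // One question with every server of the block incremented.
    std::vector<std::string> q = baseStrings;
    for (int srv : block) q[srv] = nextString(srv);
    long long S = askQuestion(q);

    if (S == Sbase) {                          // nobody in the block reacted
        for (int srv : block) isHoneypot[srv] = true;
        return;
    }
    for (size_t i = 0; i < block.size(); ++i) {
        int srv = block[i];
        if (S == Sbase) {                      // the remaining servers must be honeypots
            for (size_t j = i; j < block.size(); ++j) isHoneypot[block[j]] = true;
            return;
        }
        if (i + 1 == block.size()) {           // last server: no extra question needed
            isHoneypot[srv] = false;
            guessDelta[srv] = S - Sbase;
            return;
        }
        std::vector<std::string> q2 = baseStrings;
        q2[srv] = nextString(srv);             // change only this server's string
        long long S2 = askQuestion(q2);
        if (S2 == Sbase) {
            isHoneypot[srv] = true;            // no contribution -> honeypot
        } else {
            isHoneypot[srv] = false;
            guessDelta[srv] = S2 - Sbase;      // this server's contribution to S
            S -= (S2 - Sbase);                 // remove it from the block answer
        }
    }
}
```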
After this step all the honeypot servers are identified and we have an extra "guess" (compared to the base question) for each non-honeypot server. The total number of questions asked for each block is between 1 and G - thus, the total number of questions asked so far is at most N (but it can be significantly smaller than N) - since T>N we are sure to have had enough questions available so far.
The next step consists of estimating the initial number of guessed bits for each non-honeypot server. For each of these servers we now have two different values: Sbase and Sguess. Sguess-Sbase is equal to the difference y^2 - x^2, where x is the initial number of guessed bits (in the base question) and y is the number of guessed bits in the second question. A value x is possible exactly when x^2 + (Sguess-Sbase) is a perfect square y^2 with y between 0 and 160. If there is only one possible value of x, then we found with total certainty the initial number of guessed bits for this server. In general x may not be uniquely determined. We will ask further questions for the non-honeypot servers for which the initial number of bits is not known with certainty: we will stop asking questions for a non-honeypot server i when the set of guesses uniquely determines the initial number of bits for the server i.
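Here is a small sketch of how the consistent values of x can be enumerated from a single difference; the function and variable names are mine, not from the original code.

```cpp
#include <cmath>
#include <vector>

// Sketch: all initial bit counts x consistent with one extra guess.
// diff = Sguess - Sbase must equal y^2 - x^2 for some y in [0, maxBits].
std::vector<int> possibleInitialBits(long long diff, int maxBits = 160) {
    std::vector<int> candidates;
    for (int x = 0; x <= maxBits; ++x) {
        long long y2 = (long long)x * x + diff;
        if (y2 < 0) continue;
        long long y = (long long)std::llround(std::sqrt((double)y2));
        while (y > 0 && y * y > y2) --y;           // correct floating-point rounding
        while ((y + 1) * (y + 1) <= y2) ++y;
        if (y * y == y2 && y <= maxBits) candidates.push_back(x);
    }
    return candidates;  // a single element means x is known with certainty
}
```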
Because we have a limited number of questions, we prioritize the servers based on the most likely value of the maximum number of bits guessed so far (even if we have multiple possible initial numbers of bits, the one closest to the initial average number of bits is considered the most likely) and on the number of questions asked for the server so far (lower numbers indicate higher priority).
Now we know the initial number of bits for each non-honeypot server and we also have a "best number of bits" for each of them (i.e. the maximum number of bits guessed so far, either in the base question or in one of the other questions we asked).
The next step consists of improving the guessed number of bits for the non-honeypot servers. The main idea is to try not to waste questions. For instance, if we have two non-honeypot servers, one for which we guessed 80 bits (at best) and the other for which we guessed 100 bits (at best), it is much more likely to find an improvement (more guessed bits) for the server with 80 guessed bits (i.e. a question regarding this server is much more likely to improve our score than a question concerning the other server).
Thus, as long as we have questions left, we choose the K servers (K <= KMAX) with the smallest maximum number of guessed bits so far (in case of ties, the servers with the smallest number of questions asked so far are chosen). We use the base strings, except that for each of the K servers we use their next base 3 numbers, and we ask the question. Let the answer be S. Since we know the initial number of bits guessed for each server, we know the sum of the squares of the numbers of bits of the K servers in the current answer - let this number be Sum. By itself, Sum doesn't help us find the number of guessed bits for each of the K servers Se(1), ..., Se(K), because Sum can be obtained in multiple ways as a sum of K squares of numbers between 0 and 160. Let vmax[K][Sum] be the maximum number of bits x such that Sum can be obtained as a sum of K squares of numbers from 0 to 160 and one of them is x. If vmax[K][Sum] <= the maximum number of guessed bits for Se(1) then we do not need to ask any further questions, because there will be no improvement for any of the K servers. Otherwise, we ask a question in which only Se(1)'s string is changed compared to the base, thus finding the number of guessed bits for Se(1). After this we update Sum and compare vmax[K-1][Sum] to the maximum number of guessed bits for Se(2), and so on... Note that the servers are ordered in increasing order of their priority (i.e. Se(1) has the smallest maximum number of guessed bits so far, etc.). You can easily notice that when K servers are selected, at least 1 and at most K questions are asked - except for the first question, all the other questions are asked only if they have a good probability of improving the maximum number of guessed bits for one of the servers. I used KMAX=3 (I tried other values of KMAX, from 1 to 10, but 3 produced the best results).
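One round of this refinement for a selected group might look roughly like the following sketch (same hypothetical askQuestion/nextString helpers as before; knownBits holds each server's bit count in the base question, bestBits the best count seen so far, and vmax is the table built in the sketch further below). This is my reconstruction of the idea, not the original code.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical helpers, as in the earlier sketch:
long long askQuestion(const std::vector<std::string>& strings);
std::string nextString(int server);

// Sketch of one round of the improvement step for K selected servers.
// 'order' lists the servers with the smallest best bit count first.
void improveGroup(const std::vector<int>& order,
                  const std::vector<std::string>& baseStrings,
                  long long Sbase,
                  const std::vector<int>& knownBits,   // bits in the base question
                  std::vector<int>& bestBits,          // best bits seen so far
                  const std::vector<std::vector<int>>& vmax) {
    int K = (int)order.size();

    // Question 1: all K servers advanced to their next base 3 numbers.
    std::vector<std::string> q = baseStrings;
    for (int srv : order) q[srv] = nextString(srv);
    long long S = askQuestion(q);

    // Sum of the squares of the K servers' new bit counts in this answer.
    long long Sum = S - Sbase;
    for (int srv : order) Sum += (long long)knownBits[srv] * knownBits[srv];

    for (int i = 0; i < K; ++i) {
        int srv = order[i];
        int left = K - i;   // servers whose new bit counts are still unknown
        if (Sum < 0 || Sum >= (long long)vmax[left].size() ||
            vmax[left][Sum] <= bestBits[srv])
            return;         // no possible improvement for srv (nor for the rest)

        int bits;
        if (left == 1) {
            // Only one server left: Sum itself is its squared bit count.
            bits = (int)std::llround(std::sqrt((double)Sum));
        } else {
            // Ask an extra question changing only this server's string.
            std::vector<std::string> q2 = baseStrings;
            q2[srv] = nextString(srv);
            long long S2 = askQuestion(q2);
            long long y2 = S2 - Sbase + (long long)knownBits[srv] * knownBits[srv];
            bits = (int)std::llround(std::sqrt((double)y2));  // y2 is a perfect square
        }
        if (bits > bestBits[srv]) bestBits[srv] = bits;  // (also remember the string used)
        Sum -= (long long)bits * bits;                   // peel this server off
    }
}
```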
Finally, because I mentioned "good probability", note that it is unlikely for the number of bits to be too small or too high. Thus, when computing vmax[K][Sum] I only considered squares of numbers between BMIN and BMAX (I used BMIN=69 and BMAX=106, instead of 0 and 160). This made the strategy skip questions which are very unlikely to improve the score. I also used similar lower and upper bounds when trying to guess the initial number of bits for each non-honeypot server.
Ok... actually, for K=1 I considered the whole range of bits, from 0 to 160 (I didn't want to guess the full 160 bits and ignore the answer because it was outside the range [BMIN,BMAX]).
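A possible way to precompute the vmax table with these bounds (using the full 0..160 range only for K=1, as just mentioned) is sketched below; again this is my own reconstruction, with assumed names, not the original code.

```cpp
#include <vector>

// Sketch: vmax[k][sum] = the largest x such that sum is a sum of k squares,
// one of which is x^2. For k >= 2 only values in [bmin, bmax] are allowed
// (BMIN=69, BMAX=106 above); for k = 1 the full range 0..160 is used.
// -1 marks sums that cannot be obtained at all.
std::vector<std::vector<int>> buildVmax(int kmax, int bmin, int bmax) {
    const int MAXSUM = kmax * 160 * 160;
    // reach[k][sum]: can 'sum' be written as k squares of values in [bmin, bmax]?
    std::vector<std::vector<char>> reach(kmax + 1, std::vector<char>(MAXSUM + 1, 0));
    reach[0][0] = 1;
    for (int k = 1; k <= kmax; ++k)
        for (int sum = 0; sum <= MAXSUM; ++sum)
            for (int x = bmin; x <= bmax && x * x <= sum; ++x)
                if (reach[k - 1][sum - x * x]) { reach[k][sum] = 1; break; }

    std::vector<std::vector<int>> vmax(kmax + 1, std::vector<int>(MAXSUM + 1, -1));
    for (int k = 1; k <= kmax; ++k) {
        int lo = (k == 1 ? 0 : bmin), hi = (k == 1 ? 160 : bmax);
        for (int sum = 0; sum <= MAXSUM; ++sum)
            for (int x = hi; x >= lo; --x)          // try the largest x first
                if (x * x <= sum && reach[k - 1][sum - x * x]) { vmax[k][sum] = x; break; }
    }
    return vmax;
}
```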
At the end we ask a final question using, for each server, the string that produced its maximum number of guessed bits - this last question is the one producing the maximum score for the test case.
In order to "squeeze" the best score possible out of my solution I tried different parameters for various types of tests (first of all, I guessed the T value for 20 test cases, but this took a long time; so for the rest I simply used some ranges I came up with) - the idea was simply to differentiate between the tests as well as possible in order to be able to tune the parameters independently for each group of tests (ideally, each group should consist of just 1 test). In the end, the only parameter that I tuned was the initial offset for the base 3 numbers.