Fixing challenge problems

msasha · April 18, 2013, 4:42pm

The problem

Challenge problems currently encourage questionable tricks such as submitting the same randomized solution repeatedly, hoping for a better seed, or, like winger did in APRIL13, find good seeds for different inputs and reuse them. While this definitely is a neat trick, it’s merely an exploit of the way CodeChef scores solutions, rather than an actual improvement of the algorithm.

Additionally, I’m sure it places extra load on the servers (again, with no real benefit). One could even automate the process of continuously submitting solutions to find better seeds. Imagine what will happen when all the participants of a contest are running such a script. Going even further, it’s not impossible to extract the actual input (as opposed to merely hashes in winger’s solution) to CodeChef’s problems. If you do that, you can run a solver offline for as long as you want and come up with (possibly) optimal solutions. Clearly this is not the purpose of the contest.

The solution

Provide one set of inputs that is used to score solutions during the contest. Then, once the contest is over, all the challenge solutions are run on another, different, set of inputs and the final score is the result of that 2nd run. This way it becomes pointless to tailor your solution to a particular set of inputs, and people will have to come up with better algorithms instead.

viaan · April 18, 2013, 5:31pm

Great idea!!!
It can really emphasize on better algorithms. But even this won’t be a full proof solution since codechef will definitely specify the rules of generating test cases and since everyone will know that it will be judged on another input so again there will be multiple solutions and those solutions will executed for those submissions and atleast one would give a CLOSE TO OPTIMAL solution
But i do not feel it is unfair to use random seeds. If u are not providing any points for the challenge problems during the contest, it will only discourage the coders to attempt the challenge problem.

One more thing, its from my point of view, plz do not quote anybody’s name/ solution / approach to justify your point. You cannot blame winger for using random seeds. He just used the rules to his favour and even that is a skill.

msasha · April 18, 2013, 5:47pm

Some comments regarding viaan’s post:

The 2nd set of inputs will only be run on a coder’s best-scoring (on the first set of inputs) solution. So submitting multiple solutions with different seeds will be pointless.
The score for one’s solution during the contest should still be shown. It just wouldn’t be used as the final, post-contest score. I realize that this will confuse some people, but that’s the price you have to pay.
I also don’t think it’s unfair to use randomized algorithms, or even random seeds. They’re a valid programming technique. I’m just saying that the specific way CodeChef currently scores challenge problems is somewhat problematic with regards to randomized algorithms.
I’m not blaming winger. I think what he did was a clever trick and I would have used it myself, had I thought about it. Referencing someone doesn’t necessarily mean blaming him.

mugurelionut · April 19, 2013, 1:14am

How about simply doing the same thing as in the Minesweeper problem from the March 13 Long Contest ? That problem was interactive and, although some parameters were fixed for each test case, the test case itself was not fixed (e.g. in the case of that problem, the locations of the mines were not fixed). In this case one could still try to submit the same randomized solution in the hope of getting better scores, but there would be no way of finding “good” random seeds (because the test cases are partially generated on the spot). At least the infrastructure is in place for supporting such interactive problems, so this may be an easier solution.

viaan · April 19, 2013, 9:34am

@msasha @mugurelionut
I completely agree with what @mugurelionut said. I second his opinion. The minesweeper problem was really interesting and covered all the issues that were brought up by @msasha. So it was completely fair and square. But having said that I guess we should leave it to the problem setter and tester since they are always under the pressure of time and these kind of problems require a lot of time to be tested and developed. If CodeChef can encourage these kind of problems it would be more appropriate. Such interactive problems like the minesweeper one in MARCH13 will really spice up the challenge. So a really great idea.

@msasha I know referencing is not blaming. I felt you were blaming winger for taking advantage of the loopholes in the point system. So I apologize for my above comment. No hard feelings. Anyway I must congratulate and applaud you for bringing a very good and important topic for discussion.
Hope the problem testers & setters and the admin join this thread for discussion and share their opinions on this.

anton_lunyov · April 19, 2013, 2:57pm

@mugurelionut
If the problem is not interactive it seems impossible under the provided interface for the problem creation to make test cases variable. In MINESWPR we put the generator of test case directly in the judge and actually each input file just contains a fixed seed to generate input. So in the judge we at first generate the actual fixed test case and then we use time() as a seed to make some transformation of the greed in each particular run.

@msasha
Codechef has already did this after-rejudge at November 2010 Long Challenge. That time the last accepted run was rejudged. IMHO it is better then rejudging of the best run since then user can select which exact submission he wants to be his final one. Important thing was to enlarge TL a bit to avoid TLE instead of AC for the only judged submission. Before deciding this there were a lot of complains regarding this after-rejudge. Actually after-rejudge was considered as one of the possible scenario for MINESWPR as well (since test data were quite small and using about 10000 submissions it is possible to recover the whole test data :D). But then we realize that variable test data would be better.

It seems that making after-rejudge by taking last AC run + increasing TL by 10% will be a good scenario.