Invitation to CodeChef April Long Challenge 2018!

One query left Does comments on question during live contest require approval before they are made public ?

And for your activeness I mentioned was of after the contest. Which I’m see today after 3 pm. I never knew there were these many comments during a live contest. And how much hard it is to reply these. May be probably due to I rarely opened comments sections of question. Whenever I opened I only saw few comments. :slight_smile: Because if there is error then soon we will have an updated question.

Yes, they require approval to be public, else people paste all sorts of code and ideone links. I once decided tog et them all disqualified- but later felt it would be too harsh to those who are new. Perhaps they didnt bother to read rules.

Yeah, I tried to answer as many comments as I can xD. For updated versions also, good thanks to @mgch , Misha is one of the best people out there :slight_smile:

1 Like

No I wasn’t suggesting that the test case should be different for different people. That would make it far too random. You definitely need the same test cases for everyone, but no information about them should leak out. That way the problem from the programmer’s point of view is to get the best result on a random instance (because he or she knows nothing about the test case, so it is effectively a random instance from their point of view).

I didn’t mean to suggest that “at most 20 will be considered for the leaderboard”. Sorry if I wasn’t clear.

I was suggesting that when you submit a challenge problem solution you should have an extra option called “receive return code from hidden instances”. You are allowed to select this option at most (say) 20 times during the competition. When you select this option, if you get an AC it means you can be sure that your program worked for the hidden TCs.

The reason for restricting return codes like this is that the mechanism for information leaking back to the user is via the ret code…

I get it now. Thanks :slight_smile:

In addition, the time and memory information that you get back should only be for the “feedback-instances”.

And I think it is also probably better not to include the feedback-instances in the final score because too much is known about them. (Another reason: if you do include them, as happens now, then this makes @algmyr’s suggestion not work properly. That is, even if only the last submitted program is scored in the final ranking, you still have an incentive to keep resubmitting a random algorithm until you get a good visible score, even though you can only see part of your score.)

Personally I think one solution would be to completely separate provisional tests from the actual tests. Provisional tests will only give a temporary hint of the performance of programs, but is not included in the actual set of tests. No information should be given on the hidden tests. Since the data generation algorithm is given you can easily test performance on your own system, and the provisional tests should catch most server side stuff.

Combined with my earlier comment about only judging the last submission this should both prevent reverse engineering and discourage spam submissions.

One more suggestion I have that during long contest wee can have some system reserved(kind of). That they will give priority only to challenge questions and the other way round. Because a person who is submitting around ~60 submission really didn’t care about output/points of problems. Just he want to submit same questions with as many random questions as he can. So, some system can give priority to non challenge problems. And other problems soln verdict can be given without much delay.

I would be happy with that option (no feedback from actual tests at all), but I got the impression people wanted a bit of certainty that their programs would still work with the actual tests, so I made the above suggestion (20 return codes) as a compromise. But your suggestion has the virtue of simplicity, and as you say it’s unlikely a program that passes the provisional tests would fail to complete the actual tests (though you could just about imagine that it runs it 3.96 seconds on the provisional tests, but TLEs at 4.01 seconds on the actual tests due to a slight difference in the data).

Judging on the last submission only is tempting, though it might make the comparison a bit more random because the tail of the score distribution obtained from a random algorithm (which you get from maxing over lots of attempts) probably has less variance than a single instance.

But this could be fixed by increasing the number of hidden test cases. And this wouldn’t require extra server time compared to what happens now because the server would only need to run a single entry, not all 200. (Though it may delay scoring slightly after the contest if they aren’t being run pre-emptively.)

1 Like

This is to inform you that all the Setter’s, Tester’s and Editorialist’s solutions are successfully linked to the editorials, except for CUTPLANT.(I am in talks over this delay for that editorial, we will look into that).

The editorialist solution is commented most (I think all :stuck_out_tongue: ) the time, so you can refer to that in case you have any further doubt. Hope you enjoyed the editorials as much as (or preferably more :p) than the problems. In case there is any further issue accessing solution of any other editorial, except CUTPLANT, do ping me here, we will look into that. With that, I would like to conclude the final announcement for this long with three magical words…

Click to view

PREPARE FOR COOKOFF

1 Like

@algmyr, that’s a very interesting suggestion. We will discuss about this. Thanks!

@admin Also check out my comment under https://discuss.codechef.com/questions/125323/invitation-to-codechef-april-long-challenge-2018/125673, it expands the idea into a more complete solution, both regarding reverse engineering and spam submissions.

@algmyr what is the sense of having provisional tests there? You can optimize the solution for it and receive overfitting(as saying in ML) and in the end, your time will be wasted cause final tests will be completely different. It almost has no sense of checking the solutions in the contest, am I wrong? I have another suggestion: what if we’ll try to use multitests in the challenge(around 50-1000 per test case, different types are combined) and testing will be provided only on 5-10% of data. I guess it will be hard for unfair solutions to get the test data. What do you think about that?

2 Likes

Thank you for this informative discussion :slight_smile: We will definitely take these points into consideration while figuring out what to do. We’ll get back to you soon.

Yes, solutions which have T=1 are far more prone. With mixed solutions of different kinds, I think we can minimize the issue by a good factor.

1 Like

@mgch Provisional tests would be there only to give a rough indicator of how you stack up. If you have a solution that performs consistently it will also be a decent estimate of your final score, similar to what the visible test is today. Even today you have no idea if the hidden tests are vastly different, you pretty much presume that the visible test is representative already. Also, importantly, you are given the data generation algorithm so that you can generate your own test cases to benchmark your program to see that if performs well in general.

@mgch If you’re worried that the provisional test cases are not representative you could always add a few more cases of each type to reduce impact of potential outliers. If the final tests are run after the competition (and only on the final submission) this would still be less computationally intensive than running the full test suite on every submission as it’s done today (from what I’ve understood). What I fundamentally would like to enforce is a separation between sets of test cases so you can’t gather information on the point giving tests during the competition.

1 Like

I should correct something I said above. It’s not just the return code that leaks information: another mechanism is the reported memory usage. If you want to discover the number ‘x’ from a hidden test case you just allocate ‘x’ MB in your code then stop. The results page will then show you what ‘x’ was. (I didn’t realise that the memory usage from the results page included that of the hidden cases.)

So a simple change, even if nothing else is changed, would be to make the result page only report the time and memory usage from the provisional test cases, not the hidden test cases.

2 Likes

Just chiming in to the lovely discussion of @alexthelemon and @algmyr.

As I see it, the essence of the CodeChef tie-break problems is to design an efficient algorithm that finds approximate solution to an NP-hard problem. You are given the time and memory budgets, and also - which may not happen in real life - a nicely defined domain of the possible input values. Since a lot of approximation algorithms carry a significant amount of randomization techniques, it is appropriate to ask how the fairness of the judging process can be assured.

20 test cases is definitely too low of a number to have a 95% confidence that one algorithm is better than another. Furthermore, the current practice of taking the best result out of (potentially) 200 solutions favors the algorithms with high volatility of the results - which is kind of gambling.

Definitely, the first step is the clear separation of the provisional tests from the hidden/final ones. The purpose of the provisional tests is to give you some kind of calibration of your solution to the CodeChef system resources, and to give a quick test run. The final score should be calculated from the hidden tests generated for the final scoring only. Maybe a hundred test cases to give a good spread of the potential input values, and to reduce the element of darn luck.

The second step (which is more laborious to implement) is to give a contestant the choice which solution he/she considers as a final solution. Not the last solution, but the actual choice to define the final solution. We can even go one step further - not a single solution but multiple solutions. But out of multiple solutions, not the maximum score but the average score. Some people may prefer to gamble and choose a single final solution, some people may want to mitigate the risk (the downside) and choose multiple solutions (CodeChef may limit the number of final solutions by, say, 10). This way the final score calculation can still proceed at reasonable pace (more test cases but less solutions to check) and the judging as a whole appears more fair.

1 Like