My guess is that the problem lies in the scoring method and the selection of test cases and not in the rejudge. This problem failed on both fronts:
Getting lucky with some random choice of elements on some small test case can give you a huge advantage while pushing everyone else to 0. And you can’t make up this difference on other test cases because I assume the score is summed over test cases and only then evaluated relative to the best sum. Instead, you should use relative scores on individual cases and use an average or sum of these relative scores as the final score.
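To make the difference concrete, here is a minimal sketch of the two schemes. I don't know the judge's actual formula, so it assumes that a higher raw score is better and that every contestant has a score on every case. The function names, contestant names and numbers are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical layout: contestant name -> raw score on each test case.
Scores = Dict[str, List[float]]


def sum_then_relative(scores: Scores) -> Dict[str, float]:
    """Sum raw scores first, then divide by the best sum (the scheme I assume was used)."""
    totals = {name: sum(per_case) for name, per_case in scores.items()}
    best = max(totals.values())
    return {name: (total / best if best > 0 else 0.0) for name, total in totals.items()}


def relative_per_case(scores: Scores) -> Dict[str, float]:
    """Divide by the best score on each case, then average the relative scores."""
    names = list(scores)
    n_cases = len(next(iter(scores.values())))
    sums = {name: 0.0 for name in names}
    for i in range(n_cases):
        best = max(scores[name][i] for name in names)
        if best > 0:  # a case where everyone scored 0 contributes nothing
            for name in names:
                sums[name] += scores[name][i] / best
    return {name: total / n_cases for name, total in sums.items()}


# Invented numbers: "lucky" hits a jackpot on case 0, "steady" is better everywhere else.
scores = {
    "lucky": [1000.0, 50.0, 50.0, 50.0],
    "steady": [10.0, 100.0, 100.0, 100.0],
}

print(sum_then_relative(scores))
print(relative_per_case(scores))
```

With these numbers the first scheme puts "lucky" far ahead (1.0 against roughly 0.27), while the second puts "steady" ahead (about 0.75 against 0.625), because one lucky case can no longer outweigh three others.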
How is a secret test case generation method still an option? It has been demonstrated several times that it’s bad. I would assume that the sizes of the input data are distributed uniformly at random over the given range. Or will there be mostly small cases where I can score huge points? Perhaps there won’t be any small cases and I’m wasting my effort. It’s a guessing game that no one likes.
Just look at some submission that scored a huge number of points and then check the ones submitted by the same contestant just before and after. They are basically the same but score insanely different numbers of points, which clearly illustrates the influence of getting lucky with the random seed. The problem’s scoring method and its test data make this luck factor very important.