Notice that we can sort the boxes by the number of stones in them increasingly and assign energy to boxes accordingly. This is due : https://en.wikipedia.org/wiki/Rearrangement_inequality. This is true because the number of boxes which have more stones than certain box at index i decreases as i increases, and such rearrangement yields minimal energy.
Now, we will use dynamic programming.
Let us denote by dp[N][M] the minimal energy we can obtain by putting M stones into N boxes whereas each box has atleast 1 stone in it. Now, either each box will have 2 or more stones in it and taking one stone from each box yields dp[N][M-N], or taking one stone from each box yields a certain number of boxes with 0 stones in it, and the rest of the boxes with atleast 1 stone in them. Here is the code and function which builds dp[N][M] in O(N^2M) which passes for 20pts and TLE’s for 50pts : 20pts. First we sorted energies, calculated energy prefix sums, which we used for cases when taking one stone from each box leaves some boxes with 0 stones in function.
Now, we can notice by the nature of dynamic recurrence i.e. dp[N][M] is based only on dp[i][M-N], where i goes from 1..n, that not all fields in dp[N][M] matrix will be called for finding final solution. Then one writes same dp recurrence recursively and gets 50pts. Here is the code: 50pts. We are still using NM matrix which is inefficient in memory for 100pts.
Key observation for speeding up the solution is in this line of code:
ret = minll(ret, (prefsum[N-n-1+k]-prefsum[N-n-1])*(n-k)+f(n-k, m-n));
That is, if (prefsum[N-n-1+k]-prefsum[N-n-1])*(n-k) is already bigger than ret we need not call f(n-k, m-n).
This speeds up the solution significantly due to the fact that (prefsum[N-n-1+k]-prefsum[N-n-1])*(n-k) tends to be big, especially if E[i]>=7. Still, checking if the previous expression is less than ret yields TLE for most cases, because we will still iterate over all i in 1..N-1 in recursion.
Finally for 100pts, we can notice that function f(i)=(\sum_0^i E[i])(N-i) is bitonic, i.e. it is first increasing then decreasing, for sorted array E. Using that we can iterate only over first few indexes, and last few indexes in recursion while the expression is less than ret. Also, we will use C++ map for storing previously calculated results. Here is the code which passes in 0.05s:100pts