PROBLEM LINK:
Setter: Constantinescu Andrei Costin
Tester: Misha Chorniy
Editorialist: Taranpreet Singh
DIFFICULTY:
Medium
PREREQUISITES:
Greedy, Understanding of Activity Selection problem, Dynamic Programming, and Combinatorics.
PROBLEM:
Given four integers N, M, K and MOD, determine the number of ways to select M intervals (possibly intersecting) in the range [1, N] \bmod MOD such that the maximum size of disjoint intervals set we can select is K.
SUPER QUICK EXPLANATION
- For calculating answer for (N, M, K), we need to know answers for tuples like (N-A, M-B, K-C) for valid values of A, B and C, leading to overlapping subproblems as well as optimal substructure property, hinting toward Dynamic Programming Solution.
- Let us represent dp[n][m][k] as the number of ways to select m intervals in the range [1, n] such that the maximum size of disjoint intervals set we can select is k AND the last selected interval ends at N.
- For transitions, we need to use combinatorics to move from current position to next Position, Counting the number of intervals which start after current position but before or on next Position, and end on Positions on or after nextPosition.
- The answer for our problem will be \sum dp[i][M][K] \forall i \in [1, N].
EXPLANATION
Let us consider the reverse of this problem first, the problem to select the maximum number of disjoint intervals out of given intervals.
We can see, that the idea was to sort the given intervals in the non-decreasing order of right end of intervals and try to select an interval whenever possible. It will be possible to select the next interval if the right end of the previously selected interval is to the right of the left end of the current interval. Since intervals are ordered by right end, it guarantees to select the interval with leftmost right end whenever possible, leading to the selection of maximum disjoint intervals.
For more details, refer this.
Coming back problem, Let us define the concept canonical sequence of disjoint intervals. See, the way we selected intervals, we can see, that the intervals have been selected in the increasing order of right end.
We can see, that there might exist multiple sets of intervals having the same size, or just the same set of intervals selected in different order.
So, to tackle this, Let us define the concept of the canonical sequence of this set of intervals as the right ends of selected intervals. Since we can see, that r_1 < r_2 < r_3 \dots < r_x. The best thing of this sequence is, that it remains the same for a given set of intervals since we see, there is no randomness involved in the greedy solution we use to find the solution. This means, that if we select the intervals in increasing order of right end, we don’t need to worry about multiple sets and orderings.
Suppose we have X intervals sorted by the right end in increasing order, and assuming we can select Y disjoint intervals at max. Now, suppose we start deleting intervals from right to left one by one, till the maximum number of disjoint intervals we can select is Y-1. This way, calculating Y in that case becomes a smaller sub-problem, that is, to calculate Y-1 from a smaller set of intervals, and add 1.
Since we see that we need to solve the same subproblems multiple times, we resort to our old friend, Dynamic Programming.
Let us define dp[n][m][k] as the number of ways to select m intervals in range [1, N] such that the length of the canonical sequence is k, and the last element of canonical sequence is n, that is, the last interval included in the canonical sequence ends at n.
We can see, that due to condition "the last interval included in canonical sequence ends at n", final answer is the summation \sum dp[i][m][k] \forall i \in [1, N].
Let us move toward State transitions. For convenience, let us call intervals present in canonical sequence as special.
See, suppose we have calculated the number of ways for dp[pos][selected][special]. We will try to count ways to add one more special interval for all combinations of (Right end of next interval, Number of Additional intervals selected), writing it as (nextPos, more).
See, the criteria for the next interval to be included in the canonical sequence is, that the interval should start after pos. Also, the start point has to be before or at nextPos since the right end of this interval is nextPos, according to our definition of DP state. We want to select a total of more intervals.
But, Let us count the number of intervals which start in the range (x, y] and ends in the range (y, z]. Since for every start point, it is possible to choose any right end in a given range. Number of start positions is (y-x) while number of ending positions is (z-y), Leading to a total of (y-x)*(z-y) intervals. Suppose, it is also possible to select y as the right end, then the number of intervals would become (y-x)*(z-y+1). Understanding this will be helpful.
Now, See, need to select more intervals such that all intervals have a common point so that exactly one interval gets selected in canonical sequence. And that common point we have is pos since our greedy algorithm chooses the interval with the leftmost right end. The number of ways to select more intervals out of total A = (nextPos-pos)*(N-nextPos+1) intervals is given by C(A, more).
But, it will also include the case where all intervals end after nextPos. It can be easily seen that number of such intervals starting after pos and before or at nextPos, and ending after nextPos is B = (nextPos-pos)*(N-nextPos), out of which we have counted the number of ways to select more intervals. This is given by C(B, more).
Hence, Number of ways to select more intervals which start after pos but before or at nextPos, and end at or after nextPos, while having at least one interval ending at nextPos, is C(A, more) - C(B, more).
Now, After understanding all this, the problem left is to take the summation
dp[nxtPos][total+more][selected+1] = \sum dp[pos][total][selected]*(C(A, more)-C(B, more)) \forall pos \leq nxtPos
Implementation of this problem is really simple. We can precompute the whole Combination table which will allow constant time lookup and is usually recommended because it uses only addition and modulus operation due to recurrence C(N, R) = C(N-1, R)+C(N-1, R-1).
Read more about calculating C(N, R) \bmod P here.
Time Complexity
Time complexity is O(N^5) per test case theoretically, but it runs much faster in practice.
AUTHOR’S AND TESTER’S SOLUTIONS:
Setter’s solution
Tester’s solution
Editorialist’s solution
Feel free to Share your approach, If it differs. Suggestions are always welcomed.