PFRUIT - Editorial

PROBLEM LINK:

Div 1
Div 2
Practice

Author: Full name
Tester: Full name
Editorialist: Oleksandr Kulkov

DIFFICULTY:

HARD

PREREQUISITES:

Moment-generating functions, FFT, computing functions on polynomials

PROBLEM:

There are k chefs and m workers. For each worker we know integers a and b such that worker can pick from a to b fruits. There are n groups of workers, each group having same a and b. Each worker uniformly choose some number from his [a,b] and picks that amount of fruit.

If the total number of fruits picked during the day is S, then goodness of the day is F(S)=S^k. You have to find expected value of F modulo 998244353. Additionally we know that n\cdot k \leq 10^5.

QUICK EXPLANATION:

Bring all stuff you know about combinatorics and probability theory, add some stuff you don’t know, blend it in mixer until you get the solution.

EXPLANATION:

p=998244353 is prime number and p-1=2^{23} \cdot 7 \cdot 17, so there is root of unity of order 2^{23} modulo p and we can use this mod for fft which is strong hint here. Let’s try to get some convenient look for the answer.

Let’s say that each group is described by triple \{a_i,b_i,c_i\} where c_i is number of workers in the group. Let’s find moment-generating function of random variable S:

M_S(t) = \mathbb E[e^{tS}]=\sum\limits_{k=0}^\infty \dfrac{t^k\mathbb E(S^k)}{k!}

We will only need to get coefficient near t^k in expansion of M_S(t) and multiply it by k! to find an answer.

We should note that amounts of fruits picked by each worker are independent random variables, thus moment-generating function of S=\sum\limits_{i=1}^m s_i is product of moment-generating functions for each separate worker. Let’s find it for worker with given numbers a and b.

M_{s_i}(t)=\mathbb E[e^{ts_i}]=\dfrac{1}{b-a+1}\sum\limits_{k=a}^b e^{tk}=\dfrac{1}{b-a+1}\cdot \dfrac{e^{ta}-e^{t(b+1)}}{1-e^t}

Now we can say that for whole moment-generating function holds:

M_S(t) = \dfrac{1}{\prod_{i=1}^n(b_i-a_i+1)^{c_i}}\prod\limits_{i=1}^n\left[\dfrac{e^{ta_i}-e^{t(b_i+1)}}{1-e^t}\right]^{c_i}

Coefficient on the left is constant and can be calculated in advance. As for product on the right, we can efficiently calculate it by using polynomial \log and \exp, that is:

\prod\limits_{i=1}^n\left[\dfrac{e^{ta_i}-e^{t(b_i+1)}}{1-e^t}\right]^{c_i} = \exp\left(\sum\limits_{i=1}^n c_i \left[ta_i+\log(1-e^{t(b_i-a_i+1)})-\log(1-e^t)\right]\right)

For each summand you may explicitly calculate first k coefficients in expansions of 1-e^{t(b_i-a_i+1)} and 1-e^t using formula e^x = \sum\limits_{k=0}^\infty \dfrac{x^k}{k!}. After this you may perform \log and \exp as described in this and this articles. This will give you O(nk \log nk) solution. There are few things you should admit before using \log straight ahead. Since coefficient near x^0 must be evaluated manually, you would like it to be equal 0 because you work in modulo field. Argument of your \log operator must have coefficient 1 near x^0. To obtain this you should note following:

  1. Coefficient near x^0 is initially 0 for both numenator and denominator so you should divide both of them by x.
  2. Coefficient near x^0 of numenator after this won’t be equal to 1 so you should divide numenator by this value and multiply final answer by this value raised to power c_i.

ALTERNATIVE SOLUTION 1:

There is also simpler O(nk \log m \log k) solution not messing with \log and \exp but only counting fractions in multipliers using formula for polynomial inverses, which can be found in same articles as for \log and \exp and binary exponentiation to power c_i, though you need good constant to pass with such solution.

ALTERNATIVE SOLUTION 2:

This is the one I wrote at first. After that I realized that there is simpler approach which is at main part of editorial now.

To get expected value for S^k we should find M_S^{(k)}(0) since:

M_S(t) = \sum\limits_{n=0}^\infty \dfrac{t^n\mathbb E(S^n)}{n!} \implies M_S^{(k)}(t) = \sum\limits_{n=0}^{\infty} \dfrac{t^n\mathbb E(S^{k+n})}{n!} \implies M_S^{(k)}(0) = \mathbb E(S^k)
  1. Now let’s denote G_i(t) = (e^{ta_i}+\dots+e^{tb_i})^{c_i} and introduce auxiliary array (jump to 2. to understand why do we need them):
g_{ij} = G_i^{(j)}(0)=\left[\left(e^{ta_i}+\dots+e^{tb_i}\right)^{c_i}\right]^{(j)}(0) = c_i(c_i-1)\dots(c_i-j+1)(a_i^j+\dots+b_i^j)^{c_i-j}

This will help us to calculate derivatives. Main problem in calculating these values is sum on large segment of powers, i.e. a_i^j+\dots+b_i^j. We will start with following telescopic identity (see this for other ways of calculating these sums, like Faulhaber formula):

(n+1)^{k+1}-1 = \sum\limits_{m=1}^n[(m+1)^{k+1}-m^{k+1}]=\sum\limits_{p=0}^{k}\dbinom{k+1}{p} (1^p+\dots+n^p)

Let’s denote S_p=1^p+\dots+n^p and rewrite it in the following more convenient way:

\sum\limits_{p=0}^{k} \dfrac{S_p}{p!} \cdot \dfrac{1}{(k-p+1)!}= \dfrac{(n+1)^{k+1}-1}{(k+1)!}=C_{k}

If we now denote A_p = \dfrac{S_p}{p!} and B_{k-p}=\dfrac{1}{(k-p+1)!} we can say that:

\sum\limits_{p=0}^k A_p B_{k-p} = C_{k}

If we multiply both parts by x^k we will see that sequence C_k corresponds to multiplications of A_k and B_{k} as polynomials. Thus to recover A_k we have to find polynomial inverse for B_k and multiply C_k by it. Finding inverse for polynomial is routine procedure which can be done in O(n \log n) for first n coefficients using following formula:

Q_{k+1}=Q_k(2-PQ_k)

Here P is polynomial for which we find iverse and Q_k is polynomial having first \sim 2^k coefficients equal to ones of 1/P(x) series. You may check this or this resources to know how to calculate inverse series. This allows you to find A_k in O(n\log n) which in turns provides you with necessary S_p for all p from 0 to k and allows you to calculate all g_{ij} in O(nk \log k). Now that we have g_{ij} what did we calculate them for?
2. Answer is to find M_S^{(k)}(0). It will have constant coefficient of \frac{1}{\prod_{i=1}^n(b_i-a_i+1)^{c_i}} and the rest is k-th derivative of

\prod\limits_{i=1}^n \left(\sum\limits_{k=a_i}^{b_i}e^{tk}\right)^{c_i}=\prod\limits_{i=1}^n G_i(t)

in point 0. By general Leibniz rule it is equal to the following sum:

\sum\limits_{k_1+\dots+k_n=k}\dbinom{k}{k_1,\dots,k_n}G_1^{(k_1)}(0)\times\dots\times G_n^{(k_n)}(0)=
=k!\sum\limits_{k_1+\dots+k_n=k} \dfrac{g_{1k_1}}{k_1!} \times \dots \times \dfrac{g_{nk_n}}{k_n!}

Here after k! goes coefficient near x^k in product of n polynomials:

P_i(x) = \sum\limits_{j=0}^k \dfrac{g_{ij}}{j!} x^j

This product can be calculated in O(nk \log n \log k) with divide and conquer technique. Bringing all mentioned earlier stuff together finally allows us to solve the problem for good. You can also solve problem in O(nk \log nk) if you calculate \log of each single polynomial, then sum them up and then calculate \exp of the result. This is however much more complicated to implement. Please, refer to this or this articles mentioned before to learn about \log and \exp over polynomials.

NOTE ON POWER SUMS:

In alternative solution 2 you have to calculate S_p=1^p+2^p+\dots+n^p. There is more direct way to calculate them if you consider exponential generating function:

S(x) = \sum\limits_{p=0}^\infty \dfrac{S_p}{p!}x^p=\sum\limits_{k=1}^n \sum\limits_{p=0}^\infty \dfrac{k^px^p}{p!}=\sum\limits_{k=1}^n e^{kx}=\dfrac{e^x-e^{x(n+1)}}{1-e^x}

The fraction on the right is actually what we got in alternative solution 2 in closed form and you once again will need to reduce numenator and denominator by x and find inverse series of denominator, which still be O(k \log k).

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.

Additionally, the first solution is related to Bernoulli Numbers , and the alternative solution is related to Stirling numbers of the second kind. If you find the formulas above tough to read or understand, take a look at these numbers. Much of the formulas in the solution is actually equivalent to the identities of them.

Author’s (alternative) solution in a simpler way:

Calculate \sum\binom S0,\sum\binom S1,\dots,\sum\binom Sk instead of \sum S^k (Divided by \prod(b_i-a_i+1)^{c_i} in the end to get expectation). To write S^k in sums of \sum\binom S0,\sum\binom S1,\dots,\sum\binom Sk, we use Stirling numbers of the second kind.

Define P(x)=\sum_{i=0}^kx^i\sum\binom Si, Q(a,b,x)= \sum_{i=0}^kx^i\sum_{j=a}^b\binom ji. We can see if we add a new worker (a,b), new P'(x)=P(x)\cdot Q(a,b,x). That means we only need to get \prod Q^{c_i}(a_i,b_i,x). Every single Q can be calculated directly because \sum_{j=a}^b\binom ji=\binom {b+1}{i+1}-\binom a{i+1}.

Finally, we can use FFT and divide and conquer to get O(nk\log n\log k) solution. As described in the editorial, we can also use \ln and \exp to get faster but harder to implement O(nk\log nk) solution.

Most contestants’ (first) solution in a simpler way:

Define polynomial P_2(x)=\sum_{i=0}^k\frac{x^i}{i!}\sum S^i and Q_2(x)=\sum_{i=0}^k\frac{x^i}{i!}\sum_{j=a}^bj^i. To get Q efficiently, we use Bernoulli Numbers. Also, we can see if we add a new worker (a,b), new P_2'(x)=P_2(x)\cdot Q_2(a,b,x). And the rest is exactly the same as the first solution.

To calculate Stirling numbers efficiently, we use the explicit formula of the numbers and FFT. See author’s code.

To calculate Bernoulli numbers, we use polynomial inverses, because the exponential generating function of Bernoulli numbers is \frac x{e^x-1}.

Things described above may come differently as in the editorial, but they lead to the same formulas.

2 Likes
QUICK EXPLANATION:
Bring all stuff you know about combinatorics and probability theory, add some stuff you don't know, blend it in mixer until you get the solution.
*scrolls down to see the mathematical formulas*

…
…

*dies from Cancermathics*