PROBLEM LINK:
Setter- Bogdan Ciobanu
Tester- Misha Chorniy
Editorialist- Abhishek Pandey
DIFFICULTY:
MEDIUM-HARD
PRE-REQUISITES:
Segment Tree with Lazy Propagation , Taylor Series , Logarithmic Functions, Basic Calculus
PROBLEM:
There are N food stalls. After eating food from food stall i, there is a P_i probability of you getting food poisoning initially. We have to support 2 queries-
- 0 L R - What are chances of you not getting infected with food poisoning if you eat from all stalls in range [L,R].
- 1 L R T - Each of the food stall in range [L,R] has improved hygiene, leading to chances of food poisoning getting reduced by a factor of T.
QUICK EXPLANATION:
What we have to actually find is, (1-P_L)*(1-P_{L+1})...*(1-P_R), which itself is easy, but the update operation makes it complicated. We use the property of Log which says {e}^{Log_e a}=a to express Ans=(1-P_L)*(1-P_{L+1})...*(1-P_R)={e}^{Log_e [(1-P_L)*(1-P_{L+1})...*(1-P_R)]} ={e}^{Log_e(1-P_L)+Log_e(1-P_{L+1})...+Log_e(1-P_R)} . Now, each of the Log_e(1-P_i) can be easily calculated by using its Taylor series, which says that Log_e(1-x)=-(x+\frac{{x}^{2}}{2}+\frac{{x}^{3}}{3}.....\infty) . With this, we can easily support update operations using Lazy Propagation.
EXPLANATION:
The setter, tester and editorialist have majorly used the same approach. Hence, that will be discussed in this editorial, along with few common mistakes, setter’s note of precision of Taylor’s series in the bonus section.
Setter/Tester/Editorialist’s Solution-
1. What to store in Node?
For our approaches, we decided to store the first 100 terms of Taylor Series. Because 0\le P_i \le 0.9, then the 100'th term of the series, \Large \frac{{x}^{100}}{100}, will be at most of magnitude 2.6*{10}^{-7} and will have an effect of \approx {10}^{-7} on the final answer. Hence, taking first 100 terms should do. We didnt experiment for the exact amount, you need at least 88-90 terms, as any less will definitely not do.
2. How to handle updates?
For convenience, let me denote P_i by x from henceforth.
Note that we are storing \Large \sum_{i=1}^{100} \frac{{x}^{i}}{i} in the node. For each update, all we have to do is to change \Large \sum_{i=1}^{100} \frac{{x}^{i}}{i} to \Large \sum_{i=1}^{100} \frac{{(t*x)}^{i}}{i}. This can be easily done by multiplying \Large \frac{{x}^{i}}{i}*{t}^{i}. A single for loop suffices for that, and we can update a node after reaching to it in 100 operations.
We must make sure to use Lazy propagation here, so that the latter part of tree, which may not be needed right now, is updated efficiently only when needed. If we keep on updating the entire tree every time (i.e. if we dont use lazy propagation) then our update will take 100*NlogN (worst-case) operations, which will time out!
What is the Parent-Child relationship?
The parent child relationship for this tree is actually simple!! Say Taylor series of left child is \Large \sum_{i=1}^{100} \frac{{x}^{i}}{i} and that of right child is \Large \sum_{i=1}^{100} \frac{{y}^{i}}{i}. Then Taylor series of parent is given by \Large \sum_{i=1}^{100} (\frac{{x}^{i}}{i}+\frac{{y}^{i}}{i}). Can you think a bit on why is this so? (Hint: Log_e(ab)=Log_ea+Log_eb). Proof is in Chef Vijju’s corner.
4. How to get the answer-
For getting the answer, we query for the nodes in range [L,R] to get the sum of Taylor series of each of them. Lets call this as Sum. Note that this Sum represents nothing but Log_e(1-P_L)+Log_e(1-P_{L+1})...Log_e(1-P_R)=Log_e [(1-P_L)*(1-P_{L+1})...*(1-P_R)]. Our answer, hence, is {e}^{-sum} by using the {e}^{Log_e a}=a property of Log.
For this, we used the expl() function of C++ because of its high accuracy. You can refer to my commented solution for implementation (I tried my best to explain tester’s solution there as well, so do give it a shot )
And thats it, we are done with one of the hardest problems of this long . Were you expecting a 3000 word editorial here? Sorry xD
SOLUTION:
Author’s solution can be found here.
Tester’s solution can be found here.
Editorialist’s solution can be found here.
Time Complexity- O(K*(N+Q)LogN) where K=100 (Number of terms of Taylor Series)
CHEF VIJJU’S CORNER
1. Dont open the tab below! Please no! Its empty, really!!
Click to view
**2.**Proof of why \Large \sum_{i=1}^{\infty} \frac{{x}^{i}}{i} =-Log_e(1-x)
Click to view
\Large f(x)=\sum_{i=1}^{\infty} \frac{{x}^{i}}{i}
Differentiating both sides w.r.t. x-
\Large f'(x)=\sum_{i=1}^{\infty} {x}^{i-1}=1+x+{x}^{2}+{x}^{3}...
\because Sum of an Infinite Geometric Progression, starting with a with common ratio |r|<1 is \Large \frac{a}{1-r}
\Large \implies f'(x)=\frac{1}{1-x}
Integrating both sides w.r.t. x-
\Large \int{f'(x)}=\int{\frac{1}{1-x}}
\Large \implies f(x)=-Log(1-x)
3. Reverse proof of how we correlated Ans=(1-P_L)*(1-P_{L+1})...*(1-P_R)={e}^{- \sum_{L}^{R}Log(1-P_i)} and how we arrived at the Taylor Series is clearly outlined in the quick explanation section. Please refer there for derivation :).
**4.**Setter’s Note of precision of Taylor’s series method-
Click to view
Let \delta(x) be the signed relative error precision when representing the real number x as a float, i.e. \delta(x) = \frac{(x - float(x)}{x}
\implies float(x) = x * (1 + \delta(x))
In the leaves of the SGT, we will store a list (of terms of Taylor Series) of size P, where P is our desired precision for the Taylor series. This list will be of form [P_i, {P_i} ^ {2}, .., {P_i} ^ P].
\delta(x * y) \approx \delta(x) + \delta(y), so {P_i} ^{K} will have K * \delta(P_i) relative error.
This is alright, because P is around 100.
Afterwards, we’re going to go up the tree and start building it.
In every node we store the sum of the son’s lists, i.e. if son-left’s list is [A_1, A_2, .., A_P] and son_right’s list is [B_1, B_2, .., B_P], then the list in node is [A_1 + B_1, A_2 + B_2, .., A_P + B_P].
\delta(x + y) = \Large \frac{(x * \delta(x) + y * delta(y))}{(x + y)}
We can then show inductively that every component of these lists is bound by the same relative error as the leaves.
When multiply by Q on an interval, the same reasoning applies, it’s K'th iteration will have a relative error of K * \delta(Q). Afterwards we push this value on O(logN) nodes.
So the worst case precision relative error is \delta * (Q * logN + N) * P, which is about {10}^{-9}.
Now, that was the Segment Tree and building / updating, we’re left with the query: the analysis for the relative error introduced by the summation of the lists in which the [L, R] interval of the query is broken into is similar to the one for building the tree.
For computing the term Q= -x - \frac{{x}^{2}}{2} - \frac{{x}^{3}}{3}..., divisions won’t introduce any addition relative error, because \delta(x / y) \approx \delta(x) - \delta(y) and 1, 2, 3, .. are exactly representable as floats.
Computing {e}^{q} using exp, going by the documentation, we know that it’s rounded well.
In my opinion, the following reasoning applies:
{e}^{x} * (1 + \delta({e}^{x})) = {e}^{(x + x*\delta(x))}
\implies (\epsilon = x*\delta(x)) {e}^{(x + \epsilon)} {e}^{(x + \epsilon)} = {e}^{x} * (1 + \epsilon) + O({\epsilon}^{2} * const)
**5.**Proof of Parent-Child Relationship
Click to view
ParentNode=\Large \sum_{i=1}^{\infty} (\frac{{x}^{i}}{i}+\frac{{y}^{i}}{i})
=Log_e(1-x)+Log_e(1-y)
=Log_e(1-x)*(1-y) [as Log_e(ab)=Log_ea +Log_eb]
Which is the answer which must be stored at parent node. Hence the relation is correct.
**6.**Common errors-
- We are already maintaining the precision using double. Do not use long double unnecessarily, as it takes more memory and a lot more time in computation. Solutions with long double everywhere can still take as long as 5-10secs to run under given constraints.
- Creating a new node is expensive! Save time by already creating a set of nodes, and using reference to those instead of creating new ones. Carefully see the use of \& operator in setter and tester’s code. What we do is, we pre-create a set of LogN nodes, and whenever we feel need to create a node, we use one of the already created ones. This saves time and memory because if we dont do this, each recursion on query will create at least one node, and each node stores a series of length 100.
- Improper and incorrect lazy propagation was seen. In cases, some didnt use any.
**7.**Some resources for Segment tree with lazy propagation-