 # RDF - Editorial

EASY

### PREREQUISITES

Simple Math, Dynamic Programming

### PROBLEM

You are given a function random(N), that returns a uniform random integer between 0 and N-1.

You are also given a function RDF defined as follows

```RDF(N,K)
repeat K times
N = random(N)
return N
```

Now, given N and K, find the expected value of RDF(N,K).

### QUICK EXPLANATION

Let F(N,K) be the expected value of RDF(N,K).

From the definition of RDF, we can derive the following

• F(N,0) = N
• F(N,K) = ( F(0,K-1) + F(1,K-1) … + F(N-1,K-1) ) / N

Thus, we can write a quick DP to solve the problem

```F = Array [0 ... 99999] [0 ... 99999]
for i = 0 to 99999
F[i,0] = i

for j = 1 to K
sum = 0
for i = 1 to N
sum += F[i-1,j-1]
F[i,j] = sum / i
```

Of course this solution will not work. There is neither enough memory and nor enough time. But, if you tried to implement this approach for smaller values of K, you would get an interesting insight.

The value of DP[n,k] falls exponentially as k rises.

In fact, the value of DP[n,k] is well below the required level of precision, when k > 35. Now, you can run the above for

``````DP = Array [0 ... 99999] [0 ... 36]
``````

and then print the values from the DP table if K ≤ 36. For all other cases you can print 0.

### EXPLANATION

Let us prove this formally. [ Thanks to Anton ]

We will prove by induction that

`F(N,K) <= N / 2``K`

## Base Case

``````F(N,0) <= N
``````

This is given.

## Induction

```F(N,K+1) =
( F(0,K) + F(1,K) ... + F(N-1,K) ) / N

Since F(j, K) <= j / 2K for j = 0, 1, ..., N-1

F(N,K+1) <=
( 0/2K + 1/2K ... + (N-1)/2K ) / N
= ( 0 + 1 ... + N-1 ) / N / 2K
= N*(N-1)/2 / N / 2K
= (N-1) / 2K+1
<= N / 2K+1
```

Hence Proved.

## Consequence

`F(N,K) <= N / 2``K`

• F(N,K) increases as N increases.
• F(N,K) decreases as K increases.

We can only increase N up to 99999. At N = 99999, let us find the value of K, beyond which F(99999,K) will be less than our precision goal.

```99999 / 2K < 10-6
2K > 1011
K > log2 1011

or, K > 36.54
```

Thus, at K = 37, already all the values of F(N,K) are less than 10-6.

We can simply calculate the DP till K = 36. Print the values from the DP table if K36, or 0 otherwise.

### SETTER’S SOLUTION

Can be found here.

### TESTER’S SOLUTION

Can be found here.

18 Likes

Great Editorial…how to formulate the recurrence relation for DP problems??

I liked the PROOF part. Which describes about reduction factor! Thanks I’d like to share my “lovestory” with that problem At first I have to describe my emotional background - I solved first 3 problems first day of the contest, so I felt strong When I was only a beginner in contest programming, Mimino and Delsius told me that constraints matter. Simply told: “use the easiest solutions that fits contraints”, so I started there (as usually)…

N is up to 10^5 and is up to 10^5 too => O(N*K) solution for one test is too slow, maybe if I can solve one value in O(log(K)*N) or O(log(N)*K) it will be ok, but for T up to 5*10^5 it is slow too => we have to return result for one test case in O(1)… That’s not a good message - only the correct approach can be used (as oposite, typically there is let say O(n) solution, but O(log(n)*n) fits too, but not here…).

It’s clear from statement, that RDF(N,K) = 1/N * sum( RDF(i, K-1), 0, N-1). I have no idea why coders asked, but it’s crytal clear also that RDF(N, 0) = N (no change in loop). Ok, so now we have brute force approach in O(T*N*K*N). Probably end of the universe comes before this algorithm returns value… In fact it’s not so bad, if we can memoize the values for all (N,K) pairs, we can produce answer in O(T+N*K*N) - better, but not good enough. I have to do something with theO(N*K*N) part.

I spent a lot of time trying to find some formula for this. Meanwhile I found some properties of that function: `RDF(N, K) = 0` for N<=K and `RDF(N, N-1) = 1/N * RDF(N-1,N-2)`. Closer to solution? In fact no, no inch closer…

We (me and RDF function) spent a lot of time together. We built a relationship, but it wasn’t friendship, it was master and slave relationship :-/

During that time I tried to break the formula from each side and somewhere in this process I found that

since `RDF(N,0) = 0`

``````RDF(N,1) = 1/N * sum(i) = 1/N * (N-1)*N/2 = (N-1)/2
``````

and

``````RDF(N,2) = 1/N * sum((i-2)/2))=1/2N * sum(i-2)
``````

but nothing nice for K=3, bad luck ``````RDF(N,K) = 1/N * sum( RDF(i, K-1), 0, N-1 ) )
RDF(N,K) = 1/N * sum( RDF(i, K-1), 0, N-2 ) ) + 1/N * RDF(N-1, K-1)
``````

and `sum( RDF(i, K-1), 0, N-2 ) )` part was familiar to me… Like de ju vu…

So I tried `RDF(N-1,K)` and it holds

``````RDF(N-1,K) = 1/(N-1) * sum( RDF(i, K-1), 0, N-2 ) )
``````

yeah, this is the familiar part, so

``````(N-1) * RDF(N-1,K) = sum( RDF(i, K-1), 0, N-2 ) )
``````

and returning back to formula we have

``````RDF(N,K) = (N-1)/N * RDF(N-1,K) + 1/N * RDF(N-1, K-1)
``````

fine I reduced O(N^2) to O(N) in O(N*K*N), so actually we have O(T+N*K). If N and K <= 20000, maybe I’d gave it a chance, but for N and K up to 10^5 I have no chance to calculate even memoize that in time and space limit…

Now I thought it is matrix exponentiation problem… If so, we cannot memoize all the results, but we will have something like O(T*log(N)) or O(T*log(K))…

On the 4th day I read the statement once again. I was wondering why there is such formal definition of error while it’s very common thing when result is double. No lead here. So I finally decided to explore the absolute values when I have no further ideas for those relative ones…

If we have fixed K we now, that all values of RDF(i,K) = 0 for i in [0,N], from formula above we know, that `RDF(K+1,K) = 1/(K+1)!`. Every contest programmer have to know that 10! > 3milions, so, for K=9 the first value is lower than 1e-6. Ok, but next value in row is bigger .The remaining question was what’s the last value in row, but it was easy to find using that formula, and also to calculate it for first 50 rows in timelimit.

8 Likes

Yeah it was making me nuts too. It is one to remember. Nice work by the setter on the constraints, clever move. And thanks for sharing your story, I bet a lot of other people have suffered at the hands of this problem too.

Plz elaborate how we got the recurrence
F(N,K) = ( F(0,K-1) + F(1,K-1) … + F(N-1,K-1) ) / N

1 Like

Look: When we have N and K on some step of RDF, then the result after k operations will become F[0,K-1] with probability 1/N, F[1,K-1] with the same probability,…, F[N-1,K-1] with probability 1/N. So we can put 1/N out of the brackets and get recurrence F(N,K) = ( F(0,K-1) + F(1,K-1) … + F(N-1,K-1) ) / N. Hope it will help you, feel free to ask if you still need more explanations.

I am so excited with your story When the idea of the problem came into my head, I first thought that the result will be exactly N/2^K. I was a little bit upset after writing a dp solution, since results were even less than I expected it to see. So i knew that F(N,K)<N/2^K and it wasn’t tricky to count the answer only for small K. We should thank Anton for his neat proof of these fact. And of course, thank you, for solving our problems and getting into such special relationships with them 1 Like

@shinigamiryuk: you have to know something about “expected value”

…is the weighted average of all possible values that this random variable can take on.

and from problem statement we know, that for K > 0

random(N) returns any integer in the range [0, N) with equal probability

guys can you tell what’s wrong with my submission? it shows TLE, code is easy readable http://www.codechef.com/viewsolution/1939322

Don’t use cout. The output file is more than 5MB.

Thanks Anton 1 Like

Hi, I know about expected value but still I’m missing this equation
F(N, K) = (F(0,K-1) + F(1,K-1) + … + F(N-1,K-1)) /N
please tell me the reasoning behind this equation.

//