Surprisingly wrong recursive function (involving merge of persistent treap)

karolis_kusas · November 15, 2018, 10:07pm

Hello, guys,

I was solving https://www.codechef.com/problems/GENETICS. It involves a persistent treap. I implemented it, but I needed much time for debugging. I still do not understand why there was an issue with my following code, so could anyone point it out?

Let’s take the treap merge:

void merge (int &t, int l, int r) {
  if (!l && !r) { t = 0; return; }
  if (!l || !r) t = l? l: r;
  else {
    createNew(); t = getLst();
    if (T[l].prior > T[r].prior) {
      T[t] = T[l];
      merge (T[t].r, T[l].r, r);
    } else {
      T[t] = T[r];
      merge (T[t].l, l, T[r].l);
    }
  }
  upd_cnt (t);
}

t is the new treap, l and r are the treaps to be merged. createNew(); t = getLst(); just creates a new empty treap node. The issue is that passing T[t].r or T[t].l as int &t sometimes does not update their values. Does someone know why this happens? The solution was to use a global variable last for the updated treap node. Then this merge works well.

Thanks!

vijju123 · November 15, 2018, 10:48pm

Do you have the entire code with you (I see you have a modified merge function in your submissions)? Since its C++, I can bet a good deal that there would be minor runtime error in your code which went to undefined behavior causing this. Was it not running even on the sample test cases?

karolis_kusas · November 15, 2018, 11:25pm

I restored the code there: https://pastebin.com/8zzC2gEu
To see the effect one could use the sample given in the problem statement and run it many times on https://www.codechef.com/ide
The other functions (Modify and split also have a similar issue). I have got AC computing this stuff globally: https://www.codechef.com/viewsolution/21627974

vijju123 · November 16, 2018, 12:24am

Thanks, I will have a look at this

vijju123 · November 16, 2018, 2:00am

Hey @karolis_kusas , can you tell me about the field “prior” you are using in items?

Like, you took input, then immediately called merge, when prior had random value assigned to it. Will then the results not vary by time? I think I am missing something here.

And I tried running your code on various compilers. Codechef, afaik, uses Linux, so I ran it a couple of times on CF (Windows I guess?) and couldnt get wrong output even after 20 times. Its a sign of some undefined behavior- looks sneaky though, or a very bad luck of mine :/.

meooow · November 16, 2018, 3:43am

Wow that was one tricky bug… and where you would hardly expect one. This output should make things clearer: ideone. The problem is that you are using a vector<item> as a pool and using push_back() to add new items to it. Occasionally it happens that you send T[t].l or T[t].r by reference to a recursive call, but before the call returns new items being added to the vector causes it to reallocate its internal storage and copy everything, invalidating the references that you sent.

afaxnraner · November 16, 2018, 4:06am

@meooow already explained the error.

The same thing happened to me a few days ago (also while doing persistent things).
I sent a minimal example to a friend, who claimed that finding a “heap-use-after-free” error is easy. (This is the error you get when you compile with sanitizers.) And then he failed miserably to find the error in the short example: https://ideone.com/yQ1ON2

karolis_kusas · November 16, 2018, 5:46am

prior is a short name for priority Yep, it is true that I assign random values there, but I also copy most of the time, so they should not change over time. Thanks for helping me, vijju123. I have already got my question answered by meooow.

karolis_kusas · November 16, 2018, 5:49am

Thanks! Would not have thought about this

mgch · November 16, 2018, 3:21pm

For persistent treaps it’s better to don’t use priorities at all.

I’m using if (rand() % (T[l].sz + T[r].sz) < T[l].sz) {
Join(l, r);
} else Join(r, l);

meooow · November 17, 2018, 1:32am

@mgch any reason to prefer rand() % (T[l].sz + T[r].sz) < T[l].sz rather than just rand() % 2? I would expect the latter to be closer to actually having random priorities, or am I wrong?

EDIT: After attempting to solve the problem PRESTIGE with all 3 variations, I find that using priorities is better than rand() % (T[l].sz + T[r].sz) < T[l].sz is much better than rand() % 2. Of course this not a thorough testing. However I would appreciate an explanation for the difference in performance between the two rand() methods, thanks!