CHEFPAR- Editorial

vijju123 · April 16, 2018, 4:28pm

PROBLEM LINK:

Div1
Div2
Practice

Setter- Misha Chorniy
Tester- Misha Chorniy
Editorialist- Abhishek Pandey

DIFFICULTY:

CHALLENGE

PRE-REQUISITES:

Varying. Challenge problem is usually an application of your skills in competitive programming in general.

PROBLEM:

Given an array A[] of randomly generated sequences, we have to add some integer D_i (need not be distinct) to each array element A_i, where 0\le D_i\le K. Our goal is to maximize \frac{1}{M}\sum_{i=1}^{M} B_i where B_i=(A_1*A_2...*A_N)\%P_i

QUICK ANALYSIS:

It seemed that contestants faced problems in getting a good solution. We’re concluding that, because, Some of the trivial solutions got too high points than what they should have got. Majority of the contestants simply printed back the input, or used input+rand()\%K &etc.

ANALYSIS:

The first thing I want to say is that, this editorial is incomplete. And it will remain so, until you guys dont contribute! Its impossible to have any hard and fast approach for the challenge problems, and for the editorial to be at its full potential, I want to request you guys to discuss your approach. Benefit in challenge problem is better gained by discussion, seeing flaws of your approach and analyzing other’s strengths- and seeing what worked well, and to what degree. Hence, I would request the community to put forward their approach and intuition behind the question :).

As for editorial, I will try to discuss some of the approaches I saw, in both div1 and div2. I hope you people like it

1. Div2-

Not even 10 minutes passed from start of contest on the historical date of 6th April,2018 when Div2 had got the first few accepted solutions. I had a guess in mind, and I,curious as a doomed cat, decided to see what they did and confirm my intuition. And I dont know if it was fate, or destiny, or perhaps something else, but what I saw was an exact picture of what I had in my mind…

cin>> arr[i]; //Take array input.
cout<< arr[i];//Print it back.

This approach got something \approx 85-88 points. It was 88.7 when I checked last on 14th.
Further solutions were also on similar lines. Eg-

cin>>arr[i];
cout<< arr[i]+rand()%k;

This one got 85.8 points then. Sad luck for that guy

cin>>arr[i];
cout<< arr[i]+k;

This solution performed better, and got around$88-89$ points on average.

Some of the better solutions at div2 which got \ge90 involved-

Choose one of the primes. Lets call it P Either largest, smallest, middle one or randomly any.
Make array ideal according to that prime, i.e. add D_i so that (A_1*A_2..*A_N)\%P is maximum.
Pray that this solution gets better points.

By roughly around 20-25 submissions, people experimented with what prime to take. Most of them settled on the median prime.

A good number of approaches used simulation and storing array and its result. Eg-

cin>>arr[i];
Store arr[i]+rand()%k;//Store in a vector etc.
Compute score for the just stored array.
Repeat above 250-400 times.
Print the configuration with maximum score

Depending on luck and number of simulation, the above approach fetched 88-94.7 points. I saw quite a few with 94.7 points.

Some of the top approaches also use the concept of simulating till just about to time-out. The contestants chosed a distribution (random, or some other) which they simulated for \approx3.8-3.95 seconds where they sought to see which choice of D_i is increasing score for a particular A_i. When about to time out, they aborted the process and printed the output they got.

2. Div1

The performance of Div1 was kind of similar to Div2 xD. One of the codes which got 91.1 points at pretest was-

cin>>A[i];
cout<< A[i]+K/2;

Most of the approaches were common- like simulation till best answer, or take rand() values 300-400 times. Omitting the common approaches, the approaches of top solutions were quite distinct. (However, most of them are hard to decipher due to 300+ lines of code).

Some crucial optimizations were, however, seen. For example, lets say I got some values of D_1,D_2...D_N and calculated the value of Score=(A_1+D_1)*(A_2+D_2)*...*(A_N+D_N)\%P_i. The top solutions preferred to change D_i one by one, and re-calculate Score in O(Log(A_i+D_i)) by using inverses as - NewScore=({A_i+D_{old}})^{-1}*Score*(A_i+D_{new})\%P_i. This method got 95+ points on pretest.

Some of the good top codes deserve a mention here. These codes are what one can call crisp

As usual, I want to invite everybody (yes, everybody, not just the top scorers) to discuss their approaches so that we can have an engaging and learning insights into various intuitions

CHEF VIJJU’S CORNER:

1. The very first solution submitted by tester to test this problem was also simply printing back the input xD. After that as many as 16 more submissions were made.

**2.**Lets discuss the setter’s approach here. Lets take a random subset (serveral times) of array P[] and multiply the primes in this subset. Lets call their multiple MUL. Now, we now that MUL\%P_i=0 where P_i is a prime in the subset.Now, our aim is to get the value of (A_1+D_1)*(A_2+D_2)....*(A_N+D_N) closer to (preferably exactly equal to) MUL-x. This is because, by applying \% operation for various P_i, we will be left with MUL\%P_i-x\%P_i=(-x)\%P_i. We can go greedily and try to factorize MUL-1,MUL-2,...,MUL-x (The exact value of x can be experimented upon). The factorization will help us in manipulation of D_i values.

One thing to check is that, (A_1+K)*(A_2+K)..*(A_N+K) must have a value more than MUL else the above scenario may not be possible. We can check this by using log function, i.e. by changing expression (A_1+K)*(A_2+K)..*(A_N+K)
to-
Log(A_1+K)+Log(A_2+K)+...+Log(A_N+K).

Similarly, MUL can be expressed as (Log(P_1) +Log(P_2)+....+Log(P_N))

You can find good practice problem here

**3.**There is also a better random generator in C++ known as Mersenne Twister or mt19937. It is available from C++11 and later. Some of the advantages it has over rand() are that-

mt19937 has much longer period than
that of rand. This means that it will
take its random sequence much longer
to repeat itself again.
It much better statistical behavior.
Several different random number generator engines can be initiated
simultaneously with different seed,
compared with the single “global”
seed srand() provides

alexthelemon · April 19, 2018, 2:56am

If anyone is interested I could write something about getting a good score on this problem.

But would it be possible for the practice submission mechanism to be fixed? At the moment it is always returning a score of 0 (and a time of 0), regardless of your actual score. (See https://www.codechef.com/status/CHEFPAR .)

aryanc403 · April 19, 2018, 3:01am

@alexthelemon Thanks in advance for writing.

As usual, I want to invite everybody (yes, everybody, not just the top scorers) to discuss their approaches so that we can have an engaging and learning insights into various intuitions

*** -vijju123***

vijju123 · April 19, 2018, 3:10am

I will ping @admin about this.

admin · April 19, 2018, 5:48am

Hi, sorry about that. It’s fixed now.

And yes, you are more than encouraged to write about your approaches to the challenge problem (or any problem, for that matter). We’ll link to it from the editorial.

alexthelemon · April 19, 2018, 6:07am

Thanks. Though you submitted my code under your name. Could you possibly rename it as mine?

aryanc403 · April 19, 2018, 6:20am

All previous submissions still show score of 0.
On submission page name writen admin2(1 star). On opening profile opened is of @admin . When opening profiles of @admin @admin2 no submissions are seen.
And if possibly renaming cannot be done. Then if you want @alexthelemon admin can delete his own submission.

admin · April 19, 2018, 9:15am

Sorry, I just took the code right before that and submitted for testing purposes. Will remove it.

vijju123 · April 19, 2018, 4:08pm

Hmm… I see. Even @admin is not allowed to plagiarize codes. Good job xD

BTW, please also mention some outlines of your approach @alexthelemon , like, what the code is doing, whats your intuition etc. Anything which can help people understand what you want to convey

alexthelemon · April 19, 2018, 11:02pm

Thanks admin!

alexthelemon · April 27, 2018, 5:05am

(Though actually your submission is still there. I would prefer if it didn’t look like I copied your submission.)

alexthelemon · April 27, 2018, 5:53am

I would like to write an editorial, but I can’t see how to submit one. There were no buttons on the discussion page ( https://discuss.codechef.com/ ) that looked like they would create a new question. Should I just submit an “answer” on this page?

vijju123 · April 27, 2018, 4:07pm

Yes please. As an answer here. This allows us to compile everyone’s approach at a single page here (else people will have to search, browse and hunt for those- not preferable )

alexthelemon · May 4, 2018, 8:59am

Problem description:

Given an integer K and integers A_i,\ldots, A_N, and primes P_1,\ldots, P_M, find
D_1,\ldots,D_N satisfying 0\le D_i\le K, such that \frac{1}{M}\sum_{j=1}^M B_j is as large as possible, where
B_j=\prod_{i=1}^N (A_i+D_i) \mod P_j.

We are given four different parameter sets. In each case P_1,\ldots, P_M are consecutive primes with P_1 between 1.1\times10^9 and 1.5\times10^9, and 1\le A_i\le 10^9 are independently random.

The four parameter types are (1) N=100, M=10000, 1\le K\le1000000, (2) N=1000, M=1000, 1\le K\le1000000, (3) N=1000, M=1000, 1\le K\le1000, (4) N=10000, M=100, 1\le K\le1000000.

First attempt

In general this problem is a bit awkward because we want to fix an additive property of the
$B_j$s (like being in the range 0.9P_j to P_j-1), but if we vary one of the D_i, it
changes all of the B_j multiplicatively, which more-or-less randomises B_j
additively. Changing a single D_i makes a huge change to the product of
(A_1+D_1)\ldots(A_N+D_N), and reducing a huge number (approximately 31N bits) modulo
P_j (approx 31 bits) is a “chaotic” operation.

So I believe that to first approximation this “machine” that we are given that generates
B_j behaves as a random number generator and we are being asked to choose a seed (the
initial A_i+D_i) to maximise the sum of uniform random numbers. This means that “hill
climbing” from one solution to another by small changes in the D_i won’t be any better
than starting afresh with a new configuration of $D_i$s.

Looking at it like this allows you to write a simple program that would end up in about 5th position in the combined final Div1+Div2 ranklist. Example program here.

This first attempt just generates instances of B_1,\ldots,B_M as fast as possible, and takes the best score it finds. You can do this moderately fast by fixing D_2,\ldots,D_N and making k=D_1 range over 0,\ldots,K. That way B_j is of the form AA_{0,j}+k.AA_{1,j} where AA_{i,j}=\prod_{i'=i+1}^N (A_{i'}+D_{i'}) \mod P_j (and D_1=0 in this product). If we invert the order of the inner two loops (over 1\le j\le M and 0\le k\le K) then we don’t need to do a multiplication or full modulo reduction. The inner loops look like (replacing K with K+1):

val=[0]*K
for j in range(M):
  p=P[j]
  b=AA[0][j];c=AA[1][j]
  for k in range(K):
    val[k]+=b
    b+=c
    if b>=p: b-=p

This inner loop goes at about 160\times10^6 loop iterations per second under pypy, and about 620\times10^6 iterations per second under C++ (measured on my desktop, which seems to be similar to CodeChef’s servers).

The factor of 5 that CodeChef allows Python over C++ seems to be slightly favour pypy here since the measured speed ratio is about 4, but this pypy speed is a bit fragile and depends on using integers in a simple way: the C++ version admits optimisations that wouldn’t work well under pypy, so I think the factor of 5 is reasonably fair here. In particular, oleg_b’s C++ program cleverly reorders the modular reduction in a way that avoids unpredictable branches and hops through the k loop in steps of about 4. This manages a hefty 2350\times10^6 loop iterations per second.

Anyway I prefer writing in Python, so I’ll stick with that here.

Analysis

Let’s rescale and pretend that we are adding up M independent uniform real random numbers in the range -1 to 1. The normal approximation works pretty well here, even at the tail. The variance of U[-1,1] is 1/3 and in the allowed time (20s for Python) we can generate about 166\times10^6\times20/M instances, so to work out the largest sum we might manage we take the M/(20.166\times10^6) point of the of tail of the normal distribution:

from scipy.stats import norm
for M in [10000,1000,100]: print M,-norm.ppf(M/(166e6*20))*(M/3.)**.5

which gives (unnormalised) scores of about 261, 91, and 31 for M equal to 10000, 1000, 100. The rules of the contest meant that
we had to divide these sums by M to get the actual scores (rather than \sqrt{M} which would equalise the variance),
which means that type 4 (M=100) is much more important to optimise than type 1 (M=10000), because the possible relative changes (after dividing by M) are about 10 times greater for type 4.

Notice that 31 in the above scale has the interpretation as the number of “top” values of B_j. Scoring a sum of 31 is equivalent (on average) to fixing 31 of the $B_j$s equal to their maximum value of P_j-1 and leaving the other $B_j$s as random.

It is relevant that this number, 31, is quite big. If instead of this random brute force we try to do something clever that fixes some of the B_j, then it’s no use just fixing, say, 15 of them if that’s all we do, because that will only score 15 on this scale and so will lose out to the sampling as above. We need to be able to fix some of the B_j in such a way that it is fast to try lots of random variations, and that is harder to do (but possible - see below).

Second attempt

If you imagine doing the inner loops in the other order, with the loop over primes in the inside, then it’s clear the process is wasteful. After you add up some of the $B_j$s it might that the total isn’t very high, and so it’s unlikely that the total for all of the M $B_j$s is to be exceptionally high, and we only care about exceptionally high totals. In such a situation it would pay to abandon the current instance midway and start a new one.

So the inner two loops should look something like this

for k in range(K):
  v=0
  for j in range(M):
    if v < thr[j]: break
    p=P[j];b=AA[0][j];c=AA[1][j]
    v+=(b+k*c)%p

where thr[j] are some suitably-chosen thresholds. Unfortunately we more-or-less have to
write the loops this way around (K then M) to use this method and can’t take that much
advantage of the fact that k increments by 1, which means we take an efficiency hit
before we can realise the gains. But the gains are quite substantial, so we still win
out. There is a question of how to measure the effectiveness this method. It’s no longer
possible just to count inner loop iterations per second, because they are not all
equal. (Reaching large j is better than small j.) It’s also too random to measure how
well it does in terms of its final answer - that way it would take hours to collect
reliable statistics. One solution is to decide on a benchmark score, some quite rare
level, but not too rare, and count how many times this is achieved. For the discussion
here we can take the benchmark level to be
B=\frac{1}{2}MP(1+\left(\frac{2\log(10^6/M)}{3M}\right)^\frac{1}{2}), derived from the normal approximation mentioned above, where P
is a typical prime in the sequence P_1,\ldots,P_M (they are all very close).

The optimal threshold curve, thr[j], can be worked out from a standard Gittins Index calculation, but the formula
thr[j]=\frac{1}{2}(j-1-65/K)P+(1-(1-j/M)^\frac{1}{2})(B-\frac{1}{2}MP) seems to approximate this reasonably well.
For type 4, the first attempt above achieves about 13 benchmarks per second (bm/s), and this second attempt with the above threshold curve achieves about 28 bm/s.

Here is an example implementation.

Continued fraction method

The previous methods effectively improve the constant factors: they still require \Theta(M) time to evaluate a configuration of $(A_i+D_i)$s
because on average the product is evaluated modulo some definite proportion of the M primes, say M/10 primes on average.

It’s possible to reduce this to (amortised time) O(\sqrt{M(1+M/K)}). This sounds good, though actually it won’t turn out to be enormously useful here because the parameters for this challenge are not so favourable for this method.

Go back to the loop order from the first method (K-loop on the inside). We’d like to
calculate the KM values B_j+kC_j \mod P_j for 1\le j\le M and 0\le k\lt K as fast as
possible.

For each j, pick the best rational approximation to C_j/P_j, say c_j/p_j, with denominator p_j at most some fixed constant m
to be optimised. c_j, p_j can be calculated effectively using a continued fraction. Then B_j+kC_j\mod P_j
is nearly periodic in k with period p_j. More exactly, if k_0+k_1.p_j (0\le k_0< p_j), then

B_j+kC_j = B_j+(k_0+k_1p_j)C_j = B_j+k_0C_j+k_1(p_jC_j-P_jc_j) = B_j+k_0C_j+k_1\Delta_j \mod P_j,

where \Delta_j=p_jC_j-P_jc_j.

For each j, we would like to store some information that allows us to calculate
B_j+k_0C_j+k_1\Delta_j \mod P_j for all k< K, but do so in a way that only takes
O(p_j+K/p_j) time, rather than O(K) time, and also do so in a way that can be summed
over all j. This is possible because \Delta_j/P_j is usually small, so the
k_0 and k_1 pieces of B_j+k_0C_j+k_1\Delta_j \mod P_j, which interact via the \mod P_j, don’t
interact too much.

Fixing k_0 for the moment, the effect of the \mod P_j term is to subtract off
multiples of P_j at intervals of k_1, and these intervals be large if \Delta_j is
small, which it usually is, so we can use a sparse array. Instead of looping over k_1,
we record the $k_1$s which will require an extra multiple of P_j to be subtracted. When
we process this in the next phase, we will have to take cumulative sums to recover the
relevant multiple of P_j. In symbols,

(B_j+k_0C_j+k_1\Delta_j)\%P_j = (B_j+k_0C_j)\%P_j+k_1\Delta_j - P_j\sum_{n\ge1}I(k_1\ge (nP_j-(B_j+k_0C_j)\%P_j)/\Delta_j),

so we subtract P_j from an adjustment array in positions (nP_j-(B_j+k_0C_j)\%P_j)/\Delta_j for n=1,2,\ldots, until
this position reaches K/p_j. This first phase code could look something like this (a slight variation of the above):

D=p*C-P*c
for k0 in range(p):
  vl[p][k0]+=B
  dl[p][k0]+=D
  if D>0:
    x=P-1-B
    while 1:
      q=(x//D)*p+k0
      if q>=K: break
      adj[p][q]-=P
      x+=P
  elif D<0:
    x=B
    while 1:
      q=(x//-D)*p+k0
      if q>=K: break
      adj[p][q]+=P
      x+=P
  B+=C
  if B>=P: B-=P

When this has been done for all 1\le j\le M, we can recover the sum we want, val[k]=\sum_{j=1}^M (B_j+kC_j)%P_j by
adding over the k_0-based arrays and accumulating over the adjustment arrays, like this:

for p in range(1,maxp):
  for k0 in range(p):
    v=vl[p][k0];D=dl[p][k0]
    for k in range(k0,K,p):
      val[k]+=v
      v+=D+adj[p][k]

There are about K/(P_jp_j/|\Delta_j|)+1 iterations in the inner loop of the first phase,
so about K|\Delta_j|/P_j+p_j steps altogether. Taking C_j/P_j to be a random number
in [0,1), the average value of |\Delta_j|/P_j=p_j|C_j/P_j-c_j/p_j| will be around
0.42/m and the average value of p_j around 0.47m, though the actual time constants
depend on the implementation. So the first phase might take arout M(K/m+m) or so steps.
The second phase takes K m steps, so the total time is around M(K/m+m)+Km=MK/m+(M+K)m.
The optimal value of m is \sqrt{MK/(M+K)} and overall it takes about \sqrt{MK(M+K)}
steps, which is a big saving over MK/10 (something like “second attempt” running time)
if M and K are both large.
This is an example implementation.

The main benefit of this method is for type 1 (M=10000) instances, but slightly
unfortunately for this method, the scoring system in this contest weights \sum_j B_j by
1/M (as opposed to 1/\sqrt{M} which would equalise the variances), so there is less
possible variation here. Also, this method doesn’t work well in conjunction with the
“second attempt” threshold-based speedups. Still, there is a large speedup benefit, even
if this doesn’t translate to a huge score gain: if K is moderately large
it manages about 400 bm/s for type 1 compared with about 42 bm/s from the “second attempt” above.

For type 4, it manages about 46 bm/s, which is better than the above, but we can do better
as we’ll see below.

Is there a way to “invert the random number generator”?

Is it possible to make some of the B_j equal to P_j-1 (the best value) or, for example,
force them to be greater than 0.9P_j by somehow solving for D_i in a way that doesn’t
require too big a search?

For example, it is suggested in the above editorial that you choose a set of N (out of
M) primes, say P_1,\ldots,P_N, and a smallish number x and try to factorise Z=P_1 P_2 \ldots P_N-x into (A_1+D_1)\ldots(A_N+D_N). The trouble with this is that it
requires (at a minimum) that the number Z doesn’t have any factors bigger than 1/N
times the size of Z. The chance of this happening is governed by the
Dickman function
and is roughly N^{-N}. In our examples,
N is at least 100, so it’s obviously out of the question to search for them one-by-one.

A variant of this would be to try to factorise \lambda.P_1\ldots P_r-x for some r
smaller than N, and some suitable multiple \lambda, but even though such choices of
\lambda and P_1,\ldots,P_r statistically should exist, it’s still no use trying to
factorise such a number head-on.

But there is something that can be done to give an edge over the direct searches mentioned
above, at least in the type 4 case.

The N-tree algorithm

We’d like to find a way of fixing some of the $B_j$s at good values (like P_j-1) while
also having the freedom to vary the D_i choice without affecting the fixed B_j. It’s
not enough to find a single example of B_1=P_1-1, B_2=P_2-1 if there were no
way of varying the other B_j other than rerunning the process that found the fixed B_j.
So the aim here is to find a collection of possible solutions that all have favourable $B_j$s
for some fixed set of j, and then to try each one to see which gives the highest total B_j
for the non-fixed j s.

Let’s stick with type 4 (M=100, N=10000, \max(K)=1000000) because more can be done
in this case.

Assume K\ge1000, which is usually true, and reduce K to (100P_1)^{1/4} (about 600). (The
exact number you should use here depends on how fast the program is going to run.)
Fix D_{17}, D_{18}, \ldots, D_N to some random values up to K.
Let R=\prod_{i=17}^N(A_i+D_i). We are going to use the birthday method.
Make a list of (A_1+D_1)(A_2+D_2)\mod P_1 for 0\le D_1,D_2\lt K, and
a second list of -(R(A_3+D_3)(A_4+D_4))^{-1}\mod P_1 for 0\le D_3,D_4\lt K. The lists contain K^2
elements each, and statistically you expect about K^4/P_1=100 of them to agree modulo P_1. These agreements
can be found in O(K^2) or so time, and each such pair corresponds to the relationship
(A_1+D_1)(A_2+D_2)(A_3+D_3)(A_4+D_4)R=-1\mod P_1. Do the same for D_5, D_6, D_7, D_8 except that instead
of -R, just use 1. So we end up with a hundred or so quadruples such that
(A_5+D_5)(A_6+D_6)(A_7+D_7)(A_8+D_8)=1\mod P_1. Similarly we get can 100 quadruples D_9, D_{10}, D_{11}, D_{12}
such that (A_9+D_9)(A_{10}+D_{10})(A_{11}+D_{11})(A_{12}+D_{12})=1\mod P_1 and 100 quadruples such that
(A_{13}+D_{13})(A_{14}+D_{14})(A_{15}+D_{15})(A_{16}+D_{16})=1\mod P_1.

Taken together, we have 100^4=10^8
combinations of D_1, \ldots, D_{16} such that \prod_{i=1}^N(A_i+D_i)=-1\mod P_1, and these can be checked
quickly, in the manner of the “second attempt” above, using only M modular multiplications to check each instance
(you precalculate the product of each quadruple modulo P_1).

This has given us a “free” maximum (meaning one of the $B_j$s is guaranteed to take its
maximum value), and so pushes out the distribution, resulting in about 43 bm/s for type 4,
compared with the previous best of 28 bm/s. Of course it’s not quite free because there is
a setup time, but using the above parameters this is only about 5% of the total time, so
there is plenty of time to find an extreme point of the remaining distribution.

This method can be pushed further, starting from 4^3=64 lists instead of 4^2=16, and
using two stages of “birthday-refinement” instead of one, which ends up fixing B_1=-1\mod P_1 and B_2=-1\mod P_2. Unfortunately, the setup time, though it just about fits within
the allotted time limit, doesn’t allow enough search time for the second phase, so it’s
not worth fixing more than one B_j using this method.

This was as far as I got in the competition, and this is my best answer
which implements a single fixed B_j.

But after the competition ended, I got to thinking that it should be possible to do
better… and of course it is. The trouble with the above method is that the lists of
allowable D_i need to get quite big to overcome the “cutdown” factor of P_j, and if
you want to do several stages to fix several B_j, the setup costs will be prohibitive.

Fortunately for many P_j, P_j-1 will split into several factors so the multiplicative
group, which is cyclic of order P_j-1, will admit a series of subgroups. This
potentially allows us to “cutdown” by smaller factors at each stage in the tree-like
refinement process, which means we need fewer items in the list of allowable values at
each stage, so we have a chance of making the setup phase run in a reasonable time while
fixing more $B_j$s.

For example, if we are fixing with respect to P=1400000023, then
P-1=2.3.3.41.263.7213, so one stage might be to fix with respect to the factor
7213. This means we have two lists, L_1, L_2 of numbers modulo P_1 and a target
product, t. We aim to make a single list of pairs, (x_1,x_2), from L_1\times L_2
such that x_1 x_2 t^{-1} is in the index 7213 subgroup of the multiplicative group. We
can do this using the birthday method by working in the quotient group: make a list of
(t.x_1^{-1})^r and x_2^r where r=(P_1-1)/7213. Elements in common from both lists
must satisfy (x_1 x_2)^r = t^r which is what we want. Next stage would be to combine
pairs of lists where the r^\mathrm{th} powers are constant over each list to make a new
list whose s^\mathrm{th} powers are constant, where s divides r, e.g.,
s=(P_1-1)/(7213.3.263). In other words we’re fixing things in increasingly big quotient
groups, until you end up with the whole group. Eventually at the last stage you end up
with a few lists whose values are constant modulo a set of primes.

(It would be simpler to work additively, but it’s probably going to be too expensive to
take discrete logs modulo each P_j we are fixing with respect to, which we’d need to do
it for each i and each choice of D_i, so we have to stay in the multiplicative group.)

A good configuration (for the time limits for this problem)
is something like: at the i^\mathrm{th} stage (i=0,\ldots,11), you have 2^{13-i} lists
each of size roughly 40.2^i, except at i=11, the final stage, where there are four lists of size about 150.
150 is chosen so that 150^4\approx5\times10^8 is a good number of candidates to test in the second phase.
(That is simplifying slightly. If K is small, then you need to build up the sizes first before cutting down.)

This is very similar to Wagner’s algorithm
which uses a tree of lists of binary vectors to choose one element from each top-level list so that the choices
XOR to zero. (Though he uses equal size lists at all levels, which is not optimal as it doesn’t balance the work.)

This is an example implementation. It
tries to use this method for type 4, and for some values of K in types 2 and 3.

The program is a bit fiddly because it has to choose a suitable set of primes to
fix with respect to, put the factors in order to form a plan of action in such a way
that it won’t run out of list elements on the way, and do this robustly on-the-fly.

In the type 4 case it usually manages to fix B_j for four primes. This isn’t a lot, but
it’s enough to give it a big advantage over not doing this, because moving out a bit in
the extreme tail of a normal distribution makes a big difference to the probability. In
terms of the benchmarks per second mentioned above, the “unfixed” method gave about 28
bm/s, the single-fixed method about 42 bm/s, and this method about 200 bm/s. (It’s
possible the bm/s measure is slightly underestimating the benefit of this method, since
the further out in the tail of the distribution you go, the bigger the improvement.)

vijju123 · May 4, 2018, 1:56pm

Very, VERY DECENT!!

I already gave the karma reward, and I think the community will seriously acknowledge this effort of yours!!

vivek_1998299 · May 4, 2018, 2:02pm

I really dont know how do i thank you…,thank you soo much!!!,very few professionals actually share their ideas in such a clear way…!!

alexthelemon · May 6, 2018, 2:12am

Thank you - I am glad you like it!

mgch · May 6, 2018, 3:34am

Wow!! @alexthelemon, thanks for your explanation!

We will try change challenges in two ways: multiple tests for the stabilization of test data in the input now. In JUNE we’ll try to use some kind of provisional tests.