PROBLEM LINK:
Setter ShavelV
Tester Teja Vardhan Reddy
Editorialist Abhishek Pandey
DIFFICULTY:
MEDIUM
PREREQUISITES:
Fenwick Tree supporting range updates and range query, or segment tree with lazy propagation, Properties of AND, Observations
PROBLEM:
Given an array A, answer Q queries which ask the number of subsequences in range [L,R] whose bitwise AND is a perfect square.
QUICKEXPLANATION:
Key to AC AND of numbers can change no more than LogN time, and hence there are at most O(NLogN) different values possible. It is hence possible to update answer over a group or range, which can be done via segment tree or fenwick tree.
Notice that there are at most O(NLogN) values of AND possible in total. Now, either of the 2 approaches are possible
 Most Popular Solution Preprocess the location of different LogN AND values you can obtain from index i, or preprocess the distance upto which j'th bit of A_i remains same. Both of these can be used for next step, where, while operating over a element, we determine the groups with which the AND gives perfect square. Range update the corresponding interval, and for answering, we can use range sum query. We need to sort the queries for this approach.
 Setter’s Solution Store the queries in an array of vector. Query[r] will store all queries ending at index r, along with index of query in original order. The next step can be summarized in 2 words. Maintain groups. Store the pair \{GroupValueOfAND,i\} in a vector and update this vector every iteration. If 2 groups get same AND value then merge them. We can see that we can reduce this to range updates and range queries on fenwick tree and/or segment tree now.
EXPLANATION:
The editorial will focus on one of the 2 solutions, and that will be the most popular solution used by the participants  simply because the solution is pretty crisp and simple :). Setter’s solution has been very briefly described in quick explanation, and setter’s notes will be provided for people interested in his approach. His solution has been made commented so things are clear. We will discuss only one because difference between both solutions isnt very much, except for the fact that the participant’s solution is quite intuitive and easy to understand
As usual, answers to all the (Why?)'s and proofs will be given in my corner  along with extra resources and related problems.
Lets run through first subtask quickly. Sine N \le 100, you could brute force. Take the range [l,r] in input, and then iterate over all subarrays in this range, i.e. over all i from l to r as starting point of subarray, and for each starting point, over j from i to r as ending point of the subarray. If the AND of the subarray satisfies the condition, increment the answer by 1. This works in O(Q*N^2). Lets move over to the meaty party now
Most Popular Solution
Let me break this into 3 parts  Intuition, Preprocessing, and answering Queries.
1. Intuition
This solution uses the observation that the AND value can change at most LogN times. (Why? P1). It is easy to see that AND is a nonincreasing function. If we AND the current ANDsum received so far (of the subarray) with next element, it will either remain constant or decrease.
We know that it can decrease only at most Log_2N times, and if it remains constant, then we can do range operations on the group as a whole instead of doing it individually. This was the intuition a participant can pick up to at least gain a direction of how to solve this problem.
2. PreProcessing 
Now, moving on the the most popular question, how to solve this problem :). Since total number of AND values are NLog_2N, we can make a 2D array to store the positions of (atmost) Log_2N distinct values of AND for all subarrays starting from index i. In other words, array P[i][j] will store the location of j'th change in AND sum for subarrays beginning at index i. One of the ways by which many contestants and red coders did it was, by checking the location when the j'th bit changed. If j'th bit remains constant, then we are assured that at least that bit wont change during AND, else if that bit changes, then we found a location where the AND is changing. The location closest to i among all the bits will be the location upto which this group has same AND.
There are multiple implementations possible, but for sake of completeness I have given this above described implementation in tab below for those who arent able to come up with the implementations.
Click to view
LL fsu[31][N];
for(i=0;i<31;i++)//Initialize
fsu[i][n]=n+1;
for(i=n1;i>=1;i)
{
for(j=0;j<31;j++)
{
if((1<<j)&a[i+1])//If the bit is set
fsu[j][i]=fsu[j][i+1];
else //If the bit changes
fsu[j][i]=i+1;
}
}
3. Querying
Do yourself a favor and sort the queries. So MANY contestants underestimate this! Sort the queries on basis of their beginning and/or ending points. If you want to measure how effective it can be, it can change “iterating over the array every time we have to answer a query” to “iterating once to answer all queries collectively.” This step is the heart of square root decomposition and Mo’s algorithm’s.
Now, look at what we have in preprocessed array. We have positions of next change of AND. This means we can quickly jump forward over groups in array. What does this mean? It means if we iterate backwards, from N1 to 0 (0based indexing), then for every index we have to do calculations for at most Log_2N groups ahead of it. Good!
Keep a main pointer pointing at end of array, or in simpler words, iterate from i=N1 to 0 now :p. Now, since AND of one element is that element itself, initialize current AND (say currand) as A_i and starting position of this AND (say, st) as st=i. Now, use the preprocessed array to directly jump to the beginning of next group, and find new value of currand. If this AND value satisfies the condition, then we do a increment by 1 the values in range [st,ed] where we define ed as ending point of the old group. Update the values of currand, and set st to point at starting element of new group as this is where the new currand begins. Once main pointer arrives at the beginning point, or l value of current query, its time to answer it! To get the solution for a query, we simply find sum of range [L,R] given in query. As simple as it is!
Now, how do we do it? We can use segment tree with lazypropagation. But I want you guys to explore another data structure, fenwick tree. Initially my plan was to make this editorial like my other segment tree ones, telling parent child relation, what to store in node and other stuff, but this time I decided to simply describe the idea and leave the how part to you guys. Almost all the good coders used Fenwick tree. BIT, or Fenwick tree, can be at least 4 times faster than and efficient than segment tree, both in terms of memory and speed. The best part is, which allowed me to leave this part for you guys, is that the operation involved is simply standard range update and range query operation with almost no twist. Nothing can be a better chance to explore and master a data structure than this. I have searched thoroughly to get the bestest resources for you guys to see and hence implement and see the data structure yourself, so buckle up! A wind of change is blowing :).
A minor spoiler for the process We maintain 2 fenwick trees, and use updating concept similar to difference array. Like, in difference array, to update the range [L,R] by K, we did A[L]+=K and A[R+1]=K. Similarly we will break our query of updating in range [L,R] into 2 parts and execute it. Also, a term “linear” function will be involved at the tutorials, dont let it get to you, its just simple stuff like (ab)x=axbx. You’ll see this yourself when you read the tutorials
Now, do you think you get the idea? If yes, then answer this What does our update represent? (in case your ans was no, hop on to my corner to understand this! :D)
Another interesting question, we see that we are updating in range of the group. How does it prevent overcounting, or counting only subarrays inside range [L,R]? As in, say my array is [2,2,1,1]. I will do an update on range [3,4] (1based indexing) for 1 at index 2 and then another update over same range for 1 at index 1. If I give query [2,4], how will the above algorithm give correct answer? Refer to the code in tab below and then to the explanation
Click to view
for(i=0;i<q;i++)
{
l=qr[i].l;
r=qr[i].r;
while(cl>l)//cl = main pointer iterating from N1 to 0.
{
cl;
y=a[cl];//y=currand
z=cl;//z=st
while(z<=n)
{
y&=a[z];//currAnd of with new group.
if(y==0)
{
ctr++;
updateRange(bt1,bt2,n,1,z1,n1);
break;
}
m=sqrt(y);
LL nx=n+1;
for(j=0;j<31;j++)
if(y&(1<<j))
nx=min(nx,fsu[j][z]);//Position of starting of new group
if(m*m==y)
{
updateRange(bt1,bt2,n,1,z1,nx2);//Fenwick tree uses 0 based indexing, hence 2 to
}//keep at same group.
z=nx;
}
}
SOLUTION
The solutions are pasted here for your convenience in case the links arent activated by @admin yet
Click to view
#include <iostream>
#include <bits/stdc++.h>
#define ll long long
using namespace std;
int n, m, t, l, r, a[100005];
ll d[2][100005], ans[500005];
vector< pair<int, int> > rs[100005];
//vector for groups with equal AND
vector< pair<int, int> > groups;
//fenwick tree
ll sum(ll num, ll pos)
{
ll sm = 0;
while(pos >= 0)
{
sm += d[num][pos];
pos = (pos & (pos + 1))  1;
}
return sm;
}
void update(ll num, ll pos, ll x)
{
while(pos < n)
{
d[num][pos] += x;
pos = (pos  (pos + 1));
}
}
int main()
{
scanf("%d", &t);
while(t)
{
scanf("%d", &n);
scanf("%d", &m);
for(int i = 0; i < n; i++)
{
scanf("%d", &a[i]);
d[0][i] = d[1][i] = 0;
rs[i].clear();
}
d[0][n] = d[1][n] = 0;
for(int i = 0; i < m; i++)
{
scanf("%d%d", &l, &r);
l;
r;
rs[r].push_back({l, i});
}
groups.clear();
for(int i = 0; i < n; i++)
{
//new vector for groups
vector< pair<int, int> > newgroups;
for(int j = 0; j < groups.size(); j++)
{
int cur = groups[j].first & a[i];
if(!j  cur != newgroups[newgroups.size()  1].first)
{
newgroups.push_back({cur, groups[j].second});
}
}
if(newgroups.size() == 0  newgroups[newgroups.size()  1].first != a[i])
newgroups.push_back({a[i], i});
//making new vector current
groups = newgroups;
for(int j = 0; j < groups.size(); j++)
{
//checking for a perfect square
int sq = floor(sqrt(groups[j].first));
if((sq  1) * (sq  1) == groups[j].first  sq * sq == groups[j].first  (sq + 1) * (sq + 1) == groups[j].first)
{
//getting a borders
l = groups[j].second;
if(j != groups.size()  1)
r = groups[j + 1].second  1;
else
r = i;
//adding an arithmetic progression
update(0, l, (r + 1));
update(0, r + 1, (r + 1));
update(1, l, 1);
update(1, r + 1, 1);
//adding on a prefix
update(0, 0, r  l + 1);
update(0, l, (r  l + 1));
}
}
for(int j = 0; j < rs[i].size(); j++)
{
//getting an answer for a query
ans[rs[i][j].second] = sum(0, rs[i][j].first);
ans[rs[i][j].second] = sum(1, rs[i][j].first) * rs[i][j].first;
}
}
for(int i = 0; i < m; i++)
{
printf("%lld\n", ans[i]);
}
}
return 0;
}
Click to view
//teja349
#include <bits/stdc++.h>
#include <vector>
#include <set>
#include <map>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <climits>
#include <utility>
#include <algorithm>
#include <cmath>
#include <queue>
#include <stack>
#include <iomanip>
//setbase  cout << setbase (16); cout << 100 << endl; Prints 64
//setfill  cout << setfill ('x') << setw (5); cout << 77 << endl; prints xxx77
//setprecision  cout << setprecision (14) << f << endl; Prints x.xxxx
//cout.precision(x) cout<<fixed<<val; // prints x digits after decimal in val
using namespace std;
#define f(i,a,b) for(i=a;i<b;i++)
#define rep(i,n) f(i,0,n)
#define fd(i,a,b) for(i=a;i>=b;i)
#define pb push_back
#define mp make_pair
#define vi vector< int >
#define vl vector< ll >
#define ss second
#define ff first
#define ll long long
#define pii pair< int,int >
#define pll pair< ll,ll >
#define sz(a) a.size()
#define inf (1000*1000*1000+5)
#define all(a) a.begin(),a.end()
#define tri pair<int,pii>
#define vii vector<pii>
#define vll vector<pll>
#define viii vector<tri>
#define mod (1000*1000*1000+7)
#define pqueue priority_queue< int >
#define pdqueue priority_queue< int,vi ,greater< int > >
ll n;
ll bit[4][123456];
// BIT update
ll update(ll typ,ll pos,ll val){
pos++;
while(pos<n+10){
bit[typ][pos]+=val;
pos+=pos&(pos);
}
return 0;
}
//BIT query
ll querybit(ll typ,ll pos){
pos++;
ll ans=0;
while(pos>0){
ans+=bit[typ][pos];
pos=pos&(pos);
}
return ans;
}
// query BIT from l to r inclusive
ll query(ll typ,ll l,ll r){
return querybit(typ,r)querybit(typ,l1);
}
ll dp[123456][22];
// query sparse table on & operation
ll getans(ll l,ll r){
ll len=rl+1;
ll val=log2(len);
return (dp[l][val]&dp[r(1<<val)+1][val]);
}
// get smallest index i>=lowee such & operator in elements [lowee,i] will give less than or equal to ans.
ll getstart(ll lowee,ll ans){
ll low=lowee;
ll high=n1;
ll mid,ans11,val;
while(low<=high){
mid=(low+high)/2;
val=getans(lowee,mid);
if(val<=ans){
ans11=mid;
high=mid1;
}
else{
low=mid+1;
}
}
return ans11;
}
// get largest index i>=lowee such & operator in elements [lowee,i] will give greater than or equal to ans.
ll getlast(ll lowee,ll ans){
ll low=lowee;
ll high=n1;
ll mid,ans11,val;
while(low<=high){
mid=(low+high)/2;
val=getans(lowee,mid);
if(val>=ans){
ans11=mid;
low=mid+1;
}
else{
high=mid1;
}
}
return ans11;
}
// adder contains points at which elements must be added.
// subber contains points at which elements must be removed
vector<vi> adder(123456),subber(123456);
// queries are stored as per their r value.
vector<vii> quer(123456);
ll a[123456];
ll remem_start_point[123456],ans_query[523456];
signed main(){
//std::ios::sync_with_stdio(false);
ll t;
cin>>t;
set<ll> seti;
ll i;
// finding square numbers.
rep(i,123456){
seti.insert(i*i);
}
while(t){
ll q;
cin>>n>>q;
ll j;
rep(i,n){
//cin>>a[i];
scanf("%lld",a+i);
dp[i][0]=a[i];
}
// constructing sparse table
f(i,1,20){
rep(j,n){
if(j+(1<<i)<=n){
dp[j][i]=(dp[j][i1]&dp[j+(1<<(i1))][i1]);
}
}
}
// clearing for multiple test cases.
rep(i,n+10){
adder[i].clear();
subber[i].clear();
quer[i].clear();
rep(j,3){
bit[j][i]=0;
}
}
ll val,st,en;
rep(i,n){
val=a[i];
// finding start and end positions for each possible value of & operator on
// subarray starting at i. Note there will be only log(a[i]) values.
while(1){
st=getstart(i,val);
en=getlast(i,val);
if(seti.find(val)!=seti.end()){
// noting down start and end points in useful cases (perfect square cases)
adder[st].pb(i);
subber[en].pb(i);
}
if(en==n1){
break;
}
val=getans(i,en+1);
}
}
ll l,r,ans,cnt,cnt1,pos;
rep(i,q){
scanf("%lld",&l);
scanf("%lld",&r);
l;
r;
quer[r].pb(mp(l,i));
}
// simulate from left to right..
rep(i,n){
rep(j,adder[i].size()){
update(0,adder[i][j],1);
update(1,adder[i][j],i1);
remem_start_point[adder[i][j]]=i1;
}
rep(j,quer[i].size()){
l=quer[i][j].ff;
r=i;
pos=quer[i][j].ss;
ans=query(2,l,r);
cnt=query(0,l,r);
cnt1=query(1,l,r);
ans+=cnt*r  cnt1;
ans_query[pos]=ans;
}
rep(j,subber[i].size()){
update(0,subber[i][j],1);
update(1,subber[i][j],1*remem_start_point[subber[i][j]]);
update(2,subber[i][j],iremem_start_point[subber[i][j]]);
}
}
rep(i,q){
//cout<<ans_query[i]<<endl;
printf("%lld\n",ans_query[i]);
}
}
return 0;
}
Editorialist will be put on demand. But till then, take this solution for reference. Shoutout to @abdullah768 for being a good boi writing clean codes
Time Complexity=O(QLogN+N*LogA *LogN) (setter’s solution)
Space Complexity=O(NLogA_i) where A_i is maximum element in A approximately.
CHEF VIJJU’S CORNER
1. Why AND changes at most Log_2N times?
Click to view
Recall that Log_2N is nothing but number of bits in N. Now, take any value of AND and represent it in binary. Each bit can be either 0 or 1. When we AND it with another number, remember that no new bit can be set, i.e. none of the 0's can become 1, but set bits can become unset (i.e 1's can change to 0's). If no bit is unset, i.e. none of the 1's became 0, then the value is same as it already was, else it changed. Maximum number of 1's allowed is Log_2N, and hence the value can change only Log_2N times, as it needs at least one of the 1's to become unset. Note that we are talking about number of time the value changes, not total number of possible values.
2. What does update over the range represent?
Click to view
It represents that "If we do AND of elements from i to st1 (st being starting of this group), then this particular group lying in range [st,ed] (ed is ending of this group) contributes to the answer. There will be a total of edst+1 subarrays possible, as any subarray beginning from i and terminating in this range has to be counted.
3. How does the above algo not count subarrays outside the range if its simple sum
Click to view
Simple, because we dont update! Follow the code given closely and then reread the line “Once we get to the l value of current query, its time to update!”. Remember that the main pointer traverses from N1 to 0, and only after it reaches an element i, is the contribution of it to answer or other subarrays updated in tree. Before that, it is 0. Hence, as we move backwards, we add contributions of elements starting after l value of current query. This is possible if we keep to choose to sort queries on descending order of l value. Hence, when we reach l value of current query, contribution of all elements ahead of this is counted, and contribution of elements before this (i.e. lying at an index less than l) is still 0 as main pointer hasnt reached them yet. Hence, we obey the l value of query as well.
4. Setter’s Notes
Click to view
Let’s notice that for fixed number, if we do AND operations with it, it changes no more than logN times. Thus, let’s iterate through right border of segment and store an array of groups of suffixes with equal AND. We can store them as pairs: {AND value , index of the group start}. For the next right border we update the results and merge groups with equal and values. Let’s notice that we need to add size of group to all queries with l \le index of start of a group. Also let’s notice, that for queries with l inside a group, we need to add an arithmetic progression. We can do it with two fenwick trees. When we meet an r border of any segment, just take an answer for it left border. TC O(q log n + n * log A * log n)
5. Resources
 Resource 1  Fenwick Tree introduction.
 Resource 2  A brief answer to give you a spoiler on what to expect.
 Resource 3  I liked this blog.
 Resource 4  This is also quite cool.
6 Hall of fame for Noteworthy Solutions

taran_1407  Solved it online (i.e. his solution would work even if queries are online) !!

abdullah768  Neat Code.

codebreaker123  Solved using Mo’s algorithm + Hilbert’s order.
7. Related Problems