Generic optimization techniques

Are there any more generic optimization techniques, like ios_base::sync_with_stdio(false) (which speeds up iostream significantly) to reduce the running time of my programs?

Edit: I forgot to add, I am looking specifically for C++ techniques.
Edit2: Also, it would be extremely helpful to me if answers could include explanations of any custom I/O functions; I’ve never dabbled in these.

1 Like

you can use your own fast input and output function .

  inline void Scan_f(int a)
 {
 char c = 0;
while(c<33)
 //c = fgetc_unlocked(stdin);
 c = getc(stdin);
a = 0;
while(c>33)
{
a = a*10 + c - '0';
//c = fgetc_unlocked(stdin);
c = getc(stdin);
}

and in place of scanf("%d",&t) ; you should use this function as Scan_f(&t) ; only.

if you are using C++ use ‘\n’ in place of endl , as endl will flush the output buffer , consuming more time .

cache locality: accessing memory is faster when successive memory accesses are to nearby locations. A common effect of this is that it’s much faster to iterate over a two-dimensional array with the first index in the outer loop and the second index in the inner loop than vice versa, because this will access elements in linear order in memory and hence hit the cache most of the time, whereas iterating the other way will miss the cache every time, which can result in code that is several times slower.

cache1.c

int main() {

static int x[5000][5000];

for (int i = 0; i < 5000; i++) {

    for (int j = 0; j < 5000; j++) {

        x[i][j]++;

    }

}

return 0;

}

cache2.c

int main() {

static int x[5000][5000];

for (int i = 0; i < 5000; i++) {

    for (int j = 0; j < 5000; j++) {

        x[j][i]++;

    }

}

return 0;

}

results on linux terminal :-

$ gcc -std=c99 -O0 cache1.c -o cache1

$ gcc -std=c99 -O0 cache2.c -o cache2

$ time ./cache1

real 0m0.208s

user 0m0.061s

sys 0m0.062s

$ time ./cache2

real 0m2.251s

user 0m2.027s

sys 0m0.155s

Use macros and Inline functions. Calling a function involves overhead which can be avoided by using macros or inline functions, but you need to very careful.

Use left/right shift in place of division/multiplication by 2. For example you can replace x = x*4 by x = x<<2.

to check if a number is odd in place of if(x%2==1) use if(x&1)

if you are passing a vector to a function, use a constant reference.

avoid passing arguments by value.

2 Likes

Yes there are some technique like cstdio and iostream w/sync_with_stdio(false)

Just curious…among for loop, while loop and do-while loop is any of them particularly faster than the rest?

@sandy999 if the number of iterations are same then all the loops will consume the same time because at the hardware level , their intermediate code (machine targetted code) are same .

1 Like

http://codeforces.com/blog/entry/5217

Try anyone of the following.

1. 
inline int scan_f( int *location)
{	register int digit;
	register int result = 0;
	while( (digit = getchar()) >= '0')
	result = (result << 3) + (result << 1) + digit - '0';
	*location = result;
	return 0; }
for scanning scan_f(&t);


2. 
inline int inp() {
int n=0;
char p=getchar_unlocked();
while((p<'0'||p>'9')&&p!=EOF)
p=getchar_unlocked();
while(p>='0'&&p<='9')
{
n = (n<< 3) + (n<< 1) + (p - '0');
p=getchar_unlocked();}
return n;}

For Scanning int n; n = inp();

3. 
char ibuf[BUF];
int ipt = BUF;


//Global declaration 
int readInt() {
while (ipt < BUF && ibuf[ipt] < '0') ipt++;
if (ipt == BUF) {
fread(ibuf, 1, BUF, stdin);
ipt = 0;
while (ipt < BUF && ibuf[ipt] < '0') ipt++;
}
int n = 0;
while (ipt < BUF && ibuf[ipt] >= '0') n = (n*10)+(ibuf[ipt++]-'0');
if (ipt == BUF) {
fread(ibuf, 1, BUF, stdin);
ipt = 0;
while (ipt < BUF && ibuf[ipt] >= '0') n = (n*10)+(ibuf[ipt++]-'0');
}
return n;
}

Hope it helps!

Hey @samchawla

I would like to tell you, what optimization technique will do is to help you raise your position on problem leader-board.

What actually matters is the algorithm, if you use fast io and use insertion sort instead of quick-sort or merge-sort, that wont help you a lot.

Focus more on algorithm, your primary concern should be Algorithm optimization and secondary concern should be other optimizations also if your algorithm is good you will surely get ACed without any other optimization techniques. Still if you need help on those optimization techniques look at these mentioned below:

#include<stdio.h>
#define gc getchar_unlocked
#define pc putchar_unlocked

/* For windows based system */
/* Replace getchar_unlocked with getchar */
/* Replace putchar_unlocked with putchar */


/**
Use :
[ lprint ] for printing long numbers to buffer
[ lscan ] for taking long integers as input from buffer
[ sscan ] for taking normal integers as input from buffer
**/


inline void lprint(long int a)
{
 int i=0;
char S[20];
while(a>0)
{
    S[i++]=a%10+'0';
a=a/10;
}
--i;
while(i>=0)
pc(S[i--]);
pc('\n');
}

inline unsigned long long uscan()
{
    unsigned long long n=0,c=gc();
while(c<'0'||c>'9')
c=gc();
while(c<='9'&&c>='0'){
n=n*10+c-'0';
c=gc();}
return n;
}

inline long int lscan()                 
{
    long int n=0,c=gc();
while(c<'0'||c>'9')
c=gc();
while(c<='9'&&c>='0'){
n=n*10+c-'0';
c=gc();}
return n;
}


inline  int sscan()                     
{register  int n=0,c=gc();
while(c<'0'||c>'9')
c=gc();
while(c<='9'&&c>='0')
    {
n=n*10+c-'0';
c=gc();
    }
return n;
}

Include these above your code and take input by calling the particular function.

Take a look at my solution where i used fast io:

http://www.codechef.com/viewsolution/5455880

Follow what @acodebreaker2 mentioned in his post.

Use good sorting techniques or as us mentioned you use C++ use in-build functions and use STL.

Use bitwise operators,

check this link:

For Prime generators here is a link for C++:

and for python: