Hey everyone! I decided to write this article after seeing many people in the forum having trouble dealing with integer overflow. This is mostly about how integers work and why overflow occurs, but there is some general advice on handling overflow at the end. I learnt a few things myself while writing this, and I hope this will be helpful to you guys as well. So here goes.
How are integers stored in memory?
Integers are stored in memory in their base 2 form, also called binary. The decimal system uses 10 digits whereas binary uses only 2. In the decimal system each place is associated with a power of 10, in binary each place is associated with a power of 2.
For example, 118 in base 10 will be represented as 1110110 in base 2. This is because in base 10, the value 118 represents the sum 1 \times 10^2 + 1 \times 10^1 + 8 \times 10^0 using powers of 10, but in base 2 the same sum will be represented using powers of 2 as 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0. Just like the decimal system, every number can be represented in binary, and every number has only one binary representation.
In computers the binary numbers are stored as a sequence of a fixed number of binary digits, or bits. 8 bits make up a byte. Datatypes of various sizes such as 1 byte, 2 bytes, 4 bytes and 8 bytes are commonly available for use. If the number 118 is to be stored in a datatype of size 1 byte (or 8 bits), it will be stored in its binary form with an appropriate number of leading zeroes.
+---+---+---+---+---+---+---+---+ | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 118 +---+---+---+---+---+---+---+---+
A total of 2^n values can be stored in a n-bit datatype. Is it easy to see this because each of the n bits may be 0 or 1, which gives 2^n unique sequences. So all numbers from 0 to 2^n-1 can be stored.
How do operations, such as addition, work in binary?
Addition works in binary just like in decimal. We add 2 bits and the previous carry if any, keep the first bit of the sum in the result and carry over the extra. So adding 118 and 25 we would get
+---+---+---+---+---+---+---+---+ | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 118 +---+---+---+---+---+---+---+---+ + | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | + 25 +---+---+---+---+---+---+---+---+ ----- = | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | = 143 +---+---+---+---+---+---+---+---+
Let’s try that again. What happens if we compute 118 + 143?
+---+---+---+---+---+---+---+---+ | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 118 +---+---+---+---+---+---+---+---+ + | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | + 143 +---+---+---+---+---+---+---+---+ ----- = 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | = 261 ?? +---+---+---+---+---+---+---+---+
Wait, we get an extra 1 which should be placed in the 9th bit, but we only have space for 8 bits?! What happens now?
This is what is called integer overflow. The answer to what happens next is, it depends on the programming language. For now, let us assume the extra bit is discarded. So, surprisingly, 118 + 143 results in 101 in binary, which is 5 (
[3]).
Adding $x$ to a number is the same as moving it $x$ steps ahead on a number line. However, taking overflow into account, the numbers here form a cycle where after $2^n-1$ the next number is $0$ again. So the sequence of numbers goes $[0,1,2,...,2^n-2,2^n-1,0,1,2,..]$. Adding $x$ to a number on this sequence may cross over from $2^n-1$ to $0$ and continue, which is what happens in the above example.
Multiplication works similarly, except the true result may be many bits wider than the size instead of just 1. For example, multiplying $118$ and $143$ gives $16874$ which is $100000111101010$ in binary. However, just like in addition only the lowest 8 bits, $11101010$, will be kept, which the gives the result $234$ (
4).
To summarize, during an addition or multiplication operation involving two n-bit numbers, if the result does not fit in the range 0 to 2^n-1, overflow is said to have occured. Irrespective of whether overflow occurs, only the lowest n bits are kept. Equivalently we can also say that the result is being stored modulo 2^n. This is just like the decimal system where taking the last n digits is the same as performing modulo 10^n. In fact, the cycle mentioned earlier is a property of modular arithmetic.
That was addition and multiplication, but how does subtraction work?
To subtract b from a, -b is added to a.
Okay, so how do we store -b if we can only store 0 to 2^n-1, which are all positive?
This is the interesting part. Till now we have only discussed unsigned numbers. There are a few ways to represent a signed number, but in the most commonly used system today the two’s complement form of a number is used to represent its negative value. The following procedure is used to get the two’s complement of a number.
- Invert all the bits, in other words change all 0 s to 1 s and vice versa.
- Add 1 to this new value.
So what’s the point of using this form? The benefit is that using this allows for ordinary operations at the hardware level on positive values to also work on negative values!
Let us restrict ourselves to 4-bit numbers for simplicity. Consider the value 5 in 4 bits of binary as 0101. Inverting the bits gives us 1010. And adding 1 to that gives us 1011. What the two’s complement actually does is that it performs subtraction implicitly. This happens in the first step. Inverting the bits of 0101 is the same as subtracting 0101 from 1111. Adding 1 to the result means the final result is 1111 - 0101 + 1 = 10000 - 0101 = 1011. Generally speaking, the two’s complement of a n-bit number a is given by 2^n - a.
Here is a list of all the 4-bit numbers, and the signed and unsigned values they represent
bit sequence signed unsigned value value +---+---+---+---+ | 1 | 1 | 1 | 1 | -1 15 +---+---+---+---+ | 1 | 1 | 1 | 0 | -2 14 +---+---+---+---+ | 1 | 1 | 0 | 1 | -3 13 +---+---+---+---+ | 1 | 1 | 0 | 0 | -4 12 +---+---+---+---+ | 1 | 0 | 1 | 1 | -5 11 +---+---+---+---+ | 1 | 0 | 1 | 0 | -6 10 +---+---+---+---+ | 1 | 0 | 0 | 1 | -7 9 +---+---+---+---+ | 1 | 0 | 0 | 0 | -8 8 +---+---+---+---+ | 0 | 1 | 1 | 1 | 7 7 +---+---+---+---+ | 0 | 1 | 1 | 0 | 6 6 +---+---+---+---+ | 0 | 1 | 0 | 1 | 5 5 +---+---+---+---+ | 0 | 1 | 0 | 0 | 4 4 +---+---+---+---+ | 0 | 0 | 1 | 1 | 3 3 +---+---+---+---+ | 0 | 0 | 1 | 0 | 2 2 +---+---+---+---+ | 0 | 0 | 0 | 1 | 1 1 +---+---+---+---+ | 0 | 0 | 0 | 0 | 0 0 +---+---+---+---+
It is clear that in the signed system we are sacrificing half of our range to negative values. So now we still store 2^n values, but instead they range from -2^{n-1} to 2^{n-1}-1. Also, conveniently, the highest bit is an indicator of whether our number is positive (0) or negative (1). Hence this is also called the sign bit.
How do operations on two’s complement numbers still work properly?
Think of addition as moving forward along a number line once again, only this time it will be with reference to the list above so addition will be moving upwards along the list. Assume we are adding some value to a negative number -a.
1. The value is some positive value b, b < a.
Consider a = 5, hence -a is represented as 1011, and b = 3 which is 0011.
+---+---+---+---+ | 1 | 0 | 1 | 1 | -5 +---+---+---+---+ + | 0 | 0 | 1 | 1 | + 3 +---+---+---+---+ ---- = | 1 | 1 | 1 | 0 | -2 +---+---+---+---+
The explanation is quite simple. From the list above, it is clear that increasing the value of 1011 does indeed increase its signed value by the same amount, moving it closer to -1. Algebraically, the result should be -a + b = -(a - b). In our operation with two’s complement, (2^n - a) + b = 2^n - (a - b), which we know represents -(a - b).
2. The value is some positive value b, b>a.
Consider a = 5 once again, and b = 6 which is 0110.
+---+---+---+---+ | 1 | 0 | 1 | 1 | -5 +---+---+---+---+ + | 0 | 1 | 1 | 0 | + 6 +---+---+---+---+ ---- = 1 | 0 | 0 | 0 | 1 | 1 +---+---+---+---+
Clearly, you can see how we have made use of unsigned overflow to get the right answer! Going by the list, we can see the cycle effect in action. The signed value value loops around the top and continues from the bottom. Algebraically, we expect the answer b - a, and operation we carry out is (2^n - a) + b = 2^n + (b - a), which under modulo 2^n gives us b - a as required.
3. The value is some negative value -b.
Consider a = 5 again, and b = 2, so -b is represented as 1110.
+---+---+---+---+ | 1 | 0 | 1 | 1 | -5 +---+---+---+---+ + | 1 | 1 | 1 | 0 | + -2 +---+---+---+---+ ---- = 1 | 1 | 0 | 0 | 1 | -7 +---+---+---+---+
This works in the same way as case 2. The correct answer is -(a + b), and we perform (2^n - a) + (2^n - b) = 2 \times 2^n - (a + b) which under modulo 2^n is equivalent to 2^n - (a + b), giving the right answer.
Multiplication works similarly. In summary, all additions and multiplications are carried out as usual under modulo 2^n, which is sure to give the correct result as long as it is within the valid range.
Overflow can still cause unintended results with signed values, right?
Yes, any operation whose true result is out of the range -2^{n-1} to 2^{n-1}-1 will give a wrong result. This is overflow in the context of signed numbers. The “boundary” for overflow for unsigned numbers is between 2^n-1 and 0, but for signed numbers it is between 2^{n-1}-1 and -2^{n-1}. You can see these are not at the same position on the list. The unsigned boundary is between 1111(15) and 0000(0) and the signed boundary is between 0111(7) and 1000(-8). Simply put, the overflow boundary lies at the point where there is a discontinuous jump between consecutive values.
Wrong result example
Try adding 5 and 7 as 4-bit values 0101 and 0111. The result is 1100, which represents -4! This is why adding two positive numbers can give a negative number, or adding two negative numbers can give a positive number, and other unexpected results. Examples in code.
Right result in spite of overflow
Interestingly, if there are multiple addition and/or multiplication operations but the end result is within valid bounds, the answer will be right.
Consider performing 5 + 7 - 6, which on evaluating left to right gives 0101 + 0111 + 1010 = 1100 + 1010 = 10110. The last 4 bits are 0110, which is the correct value 6. This is once again due to the cycle effect. Examples in code.
So how do I avoid overflow in my programs?
Keep in mind the following things
- The maximum and minimum values of the datatype of the variables you are using.
- The constraints of the input which you are storing in your variables.
- The operations you are performing on your variables, including type promotion and operator precedence.
Using this information it is possible to predict and prevent overflow by switching to a larger datatype, or explicitly type converting to a larger datatype in the expression itself, or applying modulo operations if the problem requires it. Not as hard as it sounds.
That’s all!
Thanks for reading
I hope this article has increased (or at least not decreased :P) your understanding of how integers work and why overflow occurs.
Feel free to point out any mistakes or suggest improvements!
Bonus points
1. If you try to negate the minimum n-bit value -2^{n-1}, you get back the same value instead of its positive value.
2. Theoretically, you can choose any subset of the range 0 to 2^n-1 to be positive and the rest to be negative (complement) values, and all operations still hold under modulo 2^n.
Some links:
How some programming languages deal with integer overflow
Difference in signed and unsigned overflow behaviour by the C/C++ standard
Blog post found in an answer at the above page
Check out the number circle here for a better intuition of the cycle effect
Unexpected result involving unsigned and signed number promotion