We’ve all seen this warning at some point…
    warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Comparisons and arithmetic operations that mix signed and unsigned numbers creep into code all the time. When they do, compilers will sometimes produce helpful warnings such as the one shown above.
But what do you do when you see such a warning? Maybe you say to yourself, “What’s the big deal? The compiler should be able to figure out how to deal with mixed signedness, right?” You reassure yourself that it is in fact not a big deal and that the compiler is smarter than you, and you turn a blind eye. If you’re feeling particularly cocky, you may even disable the warning altogether.
But what are you ignoring? Could there be something sinister lurking in the dark, waiting to strike when you’re not paying attention?
Treading Into Murky Water
Take a look at this snippet of code.
    #include <iostream>

    int main(int argc, char **argv)
    {
        unsigned int a = 1;
        signed int b = -1;

        if (b < a)
            std::cout << "All is right with the world.\n";
        else
            std::cout << "Up is down and down is up!\n";

        return 0;
    }
The variable a, which is unsigned, is initialized to 1. The variable b, which is signed, is initialized to -1. And b is obviously less than a, right?
Both Visual Studio and GCC compile the code with a warning similar to that shown above. Running the code produces the following output.
    Up is down and down is up!
“Hey now! That’s madness!” you exclaim.
Perhaps. But before I explain what’s happening, let’s journey a little farther down the rabbit hole with one more example…
    #include <iostream>

    int main(int argc, char **argv)
    {
        unsigned int a = 1;
        signed int b = -1;

        std::cout << "1 + -1 = " << (a + b) << "\n";

        b = -2;
        std::cout << "1 + -2 = " << (a + b) << "\n";

        return 0;
    }
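In this example, both Visual Studio and GCC compile the code without error and, at the usual warning levels, without warning. (GCC does report the implicit conversion if you opt in to -Wsign-conversion, which neither -Wall nor -Wextra enables for C++.) The following output is produced.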
    1 + -1 = 0
    1 + -2 = 4294967295
“What in the world is going on here?!” In short: the usual arithmetic conversions, which quietly turn the signed operand into an unsigned one.
Let’s see what the C++ standard has to say. You’ll find the following excerpt in Section 5, “Expressions” (the clause numbering used by C++11 and C++14).
- If either operand is of scoped enumeration type, no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
- If either operand is of type long double, the other shall be converted to long double.
- Otherwise, if either operand is double, the other shall be converted to double.
- Otherwise, if either operand is float, the other shall be converted to float.
- Otherwise, the integral promotions shall be performed on both operands. Then the following rules shall be applied to the promoted operands:
  - If both operands have the same type, no further conversion is needed.
  - Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
  - Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
  - Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
  - Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
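For our examples, the rule that matters is the one that begins “Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand”. The types int and unsigned int have the same conversion rank, so “the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.” The last rule is only a fallback for pairs such as long and unsigned int on platforms where both are 32 bits wide.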
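One way to check how the rules above play out is to ask the compiler for the type of a mixed expression. The following is a minimal sketch, not a definitive test suite: `sum_type` and `long_mix` are hypothetical names introduced here, and two of the checks assume a mainstream platform where int is wider than short.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper for this sketch: the type the compiler gives to `L + R`.
template <typename L, typename R>
using sum_type = decltype(std::declval<L>() + std::declval<R>());

// Same rank, one operand unsigned: the signed operand is converted.
static_assert(std::is_same<sum_type<int, unsigned int>, unsigned int>::value,
              "int + unsigned int should be unsigned int");

// The unsigned operand has the greater rank: same outcome.
static_assert(std::is_same<sum_type<int, unsigned long>, unsigned long>::value,
              "int + unsigned long should be unsigned long");

// Both operands are narrower than int, so the integral promotions turn
// them into int first (assumption: int is wider than short).
static_assert(std::is_same<sum_type<short, unsigned short>, int>::value,
              "short + unsigned short should be int");

// The signed operand has the greater rank. The result is long if long can
// hold every unsigned int value, otherwise the fallback rule picks
// unsigned long. The size comparison is a stand-in for that check.
using long_mix = std::conditional<(sizeof(long) > sizeof(unsigned int)),
                                  long, unsigned long>::type;
static_assert(std::is_same<sum_type<long, unsigned int>, long_mix>::value,
              "long + unsigned int depends on the width of long");
```

If this file compiles, every claim in the comments holds on that platform. The last check is the interesting one, since its answer differs between platforms with 32-bit and 64-bit long.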
In our first example, the variable b is converted to an unsigned int during the comparison to a. What happens when you convert the value -1 to an unsigned value? The conversion wraps around modulo 2^N, where N is the width of the type, so -1 becomes UINT_MAX, which on a platform with 32-bit integers is 4294967295. And, of course, that’s not smaller than 1, which is why we saw the output we did.
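The conversion described above can be spelled out explicitly. This is a small sketch under the assumption that the helper names `as_unsigned` and `less_after_conversion` (both invented for illustration) stand in for what the compiler does implicitly in `b < a`.

```cpp
#include <limits>

// Hypothetical helper: the int-to-unsigned conversion, written out.
constexpr unsigned int as_unsigned(int value)
{
    return static_cast<unsigned int>(value);
}

// Hypothetical helper: the comparison as it is evaluated once the
// signed operand has been converted.
constexpr bool less_after_conversion(int lhs, unsigned int rhs)
{
    return as_unsigned(lhs) < rhs;
}

// -1 wraps to the largest unsigned value, -2 to the one just below it.
static_assert(as_unsigned(-1) == std::numeric_limits<unsigned int>::max(),
              "-1 converts to UINT_MAX");
static_assert(as_unsigned(-2) == std::numeric_limits<unsigned int>::max() - 1,
              "-2 converts to UINT_MAX - 1");

// Which is why -1 is not "less than" 1u.
static_assert(!less_after_conversion(-1, 1u),
              "after conversion, -1 < 1u is false");
```

Using `std::numeric_limits` instead of a literal keeps the checks valid regardless of how wide unsigned int is.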
What about the second example? The first line of output seemed to work just fine. Only the second line produced unexpected results.
In the first line of output, we saw:
    1 + -1 = 0
Here, a was equal to 1 and b was equal to -1. The code for this was…
    std::cout << "1 + -1 = " << (a + b) << "\n";
However, if we substitute the values in for (a + b), applying the conversion to unsigned to b, we have the following (assuming 32-bit integers)…
    std::cout << "1 + -1 = " << (1 + 4294967295) << "\n";
In this particular case, adding 1 to 4294967295 wraps around, because unsigned arithmetic is performed modulo 2^32, and the result is 0. What’s especially interesting here is that 0 was our expected result, so this code gives us a false sense that everything is working as it should.
It was only after we set b to a value of -2 that we saw weird things happen. Once again, here’s what the code looks like once b is converted.
    std::cout << "1 + -2 = " << (1 + 4294967294) << "\n";
And this, of course, equals 4294967295.
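The arithmetic above can be reproduced with fixed-width types, so the numbers do not depend on the platform's int size. This is a sketch only: `add_as_unsigned` and `add_as_signed` are hypothetical names, and widening to a 64-bit signed type is one possible remedy, not the only one.

```cpp
#include <cstdint>

// Hypothetical helper: what `a + b` does in the example, with the
// conversion of b written out. The result wraps modulo 2^32.
constexpr std::uint32_t add_as_unsigned(std::uint32_t a, std::int32_t b)
{
    return a + static_cast<std::uint32_t>(b);
}

// Hypothetical alternative: widen both operands to a signed type that
// can represent every value of either input, then add.
constexpr std::int64_t add_as_signed(std::uint32_t a, std::int32_t b)
{
    return static_cast<std::int64_t>(a) + b;
}

// 1 + 4294967295 wraps to 0, which happens to be the expected answer.
static_assert(add_as_unsigned(1u, -1) == 0u, "1 + -1 wraps to 0");

// 1 + 4294967294 does not wrap, so the surprise becomes visible.
static_assert(add_as_unsigned(1u, -2) == 4294967295u, "1 + -2 is 4294967295");

// In the wider signed type, both sums come out as expected.
static_assert(add_as_signed(1u, -1) == 0, "1 + -1 is 0");
static_assert(add_as_signed(1u, -2) == -1, "1 + -2 is -1");
```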
The Takeaway
Many C++ experts are adamant that the only time you should ever use an unsigned data type is when you need to store a bitmask. Signed data types should be used in ALL other cases. If you end up needing a value that exceeds the range of a given signed data type, use a bigger signed data type.
I agree with the sentiment. But the reality is that we don’t always have the luxury of picking our data types. We’re often at the mercy of APIs, sensor specifications, file formats, network protocols, etc.
Any time you encounter or write code that mixes signed and unsigned data types, proceed with caution. Think carefully about how the data is used, and apply a healthy dose of skepticism. And, of course, when in doubt, test, test, test.
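When a mixed comparison cannot be avoided, one cautious approach is to handle the negative case before converting anything. The sketch below assumes a hypothetical pair of overloads named `safe_less`, limited to int and unsigned int. C++20 offers the same idea generically as `std::cmp_less` in `<utility>`.

```cpp
// Hypothetical helper: signed on the left, unsigned on the right.
// A negative value is smaller than every unsigned value. Otherwise the
// value is non-negative, so converting it to unsigned preserves it.
constexpr bool safe_less(int lhs, unsigned int rhs)
{
    return lhs < 0 || static_cast<unsigned int>(lhs) < rhs;
}

// Hypothetical helper: unsigned on the left, signed on the right.
// Nothing unsigned is smaller than a negative value.
constexpr bool safe_less(unsigned int lhs, int rhs)
{
    return rhs >= 0 && lhs < static_cast<unsigned int>(rhs);
}

// The comparison from the first example now behaves as intuition expects.
static_assert(safe_less(-1, 1u), "-1 is less than 1");
static_assert(!safe_less(1u, -1), "1 is not less than -1");
static_assert(!safe_less(2, 1u), "2 is not less than 1");
```

With these overloads, the first example could call `safe_less(b, a)` in place of `b < a`. The explicit casts also document where the conversion happens, so the compiler has nothing left to warn about.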
* The STL makes heavy use of size_t, which is an unsigned type. When brought up in conversation, this point is often met with loud and uncomfortable groans. The general feeling is that allowing size_t into the STL was a mistake. But it’s something we’re stuck with for now.