Why prefer signed over unsigned in C++?

Tags: C++, Optimization

C++ Problem Overview


I'd like to understand better why one would choose int over unsigned.

Personally, I've never liked signed values unless there is a valid reason for them. A count of items in an array, the length of a string, the size of a memory block, and so on, cannot possibly be negative; a negative value has no possible meaning there. Why prefer int when it is misleading in all such cases?

I ask this because both Bjarne Stroustrup and Chandler Carruth advised preferring int over unsigned in a video (at approximately 12:30).

I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.

But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?

C++ Solutions


Solution 1 - C++

As per requests in comments: I prefer int instead of unsigned because...

  1. it's shorter (I'm serious!)

  2. it's more generic and more intuitive (i.e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)

  3. what if I want to signal an error by returning an out-of-range value?

Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. This is not always the right choice: in other cases, an unsigned is simply a better tool for the task. I am just answering the "why would anyone prefer defaulting to signed" question specifically.
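Points 2 and 3 can be illustrated with a small sketch. `find_index` and `find_index_unsigned` are hypothetical helpers (not from the answer): with a signed return type, -1 is an obviously out-of-range error value, while the unsigned variant turns the same sentinel into a huge, valid-looking index.

```cpp
#include <climits>
#include <vector>

// Hypothetical helper: index of `value` in `v`, or -1 if it is absent.
int find_index(const std::vector<int>& v, int value) {
    for (int i = 0; i < static_cast<int>(v.size()); ++i) {
        if (v[i] == value) return i;
    }
    return -1;  // out of range for any index, easy to test with `< 0`
}

// Hypothetical unsigned variant: the error value becomes UINT_MAX,
// which silently looks like a (very large) index.
unsigned find_index_unsigned(const std::vector<int>& v, int value) {
    return static_cast<unsigned>(find_index(v, value));
}
```

The standard library uses the unsigned variant of this trick for `std::string::npos`, which works but has to be compared against explicitly.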

Solution 2 - C++

Let me paraphrase the video, as the experts said it succinctly.

> Andrei Alexandrescu:

> - No simple guideline.
> - In systems programming, we need integers of different sizes and signedness.
> - Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.

> Chandler Carruth:

> - Here are some simple guidelines:
>     1. Use signed integers unless you need two's complement arithmetic or a bit pattern.
>     2. Use the smallest integer that will suffice.
>     3. Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
> - Stop worrying and use tools to tell you when you need a different type or size.

> Bjarne Stroustrup:

> - Use int until you have a reason not to.
> - Use unsigned only for bit patterns.
> - Never mix signed and unsigned.

Wariness about signedness rules aside, my one-sentence takeaway from the experts:

> Use the appropriate type, and when you don't know, use an int until you do know.

Solution 3 - C++

Several reasons:

  1. Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably produce a negative result: think of subtracting money quantities to yield a balance, or array indices to yield the distance between elements. If the operands are unsigned, you get a perfectly defined but almost certainly meaningless result, and a result < 0 comparison will always be false (fortunately, modern compilers warn about this).

  2. unsigned has the nasty property of contaminating the arithmetic wherever it is mixed with signed integers. So if you add a signed and an unsigned value and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
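Both points can be seen in a short sketch. The function names and the `Count` typedef are hypothetical, chosen only to illustrate a subtraction of indices and a signed/unsigned addition hidden behind a typedef.

```cpp
#include <cstddef>

// Signed: the distance between two indices can be negative, as expected.
long signed_distance(long from, long to) { return to - from; }

// Unsigned: the subtraction is well defined but wraps modulo 2^N when
// to < from, so a later "result < 0" check can never be true.
std::size_t unsigned_distance(std::size_t from, std::size_t to) { return to - from; }

// Hypothetical typedef that hides the unsignedness from the reader.
using Count = unsigned;

// `offset` is converted to unsigned before the addition, so the sum is
// computed modulo 2^N and the comparison is an unsigned one.
bool sum_is_positive(Count base, int offset) { return base + offset > 0; }
```

With these definitions, `sum_is_positive(1, -2)` returns true even though 1 + (-2) is -1, because the sum wraps to the largest unsigned value.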

Solution 4 - C++

There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.

In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.

Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of the C and C++ languages, like pointer arithmetic and iterator arithmetic. This means that, in the general case, the programmer does not really have the option of avoiding the issues specific to unsigned semantics and the "problems" they bring. In other words, whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end, right here rather than somewhere in the distance, even if you adamantly avoid unsigned integers.

Also, as you probably know, many parts of the standard library already rely heavily on unsigned integer types. Forcing signed arithmetic into the mix, instead of learning to work with the unsigned one, will only result in disastrously bad code.

The only real reason to prefer signed in some contexts that comes to mind is that, in mixed integer/floating-point code, signed integer formats are typically supported directly by the FPU instruction set, while unsigned formats are typically not supported at all, forcing the compiler to generate extra code for conversions between floating-point and unsigned values. In such code signed types might perform better.

But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.
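Division by a power of two is one place where the corrective code described above shows up. The sketch below only shows the semantic difference that forces the correction; the exact instructions a compiler emits vary by compiler and target. The function names are hypothetical, and `half_by_shift` assumes an arithmetic right shift, which is guaranteed since C++20 and is what mainstream compilers did before.

```cpp
// For unsigned values, x / 2 and x >> 1 always agree, so unsigned
// division by a power of two can compile to a single shift.
unsigned half_unsigned(unsigned x) { return x / 2; }

// Signed division truncates toward zero (guaranteed since C++11)...
int half_signed(int x) { return x / 2; }

// ...but an arithmetic right shift rounds toward negative infinity, so the
// compiler must add a small fix-up for negative operands when it replaces
// the division with a shift.
int half_by_shift(int x) { return x >> 1; }
```

For example, `half_signed(-7)` is -3 while `half_by_shift(-7)` is -4; for non-negative inputs the two agree.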

In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.

Solution 5 - C++

The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or to represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring: it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"; it simply yields the result 0x12345677, which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
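The ring arithmetic in the example above can be checked directly. This sketch uses `std::uint32_t` so the modulus is exactly 2^32; the helper names are hypothetical, and the second function spells out the reduction explicitly using 64-bit arithmetic.

```cpp
#include <cstdint>

// uint32_t addition is addition in the ring of integers mod 2^32:
// the sum is reduced modulo 2^32 rather than "overflowing".
std::uint32_t ring_add(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}

// The same reduction written out explicitly.
std::uint32_t ring_add_explicit(std::uint32_t a, std::uint32_t b) {
    std::uint64_t exact = std::uint64_t{a} + b;  // 0x112345677 for the example
    return static_cast<std::uint32_t>(exact % (std::uint64_t{1} << 32));
}
```

Both functions map 0x12345678 + 0xFFFFFFFF to 0x12345677.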

Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that it won't (e.g. that an exception would be thrown instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value and compilers shouldn't be forbidden from behaving that way.

The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from conversions among unsigned ones) and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.

If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.

Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring, which specify that if X=Y-Z, then X is the one and only member of that ring such that X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
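The two facts in the paragraph above can be stated as compile-time checks. This is a small sketch with a hypothetical helper `to_unsigned`; signed-to-unsigned conversion is always well defined in C++ and yields the value congruent to the source modulo 2^N.

```cpp
#include <cstdint>

// Converting -1 to a 32-bit unsigned type yields 2^32 - 1.
static_assert(static_cast<std::uint32_t>(-1) == 0xFFFFFFFFu, "-1 maps to 0xFFFFFFFF");

// 0xFFFFFFFF is the only uint32_t value that yields zero when 1 is added.
static_assert(static_cast<std::uint32_t>(0xFFFFFFFFu + 1u) == 0u, "wraps to zero");

// Hypothetical helper: signed-to-unsigned conversion, reduced modulo 2^32.
std::uint32_t to_unsigned(std::int32_t x) { return static_cast<std::uint32_t>(x); }
```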

Solution 6 - C++

Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.

Normally when you subtract 1 from a value, the value gets smaller. With both signed and unsigned int variables, however, there is a point where subtracting 1 does not: an unsigned value wraps around to something MUCH LARGER, and a signed value overflows (formally undefined behavior, though in practice it usually wraps the same way). The key difference between unsigned int and int is that with unsigned int the value that triggers the paradoxical result is a commonly used one, 0, whereas with signed it is safely far away from normal operations.
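A common way to hit the wrap at 0 is a loop bound such as `v.size() - 1` on an empty container. The helpers below are hypothetical: `buggy_bound` shows the unsigned bound, and `count_adjacent_equal` shows a signed version whose loop simply does not run for an empty vector.

```cpp
#include <cstddef>
#include <vector>

// With unsigned size_t, v.size() - 1 is SIZE_MAX for an empty vector,
// so a loop using it as a bound would read far out of range.
std::size_t buggy_bound(const std::vector<int>& v) { return v.size() - 1; }

// Signed variant: the bound is -1 for an empty vector, and the loop
// body never executes.
int count_adjacent_equal(const std::vector<int>& v) {
    int count = 0;
    const int last = static_cast<int>(v.size()) - 1;
    for (int i = 0; i < last; ++i) {
        if (v[i] == v[i + 1]) ++count;  // compare each element with its successor
    }
    return count;
}
```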

As for returning -1 as an error value: modern thinking is that it's better to throw an exception than to test for return values.

It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, never subtracting, and that you never get near UINT_MAX). I use unsigned int everywhere, but it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.

Solution 7 - C++

  1. Use int by default: it plays nicer with the rest of the language
     • most common domain usage is regular arithmetic, not modular arithmetic
     • int main() {} // see an unsigned?
     • auto i = 0; // i is of type int
  2. Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)
     • has different semantics than regular arithmetic; make sure it is what you want
     • bit-shifting signed types is subtle (see comments by @ChristianRau)
     • if you need a vector larger than 2 GB on a 32-bit machine, upgrade your OS / hardware
  3. Never mix signed and unsigned arithmetic
     • the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
     • turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
     • the Standard Library got it wrong with std::size_t (quote from the GN13 video)
     • use range-for if you can
     • for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must
  4. Don't use short or larger types unless you actually need them
     • current architectures' data flow caters well to 32-bit non-pointer data (but note the comment by @BenVoigt about cache effects for smaller types)
     • char and short save space but suffer from integral promotions
     • are you really going to count high enough to need int64_t?
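The two loop styles from point 3 can be sketched as follows. The function names are hypothetical; the `static_assert` confirms that `auto i = 0` deduces `int`, and the explicit cast keeps the loop condition a signed-versus-signed comparison (indexing with `v[i]` still converts `i` to `size_type`).

```cpp
#include <type_traits>
#include <vector>

// Preferred: range-for sidesteps the index type entirely.
long long sum_range_for(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) total += x;
    return total;
}

// If an index is needed: keep it signed and convert size() explicitly,
// so the loop condition compares int with int.
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (auto i = 0; i < static_cast<int>(v.size()); ++i) {
        static_assert(std::is_same<decltype(i), int>::value, "auto i = 0 is int");
        total += v[i];
    }
    return total;
}
```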

Solution 8 - C++

To answer the actual question: for the vast majority of things, it doesn't really matter. int can be a little easier to deal with in cases like subtraction where the second operand is larger than the first: you still get the "expected" result.

There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:

  1. Making the number longer (fill with the sign for signed or zero for unsigned) - it takes the same effort to do both.
  2. Comparisons: with signed numbers, the processor has to take into account whether either number is negative. But again, it's the same speed to compare signed or unsigned numbers; it's just a different instruction code that says "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different, the most common case being a conditional jump or branch instruction, but either way it's the same effort; the inputs are just taken to mean slightly different things.]
  3. Multiply and divide. Obviously, a signed multiplication has to produce a correctly signed result, whereas an unsigned one must not treat the highest bit of an input as a sign. And again, the effort is (as near as we care) identical.

(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
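The first two cases in the list can be illustrated with a single 8-bit pattern, 0xFF, read as signed and as unsigned. This sketch only shows the semantic difference; it says nothing about instruction counts, which depend on the compiler and target. The helper names are hypothetical.

```cpp
#include <cstdint>

// 1. Widening: sign extension vs. zero extension of the same bits.
std::int32_t widen_signed(std::int8_t x) { return x; }      // 0xFF -> 0xFFFFFFFF (-1)
std::int32_t widen_unsigned(std::uint8_t x) { return x; }   // 0xFF -> 0x000000FF (255)

// 2. Comparison: the same bit pattern compares differently.
constexpr std::int8_t  s = -1;    // bit pattern 0xFF, read as signed
constexpr std::uint8_t u = 0xFF;  // bit pattern 0xFF, read as unsigned
static_assert(s < 1, "signed: high bit set means negative");
static_assert(!(u < 1), "unsigned: high bit set means large");
```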

Solution 9 - C++

The int type more closely resembles the behavior of mathematical integers than the unsigned type.

It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.

The problem is that the unsigned type has discontinuous behavior right next to zero. Any operation that tries to compute a small negative value instead produces some large positive value. (Worse: the exact value depends on the implementation-defined width of the type.)

Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.

A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.
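A sketch of the descending loop in both forms. The function names are hypothetical. The unsigned version uses the common `i-- > 0` idiom, which tests before decrementing, so the loop ends before `i` wraps from 0 to `SIZE_MAX`; writing `i >= 0` with `std::size_t i` instead would never terminate.

```cpp
#include <cstddef>
#include <vector>

// Signed index: `i >= 0` eventually becomes false, so the loop terminates.
std::vector<int> reversed_signed(const std::vector<int>& v) {
    std::vector<int> out;
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; i--) out.push_back(v[i]);
    return out;
}

// Unsigned index: test first, then decrement, so the body never sees a
// wrapped-around value.
std::vector<int> reversed_unsigned(const std::vector<int>& v) {
    std::vector<int> out;
    for (std::size_t i = v.size(); i-- > 0;) out.push_back(v[i]);
    return out;
}
```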

Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.

The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are portably defined for the unsigned types. The unsigned types lack a sign bit, so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned operations will simulate two's complement semantics even on a non-two's-complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.
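As an illustration of the bignum point, here is a minimal sketch of multi-precision addition over little-endian 32-bit unsigned limbs. `bignum_add` and the limb layout are assumptions made for this example, not a real library's API; the point is that truncating to the low 32 bits and shifting out the carry are exact and portable for unsigned types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bignum addition: numbers stored as little-endian 32-bit limbs.
std::vector<std::uint32_t> bignum_add(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
    std::vector<std::uint32_t> sum;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
        std::uint64_t t = carry;
        if (i < a.size()) t += a[i];
        if (i < b.size()) t += b[i];
        sum.push_back(static_cast<std::uint32_t>(t));  // low 32 bits
        carry = t >> 32;                               // 0 or 1
    }
    if (carry != 0) sum.push_back(static_cast<std::uint32_t>(carry));
    return sum;
}
```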

The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
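A sketch of the IPv4 case, assuming the common convention of packing the four octets into one `std::uint32_t` with the first octet in the high byte. `pack_ipv4` and `in_network` are hypothetical helpers; note that the meaningful operations are shifts and masks, not arithmetic.

```cpp
#include <cstdint>

// Pack a.b.c.d into a 32-bit value, first octet in the most significant byte.
std::uint32_t pack_ipv4(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{b} << 16) |
           (std::uint32_t{c} << 8) | std::uint32_t{d};
}

// Network membership is a mask-and-compare, e.g. 192.168.0.0/16.
bool in_network(std::uint32_t addr, std::uint32_t network, std::uint32_t mask) {
    return (addr & mask) == network;
}
```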

Solution 10 - C++

int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.

If you need a bigger range, use a 64-bit integer.

If you're iterating over stuff using indexes, types usually have size_type, and you shouldn't care whether it's signed or unsigned.

Speed is not an issue.

Solution 11 - C++

One good reason that I can think of is in case of detecting overflow.

For use cases such as the count of items in an array, the length of a string, or the size of a memory block, you can overflow an unsigned int and you may not notice a difference even when you look at the variable. If it is a signed int, the variable will in practice typically end up less than zero and clearly wrong (strictly speaking, signed overflow is undefined behavior, so this is not guaranteed).

You can then simply check whether the variable is negative when you want to use it. This way, you do not have to check for overflow after every arithmetic operation, as is the case for unsigned ints.
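Because signed overflow is formally undefined behavior, a portable way to detect it is to check before adding rather than inspecting the result afterwards. The sketch below uses a hypothetical helper `checked_add`; GCC and Clang also provide `__builtin_add_overflow` for the same purpose.

```cpp
#include <climits>

// Hypothetical helper: stores a + b in `out` and returns true, or returns
// false without performing the addition if the sum would not fit in int.
bool checked_add(int a, int b, int& out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;  // a + b would overflow
    }
    out = a + b;
    return true;
}
```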

Solution 12 - C++

For me, in addition to all the integers in the range of 0..+2,147,483,647 contained within the set of signed and unsigned integers on 32 bit architectures, there is a higher probability that I will need to use -1 (or smaller) than need to use +2,147,483,648 (or larger).

Solution 13 - C++

It gives unexpected results when doing simple arithmetic operations:

unsigned int i;
i = 1 - 2;
//i is now 4294967295 where unsigned int is 32 bits wide

It gives unexpected results when doing simple comparisons:

unsigned int j = 1;
std::cout << (j>-1) << std::endl;
//outputs 0 (false), even though 1 is greater than -1

This is because in the operations above the signed int values are converted to unsigned, and -1 wraps around to a really big number (UINT_MAX).
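One way to make the comparison above behave as expected is sketched below with a hypothetical helper. On C++20 libraries it uses `std::cmp_greater` from `<utility>`, which compares mathematical values regardless of signedness; otherwise it falls back to widening both operands to `long long`, assuming (as on mainstream platforms) that `long long` can hold every `unsigned int` value.

```cpp
#include <utility>

// In `j > -1`, -1 is converted to UINT_MAX, so the comparison is false.
bool greater_than_minus_one(unsigned j) {
#if defined(__cpp_lib_integer_comparison_functions)
    // C++20: compares the mathematical values of j and -1.
    return std::cmp_greater(j, -1);
#else
    // Pre-C++20 fallback: compare in a signed type wide enough for any unsigned int.
    return static_cast<long long>(j) > -1LL;
#endif
}
```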

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type: Original author (original content on Stack Overflow)
Question: Mordachai
Solution 1 - C++: user529758
Solution 2 - C++: Prashant Kumar
Solution 3 - C++: user4815162342
Solution 4 - C++: AnT
Solution 5 - C++: supercat
Solution 6 - C++: vy32
Solution 7 - C++: TemplateRex
Solution 8 - C++: Mats Petersson
Solution 9 - C++: Kaz
Solution 10 - C++: Luchian Grigore
Solution 11 - C++: umps
Solution 12 - C++: franji1
Solution 13 - C++: SwiftMango