Is there still a reason to use `int` in C++ code?


C++ Problem Overview


Many style guides, such as Google's, recommend using int as the default integer type, for instance when indexing arrays. With the rise of 64-bit platforms, an int is most of the time only 32 bits wide, which is not the natural width of the platform. As a consequence, I see no reason, apart from simplicity's sake, to keep that choice. We can see this clearly when compiling the following code:

double get(const double* p, int k) {
  return p[k];
}

which gets compiled into

movslq %esi, %rsi
vmovsd (%rdi,%rsi,8), %xmm0
ret

where the first instruction sign-extends the 32-bit integer to a 64-bit integer.

If the code is transformed into

double get(const double* p, std::ptrdiff_t k) {
  return p[k];
}

the generated assembly is now

vmovsd (%rdi,%rsi,8), %xmm0
ret

which clearly shows that the CPU feels more at home with std::ptrdiff_t than with an int. Many C++ users have moved to std::size_t, but I don't want to use unsigned integers unless I really need modulo 2^n behaviour.

In most cases, using int does not hurt performance, as the undefined behaviour of signed integer overflow allows the compiler to internally promote any int to a std::ptrdiff_t in loops that deal with indices, but we clearly see from the above that the compiler does not feel at home with int. Also, using std::ptrdiff_t on a 64-bit platform would make overflows less likely. I see more and more people getting trapped by int overflows when they have to deal with integers larger than 2^31 - 1, which have become really common these days.
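To make the overflow concern concrete, here is a minimal sketch of a row-major offset computation for a 50000 x 50000 matrix. The names flat_index and fits_in_int are made up for illustration, and the sketch assumes the common case of a 32-bit int. The arithmetic is done in std::int64_t, which has the same width as std::ptrdiff_t on typical 64-bit platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Row-major offset computed in 64 bits, so the product cannot overflow
// for the matrix sizes discussed here.
std::int64_t flat_index(std::int64_t row, std::int64_t col, std::int64_t ncols) {
  assert(row >= 0 && col >= 0 && col < ncols);
  return row * ncols + col;
}

// True if the value can be stored in an int without loss.
bool fits_in_int(std::int64_t value) {
  return value >= std::numeric_limits<int>::min() &&
         value <= std::numeric_limits<int>::max();
}
```

The offset of the last element, flat_index(49999, 49999, 50000), is 2499999999, which is larger than 2^31 - 1, so the same computation done in a 32-bit int would overflow.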

From what I have seen, the only thing that makes int stand apart seems to be the fact that literals such as 5 are int, but I don't see where it might cause any problem if we move to std::ptrdiff_t as a default integer.

I am on the verge of making std::ptrdiff_t the de facto standard integer for all the code written in my small company. Is there a reason why it could be a bad choice?

PS: I agree that the name std::ptrdiff_t is ugly, which is the reason why I have typedef'ed it to il::int_t, which looks a bit better.

PS: As I know that many people will recommend me to use std::size_t as a default integer, I really want to make it clear that I don't want to use an unsigned integer as my default integer. The use of std::size_t as a default integer in the STL has been a mistake as acknowledged by Bjarne Stroustrup and the standard committee in the video Interactive Panel: Ask Us Anything at time 42:38 and 1:02:50.

PS: In terms of performance, on any 64-bit platform that I know of, +, - and * get compiled the same way for both int and std::ptrdiff_t, so there is no difference in speed. If you divide by a compile-time constant, the speed is the same. It's only when you compute a/b and know nothing about b that using a 32-bit integer on a 64-bit platform gives you a slight performance advantage. But this case is so rare that I don't see it as a reason to move away from std::ptrdiff_t. With vectorized code there is a clear difference, and the smaller the type, the better, but that's a different story, and there would be no reason to stick with int. In those cases, I would recommend going to the fixed-size types of C++.

C++ Solutions


Solution 1 - C++

There was a discussion in the C++ Core Guidelines repository about what to use:

https://github.com/isocpp/CppCoreGuidelines/pull/1115

Herb Sutter wrote that gsl::index will be added (in the future maybe std::index), which will be defined as ptrdiff_t.

> hsutter commented on 26 Dec 2017
>
> (Thanks to many WG21 experts for their comments and feedback into this note.)
>
> Add the following typedef to GSL
>
> namespace gsl { using index = ptrdiff_t; }
>
> and recommend gsl::index for all container indexes/subscripts/sizes.
>
> Rationale
>
> The Guidelines recommend using a signed type for subscripts/indices. See ES.100 through ES.107. C++ already uses signed integers for array subscripts.
>
> We want to be able to teach people to write "new clean modern code" that is simple, natural, warning-free at high warning levels, and doesn’t make us write a "pitfall" footnote about simple code.
>
> If we don’t have a short adoptable word like index that is competitive with int and auto, people will still use int and auto and get their bugs. For example, they will write for(int i=0; i<v.size(); ++i) or for(auto i=0; i<v.size(); ++i) which have 32-bit size bugs on widely used platforms, and for(auto i=v.size()-1; i>=0; ++i) which just doesn't work. I don’t think we can teach for(ptrdiff_t i = ... with a straight face, or that people would accept it.
>
> If we had a saturating arithmetic type, we might use that. Otherwise, the best option is ptrdiff_t which has nearly all the advantages of a saturating arithmetic unsigned type, except only that ptrdiff_t still makes the pervasive loop style for(ptrdiff_t i=0; i<v.size(); ++i) emit signed/unsigned mismatches on i<v.size() (and similarly for i!=v.size()) for today's STL containers. (If a future STL changes its size_type to be signed, even this last drawback goes away.)
>
> However, it would be hopeless (and embarrassing) to try to teach people to routinely write for (ptrdiff_t i = ... ; ... ; ...). (Even the Guidelines currently use it in only one place, and that's a "bad" example that is unrelated to indexing.)
>
> Therefore we should provide gsl::index (which can later be proposed for consideration as std::index) as a typedef for ptrdiff_t, so we can hopefully (and not embarrassingly) teach people to routinely write for (index i = ... ; ... ; ...).
>
> Why not just tell people to write ptrdiff_t? Because we believe it would be embarrassing to tell people that's what you have to do in C++, and even if we did people won't do it. Writing ptrdiff_t is too ugly and unadoptable compared to auto and int. The point of adding the name index is to make it as easy and attractive as possible to use a correctly sized signed type.

Edit: More rationale from Herb Sutter

> Is ptrdiff_t big enough? Yes. Standard containers are already required to have no more elements than can be represented by ptrdiff_t, because subtracting two iterators must fit in a difference_type.
>
> But is ptrdiff_t really big enough, if I have a built-in array of char or byte that is bigger than half the size of the memory address space and so has more elements than can be represented in a ptrdiff_t? Yes. C++ already uses signed integers for array subscripts. So use index as the default option for the vast majority of uses including all built-in arrays. (If you do encounter the extremely rare case of an array, or array-like type, that is bigger than half the address space and whose elements are sizeof(1), and you're careful about avoiding truncation issues, go ahead and use a size_t for indexes into that very special container only. Such beasts are very rare in practice, and when they do arise often won't be indexed directly by user code. For example, they typically arise in a memory manager that takes over system allocation and parcels out individual smaller allocations that its users use, or in an MPEG or similar which provides its own interface; in both cases the size_t should only be needed internally within the memory manager or the MPEG class implementation.)
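As an illustration of the loop style the quoted note is aiming for, here is a minimal sketch. It does not use the real GSL header; sketch::index is a hypothetical local alias that mirrors the proposed gsl::index, and sum_backward is a made-up function name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {
// Hypothetical local alias mirroring the proposed gsl::index.
using index = std::ptrdiff_t;
}  // namespace sketch

// Sums the elements back to front. The i >= 0 test works because index is
// signed; the same loop written with an unsigned type would never terminate.
double sum_backward(const std::vector<double>& v) {
  const auto n = static_cast<sketch::index>(v.size());
  assert(n >= 0);  // container sizes are expected to fit in ptrdiff_t
  double total = 0.0;
  for (sketch::index i = n - 1; i >= 0; --i) {
    total += v[static_cast<std::size_t>(i)];
  }
  return total;
}
```

The explicit casts at the boundary with the container are the price of mixing a signed index with today's unsigned size_type, which is the drawback the note itself mentions.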

Solution 2 - C++

I come at this from the perspective of an old timer (pre C++)... It was understood back in the day that int was the native word of the platform and was likely to give the best performance.

If you needed something bigger, then you'd use it and pay the price in performance. If you needed something smaller (limited memory, or specific need for a fixed size), same thing.. otherwise use int. And yeah, if your value was in the range where int on one target platform could accommodate it and int on another target platform could not.. then we had our compile time size specific defines (prior to them becoming standardized we made our own).

But now, present day, processors and compilers are much more sophisticated and these rules don't apply so easily. It is also harder to predict what the performance impact of your choice will be on some unknown future platform or compiler ... How do we really know that uint64_t for example will perform better or worse than uint32_t on any particular future target? Unless you're a processor/compiler guru, you don't...

So... maybe it's old fashioned, but unless I am writing code for a constrained environment like Arduino, etc. I still use int for general purpose values that I know will be within int size on all reasonable targets for the application I am writing. And the compiler takes it from there... These days that generally means 32 bits signed. Even if one assumes that 16 bits is the minimum integer size, it covers most use cases.. and the use cases for numbers larger than that are easily identified and handled with appropriate types.

Solution 3 - C++

Most programs do not live and die on the edge of a few CPU cycles, and int is very easy to write. However, if you are performance-sensitive, I suggest using the fixed-width integer types defined in <cstdint>, such as int32_t or uint64_t. These have the benefit of being very clear in their intended behavior in regards to being signed or unsigned, as well as their size in memory. This header also includes the fast variants such as int_fast32_t, which are at least the stated size, but might be more, if it helps performance.
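A small sketch of what these guarantees look like in code. The helper names bits_of and check_widths are made up for illustration; the types themselves come from <cstdint>. The actual width of the "fast" and "least" variants is implementation-defined, so the sketch only checks the lower bound.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Storage size of T in bits.
template <typename T>
int bits_of() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Exact-width types have precisely the stated number of bits.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

// The "fast" and "least" variants only promise a minimum width.
void check_widths() {
  assert(bits_of<std::int32_t>() == 32);
  assert(bits_of<std::int_fast32_t>() >= 32);
  assert(bits_of<std::int_least16_t>() >= 16);
}
```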

Solution 4 - C++

No formal reason to use int. It doesn't correspond to anything sane as per standard. For indices you almost always want signed pointer-sized integer.

That said, typing int feels like you just said hey to Ritchie and typing std::ptrdiff_t feels like Stroustrup just kicked you in the butt. Coders are people too, don't bring too much ugliness into their life. I would prefer to use long or some easily typed typedef like index instead of std::ptrdiff_t.

Solution 5 - C++

This is somewhat opinion-based, but alas, the question somewhat begs for it, too.

First of all, you talk about integers and indices as if they were the same thing, which is not the case. For any such thing as "integer of sorts, not sure what size", simply using int is of course, most of the time, still appropriate. This works fine most of the time, for most applications, and the compiler is comfortable with it. As a default, that's fine.

For array indices, it's a different story.

There is to date one single formally correct thing, and that's std::size_t. In the future, there may be a std::index_t which makes the intent clearer on the source level, but so far there is not.
std::ptrdiff_t as an index "works" but is just as incorrect as int since it allows for negative indices.
Yes, this happens to be what Mr. Sutter deems correct, but I beg to differ. Yes, on an assembly language instruction level, this is supported just fine, but I still object. The standard says:

>8.3.4/6: E1[E2] is identical to *((E1)+(E2)) [...] Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1.
>5.7/5: [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object [...] otherwise, the behavior is undefined.

An array subscript refers to the E2-th member of E1. There is no such thing as a negative-th element of an array. But more importantly, the pointer arithmetic with a negative additive expression invokes undefined behavior.

In other words: signed indices of whatever size are a wrong choice. Indices are unsigned. Yes, signed indices work, but they're still wrong.

Now, although size_t is by definition the correct choice (an unsigned integer type that is large enough to contain the size of any object), it may be debatable whether it is truly good choice for the average case, or as a default.

Be honest, when was the last time you created an array with 10^19 elements?

I am personally using unsigned int as a default because the 4 billion elements that this allows for is more than enough for (almost) every application, and it already pushes the average user's computer rather close to its limit (merely indexing an array of integers of that size assumes 16GB of contiguous memory allocated). I personally deem defaulting to 64-bit indices ridiculous.

If you are programming a relational database or a filesystem, then yes, you will need 64-bit indices. But for the average "normal" program, 32-bit indices are just good enough, and they only consume half as much storage.

When keeping around considerably more than a handful of indices, and if I can afford (because arrays are not larger than 64k elements), I even go down to uint16_t. No, I'm not joking there.

Is storage really such a problem? It's ridiculous to be greedy about two or four bytes saved, isn't it! Well, no...

Size can be a problem for pointers, so sure enough it can be for indices as well. The x32 ABI does not exist for no reason. You will not notice the overhead of needlessly large indices if you have only a handful of them in total (just like pointers, they will be in registers anyway, nobody will notice whether they're 4 or 8 bytes in size).

But think for example of a slot map where you store an index for every element (depending on the implementation, two indices per element). Oh heck, it sure does make a bummer of a difference whether you hit L2 every time, or whether you have a cache miss on every access! Bigger is not always better.
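A rough sketch of the size argument, under stated assumptions: WideSlot and NarrowSlot are hypothetical slot-map entries with two indices each, and the 64-byte cache line is a typical value, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slot-map entry using pointer-sized indices.
struct WideSlot {
  std::size_t dense_index;
  std::size_t next_free;
};

// The same entry using 32-bit indices.
struct NarrowSlot {
  std::uint32_t dense_index;
  std::uint32_t next_free;
};

// Number of slots of the given type that fit in one cache line.
template <typename Slot>
std::size_t slots_per_line(std::size_t line_bytes = 64) {
  assert(line_bytes >= sizeof(Slot));
  return line_bytes / sizeof(Slot);
}
```

On a typical 64-bit platform, where std::size_t is 8 bytes, a 64-byte line holds four WideSlot entries but eight NarrowSlot entries, so the narrow layout touches half as many cache lines for the same number of elements.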

At the end of the day, you must ask yourself what you pay for, and what you get in return. With that in mind, my style recommendation would be:

If it costs you "nothing" because you only have e.g. one pointer and a few indices to keep around, then just use what's formally correct (that'd be size_t). Formally correct is good, correct always works, it's readable and intelligible, and correct is... never wrong.

If, however, it does cost you (you have maybe several hundred or thousand or ten thousand indices), and what you get back is worth nothing (because e.g. you cannot even store 2^20 elements, so whether you could index 2^32 or 2^64 makes no difference), you should think twice about being too wasteful.

Solution 6 - C++

On most modern 64-bit architectures, int is 4 bytes and ptrdiff_t is 8 bytes. If your program uses a lot of integers, using ptrdiff_t instead of int could double your program's memory requirement.

Also consider that modern CPUs are frequently bottlenecked by memory performance. Using 8-byte integers also means your CPU cache now has half as many elements as before, so now it must wait for the slow main memory more often (which can easily take several hundred cycles).

In many cases, the cost of executing "32-to-64-bit conversion" operations is completely dwarfed by memory performance.

So this is a practical reason int is still popular on 64-bit machines.

  • Now you may argue about two dozen different integer types and portability and standard committees and everything, but the truth is that for a lot of C++ programs written out there, there's a "canonical" architecture they're thinking of, which is frequently the only architecture they're ever concerned about. (If you're writing a 3D graphics routine for a Windows game, you're sure it won't run on an IBM mainframe.) So for them, the question boils down to: "Do I need a 4-byte integer or an 8-byte one here?"

Solution 7 - C++

My advice to you is not to look at assembly language output too much, not to worry too much about exactly what size each variable is, and not to say things like "the compiler feels at home with". (I truly don't know what you mean by that last one.)

For garden-variety integers, the ones that most programs are full of, plain int is supposed to be a good type to use. It's supposed to be the natural word size of the machine. It's supposed to be efficient to use, neither wasting unnecessary memory nor inducing lots of extra conversions when moving between memory and computation registers.

Now, it's true that there are plenty of more specialized uses for which plain int is no longer appropriate. In particular, sizes of objects, counts of elements, and indices into arrays are almost always size_t. But that doesn't mean all integers should be size_t!

It's also true that mixtures of signed and unsigned types, and mixtures of different-size types, can cause problems. But most of those are well taken care of by modern compilers and the warnings they emit for unsafe combinations. So as long as you're using a modern compiler and paying attention to its warnings, you don't need to pick an unnatural type just to try to avoid type mismatch problems.
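For readers who have not run into it, this is the kind of signed/unsigned mismatch those warnings are about. The sketch spells out with an explicit cast what the implicit conversion does in a comparison such as i < v.size() when i is an int; the function names are made up for illustration, and it assumes std::size_t is at least as wide as int.

```cpp
#include <cassert>
#include <cstddef>

// What `a < b` effectively computes when a is int and b is std::size_t:
// a is converted to the unsigned type before the comparison.
bool less_after_conversion(int a, std::size_t b) {
  return static_cast<std::size_t>(a) < b;
}

// Comparison by mathematical value: a negative number is below any size.
bool less_by_value(int a, std::size_t b) {
  return a < 0 || static_cast<std::size_t>(a) < b;
}

// Small self-check showing the difference for a negative left-hand side.
void demo_comparison() {
  assert(!less_after_conversion(-1, 3));  // -1 became a huge unsigned value
  assert(less_by_value(-1, 3));
}
```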

Solution 8 - C++

I don't think that there's real reason for using int.

How to choose the integer type?

  • If it is for bit operations, you can use an unsigned type, otherwise use a signed one
  • If it is for a memory-related thing (index, container size, etc.) for which you don't know the upper bound, use std::ptrdiff_t (the only problem is when the size is larger than PTRDIFF_MAX, which is rare in practice)
  • Otherwise use intXX_t or int(_least)/(_fast)XX_t.

These rules cover all the possible usages for int, and they give a better solution:

  • int is not good for storing memory related things, as its range can be smaller than an index can be (this is not a theoretical thing: for 64-bit machines, int is usually 32-bit, so with int, you can only handle 2 billion elements)
  • int is not good for storing "general" integers, as its range may be smaller than needed (undefined behavior happens if range is not enough), or on the contrary, its range may be much larger than needed (so memory is wasted)

The only reason to use an int is when one does a calculation and knows that the values fit into [-32767;32767] (the standard only guarantees this range. Note, however, that implementations are free to provide larger ints, and they usually do so. Currently int is 32-bit on a lot of platforms).

As the mentioned std types are a little bit tedious to write, one could typedef them to be shorter (I use s8/u8/.../s64/u64, and spt/upt ("(un)signed pointer sized type") for ptrdiff_t/size_t. I've been using these typedefs for 15 years, and I've never written a single int since...).
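Such aliases could look roughly like this. The alias names follow the ones mentioned above, but the exact definitions are an assumption, and count_between is a made-up helper that only shows the aliases in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Short aliases for the fixed-width types.
using s8 = std::int8_t;
using u8 = std::uint8_t;
using s16 = std::int16_t;
using u16 = std::uint16_t;
using s32 = std::int32_t;
using u32 = std::uint32_t;
using s64 = std::int64_t;
using u64 = std::uint64_t;

// "(un)signed pointer sized type"
using spt = std::ptrdiff_t;
using upt = std::size_t;

// Number of elements between two pointers into the same array.
spt count_between(const s32* first, const s32* last) {
  assert(first <= last);
  return last - first;
}
```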

Solution 9 - C++

Pro

Easier to type, I guess? But you can always typedef.

Many APIs use int, including parts of the standard library. This has historically caused problems, for example during the transition to 64-bit file sizes.

Because of the default type promotion rules, types narrower than int could be widened to int or unsigned int unless you add explicit casts in a lot of places, and a lot of different types could be narrower than int on some implementation somewhere. So, if you care about portability, it’s a minor headache.
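A minimal sketch of that promotion rule, using std::uint8_t as the narrow type; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The operands of + are promoted to int before the addition takes place.
static_assert(
    std::is_same<decltype(std::uint8_t{} + std::uint8_t{}), int>::value,
    "uint8_t operands are promoted to int");

// The result keeps the full int value.
int sum_promoted(std::uint8_t a, std::uint8_t b) {
  return a + b;
}

// An explicit cast is needed to get back to the narrow type.
std::uint8_t sum_narrowed(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}

// Small self-check: 200 + 100 is 300 as an int and 44 modulo 256.
void demo_promotion() {
  assert(sum_promoted(200, 100) == 300);
  assert(sum_narrowed(200, 100) == 44);
}
```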

Con

I also use ptrdiff_t for indices, most of the time. (I agree with Google that unsigned indices are a bug attractor.) For other kinds of math, there’s int_fast64_t, int_fast32_t, and so on, which will also be as good as or better than int. Almost no real-world systems, with the exception of a few defunct Unices from last century, use ILP64, but there are plenty of CPUs where you would want 64-bit math. And a compiler is technically allowed, by standard, to break your program if your int is greater than 32,767.

That said, any C compiler worth its salt will be tested on a lot of code that adds an int to a pointer within an inner loop. So it can’t do anything too dumb. Worst-case scenario on present-day hardware is that it needs an extra instruction to sign-extend a 32-bit signed value to 64 bits. But, if what you really want is the fastest pointer math, the fastest math for values with magnitude between 32 kibi and 2 gibi, or the least wasted memory, you should say what you mean, not make the compiler guess.

Solution 10 - C++

I guess in 99% of cases there is no reason to use int (or a signed integer of another size). However, there are still situations in which using int is a good option.


A) Performance:

One difference between int and size_t is that i++ can be undefined behavior for int, namely if i is INT_MAX. This might actually be a good thing, because the compiler can use this undefined behavior to speed things up.

For example, in this question the difference was about a factor of 2 between exploiting the undefined behavior and using the compiler flag -fwrapv, which prohibits this exploit.

If my workhorse for-loop becomes twice as fast by using ints, sure, I will use them.


B) Less error prone code

Reversed for-loops with size_t look strange and are a source of errors (I hope I got it right):

for(size_t i = N-1; i < N; i--){...}

By using

for(int i = N-1; i >= 0; i--){...}

you will deserve the gratitude of less experienced C++-programmers, who will have to manage your code some day.
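Both loop styles can be checked side by side with a minimal sketch. The function names are made up for illustration, and the signed version uses std::ptrdiff_t rather than int so that the size conversion cannot truncate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reverse copy with an unsigned index: after i reaches 0, i-- wraps around
// to a huge value, so the condition i < n ends the loop.
std::vector<int> reversed_unsigned(const std::vector<int>& v) {
  std::vector<int> out;
  const std::size_t n = v.size();
  for (std::size_t i = n - 1; i < n; i--) {
    out.push_back(v[i]);
  }
  return out;
}

// The same loop with a signed index and the more familiar i >= 0 test.
std::vector<int> reversed_signed(const std::vector<int>& v) {
  std::vector<int> out;
  const auto n = static_cast<std::ptrdiff_t>(v.size());
  assert(n >= 0);
  for (std::ptrdiff_t i = n - 1; i >= 0; i--) {
    out.push_back(v[static_cast<std::size_t>(i)]);
  }
  return out;
}
```

Both versions also handle an empty vector: in the unsigned one, n - 1 wraps around immediately and the condition i < n is false from the start.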


C) Design using signed indices

By using int for indices, one could signal wrong or out-of-range values with negative values, which comes in handy and can lead to clearer code.

  1. "find index of an element in array" could return -1 if element is not present. For detecting this "error" you don't have to know the size of the array.

  2. binary search could return positive index if element is in the array, and -index for the position where the element would be inserted into array (and is not in the array).

Clearly, the same information could be encoded with positive index-values, but the code becomes somewhat less intuitive.
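A minimal sketch of both ideas, with made-up function names. One deliberate deviation from the description above: for the binary search, the sketch returns -(insertion position) - 1 instead of -index, the convention used by Java's Arrays.binarySearch, so that "insert at position 0" stays distinguishable from "found at index 0".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position of the first element equal to x, or -1 if absent.
std::ptrdiff_t find_index(const std::vector<int>& v, int x) {
  const auto it = std::find(v.begin(), v.end(), x);
  return it == v.end() ? -1 : it - v.begin();
}

// Binary search over a sorted vector. Returns the index if x is present;
// otherwise returns -(insertion position) - 1, which is always negative.
std::ptrdiff_t search_sorted(const std::vector<int>& v, int x) {
  assert(std::is_sorted(v.begin(), v.end()));
  const auto it = std::lower_bound(v.begin(), v.end(), x);
  const std::ptrdiff_t pos = it - v.begin();
  if (it != v.end() && *it == x) {
    return pos;
  }
  return -pos - 1;
}
```

A caller can recover the insertion position from a negative result r as -(r + 1), without needing to know the size of the array.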


Clearly, there are also reasons to choose int over std::ptrdiff_t; one of them is memory bandwidth. There are a lot of memory-bound algorithms, and for them it is important to reduce the amount of memory transferred from RAM to cache.

If you know that all numbers are less than 2^31, it is an advantage to use int, because otherwise half of the memory transfer would consist of zeros that you already know are there.

An example is compressed sparse row (CSR) matrices: their indices are stored as int and not long long. Because many operations on sparse matrices are memory bound, there really is a difference between using 32 or 64 bits.
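For readers unfamiliar with the format, here is a minimal sketch of a CSR matrix with int indices and a matrix-vector product. The struct layout and all names are assumptions made for illustration, not the layout of any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix with int indices (hypothetical layout).
struct CsrMatrix {
  int rows = 0;
  std::vector<int> row_start;  // rows + 1 entries, offsets into the arrays below
  std::vector<int> col_index;  // column of each nonzero
  std::vector<double> value;   // value of each nonzero
};

// Converts a non-negative int index to the container's index type.
inline std::size_t idx(int i) {
  return static_cast<std::size_t>(i);
}

// Computes y = A * x.
std::vector<double> multiply(const CsrMatrix& a, const std::vector<double>& x) {
  assert(a.row_start.size() == idx(a.rows) + 1);
  std::vector<double> y(idx(a.rows), 0.0);
  for (int r = 0; r < a.rows; ++r) {
    for (int k = a.row_start[idx(r)]; k < a.row_start[idx(r + 1)]; ++k) {
      y[idx(r)] += a.value[idx(k)] * x[idx(a.col_index[idx(k)])];
    }
  }
  return y;
}
```

Each nonzero costs one index and one double here. With 64-bit indices, col_index would take as many bytes as value, which is the extra memory traffic the paragraph above refers to.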

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | InsideLoop | View Question on Stackoverflow
Solution 1 - C++ | Robert Andrzejuk | View Answer on Stackoverflow
Solution 2 - C++ | little_birdie | View Answer on Stackoverflow
Solution 3 - C++ | Eyal K. | View Answer on Stackoverflow
Solution 4 - C++ | Uprooted | View Answer on Stackoverflow
Solution 5 - C++ | Damon | View Answer on Stackoverflow
Solution 6 - C++ | jick | View Answer on Stackoverflow
Solution 7 - C++ | Steve Summit | View Answer on Stackoverflow
Solution 8 - C++ | geza | View Answer on Stackoverflow
Solution 9 - C++ | Davislor | View Answer on Stackoverflow
Solution 10 - C++ | ead | View Answer on Stackoverflow