Why would uint32_t be preferred rather than uint_fast32_t?

C++ Problem Overview

It seems that uint32_t is much more prevalent than uint_fast32_t (I realise this is anecdotal evidence). That seems counter-intuitive to me, though.

Almost always when I see an implementation use uint32_t, all it really wants is an integer that can hold values up to 4,294,967,295 (usually a much lower bound somewhere between 65,535 and 4,294,967,295).

It seems weird to then use uint32_t, as the 'exactly 32 bits' guarantee is not needed, and the 'fastest available >= 32 bits' guarantee of uint_fast32_t seems to be exactly the right idea. Moreover, while it's usually implemented, uint32_t is not actually guaranteed to exist.

Why, then, would uint32_t be preferred? Is it simply better known or are there technical advantages over the other?

C++ Solutions

Solution 1 - C++

uint32_t is guaranteed to have nearly the same properties on any platform that supports it.¹

By comparison, uint_fast32_t has very few guarantees about how it behaves on different systems.

If you switch to a platform where uint_fast32_t has a different size, all code that uses uint_fast32_t has to be retested and validated. All stability assumptions are going to be out the window. The entire system is going to work differently.

When writing your code, you may not even have access to a uint_fast32_t system that isn't 32 bits in size.

uint32_t won't work differently (see footnote).

Correctness is more important than speed. Premature correctness is thus a better plan than premature optimization.

In the event I was writing code for systems where uint_fast32_t was 64 or more bits, I might test my code for both cases and use it. Barring both need and opportunity, doing so is a bad plan.

Finally, when you store uint_fast32_t for any length of time or in any number of instances, it can be slower than uint32_t simply due to cache size and memory bandwidth. Today's computers are far more often memory-bound than CPU-bound, so uint_fast32_t could be faster in isolation but not after you account for memory overhead.
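
To make the memory point concrete, here is a minimal sketch that compares the storage needed for the same number of elements of each type. The names kCount, exact_bytes and fast_bytes are made up for illustration, and the actual ratio depends entirely on how your platform defines uint_fast32_t.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical element count, chosen only for illustration.
constexpr std::size_t kCount = 1024;

// Bytes needed to store kCount values of each type.
constexpr std::size_t exact_bytes = sizeof(std::uint32_t) * kCount;
constexpr std::size_t fast_bytes  = sizeof(std::uint_fast32_t) * kCount;

// uint32_t has exactly 32 bits and no padding, so this holds wherever
// the type exists at all.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32,
              "uint32_t is exactly 32 bits");

// uint_fast32_t only promises at least 32 bits, so it can never be smaller.
static_assert(fast_bytes >= exact_bytes,
              "the fast type is at least as large as the exact one");
```

Where uint_fast32_t is a 64-bit type, fast_bytes is twice exact_bytes, which is the cache and bandwidth cost described above.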

¹ As @chux has noted in a comment, if unsigned is larger than uint32_t, arithmetic on uint32_t goes through the usual integer promotions, and if not, it stays as uint32_t. This can cause bugs. Nothing is ever perfect.

Solution 2 - C++

> Why do many people use uint32_t rather than uint32_fast_t?

Note: Mis-named uint32_fast_t should be uint_fast32_t.

uint32_t has a tighter specification than uint_fast32_t and so makes for more consistent functionality.

uint32_t pros:

Various algorithms specify this type. IMO - best reason to use.
Exact width and range known.
Arrays of this type incur no waste.
unsigned integer math with its overflow is more predictable.
Closer match in range and math of other languages' 32-bit types.
Never padded.

uint32_t cons:

Not always available (yet this is rare in 2018).
E.g.: Platforms lacking 8/16/32-bit integers (9/18/36-bit, others).
E.g.: Platforms using non-2's complement (the old 2200).

uint_fast32_t pros:

Always available.
This always allows all platforms, new and old, to use fast/minimum types.
"Fastest" type that supports the 32-bit range.

uint_fast32_t cons:

Range is only minimally known. Example, it could be a 64-bit type.
Arrays of this type may be wasteful in memory.
All answers (mine too at first), the post and comments used the wrong name uint32_fast_t. Looks like many just don't need and use this type. We didn't even use the right name!
Padding possible - (rare).
In select cases, the "fastest" type may really be another type. So uint_fast32_t is only a 1st order approximation.

In the end, what is best depends on the coding goal. Unless coding for very wide portability or some niche performance function, use uint32_t.

There is another issue when using these types that comes into play: their rank compared to int/unsigned

Presumably uint_fastN_t could be the rank of unsigned. This is not specified, but a certain and testable condition.

Thus, uintN_t is more likely than uint_fastN_t to be narrower than unsigned. This means that code that uses uintN_t math is more likely to be subject to integer promotions than code that uses uint_fastN_t, which matters for portability.

With this concern, uint_fastN_t has a portability advantage for select math operations.
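
The promotion issue is easy to check at compile time. The sketch below is only an illustration: is_promoted and complement_is_all_ones are hypothetical helper names, and the uint16_t example assumes int is 32 bits, as on most current desktop platforms.

```cpp
#include <cstdint>
#include <type_traits>

// True when `T + T` is evaluated in some other type, i.e. T was promoted.
template <typename T>
constexpr bool is_promoted() {
    return !std::is_same<decltype(T() + T()), T>::value;
}

// Classic pitfall: when int is wider than 16 bits, `~x` on a uint16_t is
// computed as int, so for x == 0 the result is -1 rather than 0xFFFF.
inline bool complement_is_all_ones(std::uint16_t x) {
    return ~x == 0xFFFF;
}
```

On a platform where int is 16 bits, complement_is_all_ones(0) would return true instead, which is exactly the kind of portability difference the rank discussion is about.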

Side note about int32_t rather than int_fast32_t: On rare machines, INT_FAST32_MIN may be -2,147,483,647 and not -2,147,483,648. The larger point: (u)intN_t types are tightly specified and lead to portable code.

Solution 3 - C++

> Why do many people use uint32_t rather than uint32_fast_t?

Silly answer:

There is no standard type uint32_fast_t, the correct spelling is uint_fast32_t.

Practical answer:

Many people actually use uint32_t or int32_t for their precise semantics, exactly 32 bits with unsigned wrap around arithmetic (uint32_t) or 2's complement representation (int32_t). The xxx_fast32_t types may be larger and thus inappropriate to store to binary files, use in packed arrays and structures, or send over a network. Furthermore, they may not even be faster.
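
As an illustration of the "send over a network" case, here is a minimal sketch of encoding a value as exactly four octets. The function names to_big_endian and from_big_endian are invented for this example; the point is only that the wire format is fixed at 32 bits whatever the in-memory type happens to be.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value as four bytes, most significant byte first.
inline std::array<std::uint8_t, 4> to_big_endian(std::uint32_t v) {
    return {{
        static_cast<std::uint8_t>(v >> 24),
        static_cast<std::uint8_t>(v >> 16),
        static_cast<std::uint8_t>(v >> 8),
        static_cast<std::uint8_t>(v)
    }};
}

// Decode four big-endian bytes back into a 32-bit value.
inline std::uint32_t from_big_endian(const std::array<std::uint8_t, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
            static_cast<std::uint32_t>(b[3]);
}
```

With a uint_fast32_t parameter that happens to be 64 bits wide, the same code would silently drop the upper half, so the exact-width type documents the contract better.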

Pragmatic answer:

Many people just don't know (or simply don't care) about uint_fast32_t, as demonstrated in comments and answers, and probably assume plain unsigned int to have the same semantics, although many current architectures still have 16-bit ints and some rare Museum samples have other strange int sizes less than 32.

UX answer:

Although possibly faster than uint32_t, uint_fast32_t is slower to use: it takes longer to type, especially accounting for looking up spelling and semantics in the C documentation ;-)

Elegance matters, (obviously opinion based):

uint32_t looks bad enough that many programmers prefer to define their own u32 or uint32 type... From this perspective, uint_fast32_t looks clumsy beyond repair. No surprise it sits on the bench with its friends uint_least32_t and such.

Solution 4 - C++

One reason is that unsigned int is already "fastest" without the need for any special typedefs or the need to include something. So, if you need it fast, just use the fundamental int or unsigned int type.
While the standard does not explicitly guarantee that it is fastest, it indirectly does so by stating "Plain ints have the natural size suggested by the architecture of the execution environment" in 3.9.1. In other words, int (or its unsigned counterpart) is what the processor is most comfortable with.

Now of course, you don't know what size unsigned int might be. You only know it is at least as large as short (and I seem to remember that short must be at least 16 bits, although I can't find that in the standard now!). Usually it's just plain simply 4 bytes, but it could in theory be larger, or in extreme cases, even smaller (~~although I've personally never encountered an architecture where this was the case, not even on 8-bit computers in the 1980s... maybe some microcontrollers, who knows~~ turns out I suffer from dementia, int was very clearly 16 bits back then).

The C++ standard doesn't bother to specify what the <cstdint> types are or what they guarantee, it merely mentions "same as in C".

uint32_t, per the C standard, guarantees that you get exactly 32 bits. Not anything different, none less and no padding bits. Sometimes this is exactly what you need, and thus it is very valuable.

uint_least32_t guarantees that whatever the size is, it cannot be smaller than 32 bits (but it could very well be larger). Sometimes, but much more rarely than an exact width or "don't care", this is what you want.

Lastly, uint_fast32_t is somewhat superfluous in my opinion, except for documentation-of-intent purposes. The C standard states "designates an integer type that is usually fastest" (note the word "usually") and explicitly mentions that it need not be fastest for all purposes. In other words, uint_fast32_t is just about the same as uint_least32_t, which is usually fastest too, only with no guarantee given (but no guarantee either way).

Since most of the time you either don't care about the exact size or you want exactly 32 (or 64, sometimes 16) bits, and since the "don't care" unsigned int type is fastest anyway, this explains why uint_fast32_t isn't so frequently used.
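
The guarantees discussed above can be written down as compile-time checks. This is just a sketch; value_bits is a hypothetical helper, and the 16-bit minimums for unsigned short and unsigned int come from the minimum limits the C standard requires in <limits.h>.

```cpp
#include <cstdint>
#include <limits>

// Number of value bits in an unsigned integer type.
template <typename T>
constexpr int value_bits() {
    return std::numeric_limits<T>::digits;
}

static_assert(value_bits<std::uint32_t>() == 32, "exactly 32 bits");
static_assert(value_bits<std::uint_least32_t>() >= 32, "at least 32 bits");
static_assert(value_bits<std::uint_fast32_t>() >= 32, "at least 32 bits");
static_assert(value_bits<unsigned short>() >= 16, "USHRT_MAX is at least 65535");
static_assert(value_bits<unsigned int>() >= 16, "UINT_MAX is at least 65535");
```

Only the first assertion pins the width down; the others give a lower bound and nothing more.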

Solution 5 - C++

I have not seen evidence that uint32_t is used for its range. Instead, most of the time that I've seen uint32_t used, it is to hold exactly 4 octets of data in various algorithms, with guaranteed wraparound and shift semantics!
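
A 32-bit rotate is a typical example of such an algorithm building block. The sketch below (rotl32 and rotl32_fast are names invented here) shows that with uint32_t the truncation comes for free, while a version written against uint_fast32_t has to mask explicitly in case the type is wider than 32 bits.

```cpp
#include <cstdint>

// Rotate left within exactly 32 bits; bits shifted out are discarded
// when the result is converted back to uint32_t.
inline std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    r &= 31u;
    return static_cast<std::uint32_t>((x << r) | (x >> ((32u - r) & 31u)));
}

// Same operation on uint_fast32_t: the type may be wider than 32 bits,
// so the high bits must be masked off by hand.
inline std::uint_fast32_t rotl32_fast(std::uint_fast32_t x, unsigned r) {
    const std::uint_fast32_t mask = 0xFFFFFFFFu;
    r &= 31u;
    x &= mask;
    return ((x << r) | (x >> ((32u - r) & 31u))) & mask;
}
```

Forgetting either mask in the second version gives results that differ between platforms, which is the kind of bug the exact-width type rules out.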

There are also other reasons to use uint32_t instead of uint_fast32_t: often it is because it provides a stable ABI. Additionally, the memory usage can be known accurately. This very much offsets whatever the speed gain would be from uint_fast32_t, whenever that type is distinct from uint32_t.

For values < 65536, there is already a handy type; it is called unsigned int (unsigned short is required to have at least that range as well, but unsigned int is of the native word size). For values < 4294967296, there is another called unsigned long.

And lastly, people do not use uint_fast32_t because it is annoyingly long to type and easy to mistype :D

Solution 6 - C++

Several reasons.

Many people don't know the 'fast' types exist.
It's more verbose to type.
It's harder to reason about your program's behaviour when you don't know the actual size of the type.
The standard doesn't actually pin down fastest, nor can it really: what type is actually fastest can be very context dependent.
I have seen no evidence of platform developers putting any thought into the size of these types when defining their platforms. For example on x86-64 Linux the "fast" types are all 64-bit even though x86-64 has hardware support for fast operations on 32-bit values.

In summary the "fast" types are worthless garbage. If you really need to figure out what type is fastest for a given application you need to benchmark your code on your compiler.
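
For what it's worth, the skeleton of such a benchmark might look like the sketch below. SumTiming and time_sum are hypothetical names, and this is deliberately naive: a serious measurement needs warm-up, repeated runs, realistic data sizes and care that the optimizer does not remove the loop.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SumTiming {
    std::uint32_t checksum;  // low 32 bits of the sum, comparable across types
    double seconds;          // wall-clock time of the summation loop only
};

// Fill a vector of n elements of type T and time how long summing it takes.
template <typename T>
SumTiming time_sum(std::size_t n) {
    std::vector<T> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<T>(i & 0xFFFFu);
    }

    const auto start = std::chrono::steady_clock::now();
    T total = 0;
    for (const T v : data) {
        total += v;
    }
    const auto stop = std::chrono::steady_clock::now();

    return SumTiming{
        static_cast<std::uint32_t>(total & 0xFFFFFFFFu),
        std::chrono::duration<double>(stop - start).count()
    };
}
```

Comparing time_sum<std::uint32_t>(n) against time_sum<std::uint_fast32_t>(n) for the n your application actually uses says more than the typedef name does.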

Solution 7 - C++

From the viewpoint of correctness and ease of coding, uint32_t has many advantages over uint_fast32_t in particular because of the more precisely defined size and arithmetic semantics, as many users above have pointed out.

What has perhaps been missed is that the one supposed advantage of uint_fast32_t, that it can be faster, just never materialized in any meaningful way. Most of the 64-bit processors that have dominated the 64-bit era (x86-64 and AArch64 mostly) evolved from 32-bit architectures and have fast 32-bit native operations even in 64-bit mode. So uint_fast32_t is just the same as uint32_t on those platforms.

Even if some of the "also ran" platforms like POWER, MIPS64, SPARC only offer 64-bit ALU operations, the vast majority of interesting 32-bit operations can be done just fine on 64-bit registers: the bottom 32 bits will have the desired results (and all mainstream platforms at least allow you to load/store 32 bits). Left shift is the main problematic one, but even that can be optimized in many cases by value/range tracking optimizations in the compiler.

I doubt the occasional slightly slower left shift or 32x32 -> 64 multiplication is going to outweigh double the memory use for such values, in all but the most obscure applications.

Finally, I'll note that while the tradeoff has largely been characterized as "memory use and vectorization potential" (in favor of uint32_t) versus instruction count/speed (in favor of uint_fast32_t) - even that isn't clear to me. Yes, on some platforms you'll need additional instructions for some 32-bit operations, but you'll also save some instructions because:

Using a smaller type often allows the compiler to cleverly combine adjacent operations by using one 64-bit operation to accomplish two 32-bit ones. This type of "poor man's vectorization" is not uncommon. For example, creating a constant struct two32{ uint32_t a, b; } in rax, like two32{1, 2}, can be optimized into a single mov rax, 0x200000001, while the 64-bit version needs two instructions. In principle this should also be possible for adjacent arithmetic operations (same operation, different operand), but I haven't seen it in practice.
Lower "memory use" also often leads to fewer instructions, even if memory or cache footprint isn't a problem, because whenever a structure or array of this type is copied, you get twice the bang for your buck per register copied.
Smaller data types often make better use of modern calling conventions like the SysV ABI, which pack structure data efficiently into registers. For example, you can return up to a 16-byte structure in registers rdx:rax. For a function returning a structure with 4 uint32_t values (initialized from a constant), that translates into
```
 ret_constant32():
     movabs  rax, 8589934593
     movabs  rdx, 17179869187
     ret
```

The same structure with 4 64-bit uint_fast32_t needs a register move and four stores to memory to do the same thing (and the caller will probably have to read the values back from memory after the return):

    ret_constant64():
        mov     rax, rdi
        mov     QWORD PTR [rdi], 1
        mov     QWORD PTR [rdi+8], 2
        mov     QWORD PTR [rdi+16], 3
        mov     QWORD PTR [rdi+24], 4
        ret

Similarly, when passing structure arguments, 32-bit values are packed about twice as densely into the registers available for parameters, so it makes it less likely that you'll run out of register arguments and have to spill to the stack¹.

Even if you choose to use uint_fast32_t for places where "speed matters", you'll often also have places where you need a fixed-size type. For example, when passing values for external output, from external input, as part of your ABI, as part of a structure that needs a specific layout, or because you smartly use uint32_t for large aggregations of values to save on memory footprint. In the places where your uint_fast32_t and uint32_t types need to interface, you might find (in addition to the development complexity) unnecessary sign extensions or other size-mismatch related code. Compilers do an OK job at optimizing this away in many cases, but it is still not unusual to see this in optimized output when mixing types of different sizes.

You can play with some of the examples above and more on godbolt.

¹ To be clear, the convention of packing structures tightly into registers isn't always a clear win for smaller values. It does mean that the smaller values may have to be "extracted" before they can be used. For example a simple function that returns the sum of the two structure members together needs a mov rax, rdi; shr rax, 32; add edi, eax while for the 64-bit version each argument gets its own register and just needs a single add or lea. Still if you accept that the "tightly pack structures while passing" design makes sense overall, then smaller values will take more advantage of this feature.
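
The two movabs immediates in ret_constant32() are simply the four 32-bit members packed pairwise into 64-bit registers. The sketch below reproduces that arithmetic; Four32 and pack_pair are names invented for illustration, and the layout assumption (first member in the low half, no padding) is what mainstream little-endian ABIs do rather than something the language promises.

```cpp
#include <cstdint>

struct Four32 {
    std::uint32_t a, b, c, d;
};

// Expected on mainstream ABIs: four 32-bit members fill 16 bytes exactly,
// which is what lets the whole struct travel in two 64-bit registers.
static_assert(sizeof(Four32) == 4 * sizeof(std::uint32_t),
              "no padding expected between members");

// Combine two 32-bit members the way they sit in one 64-bit register
// on a little-endian target: first member in the low half.
constexpr std::uint64_t pack_pair(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

pack_pair(1, 2) gives 8589934593 and pack_pair(3, 4) gives 17179869187, matching the constants loaded into rax and rdx.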

Solution 8 - C++

To my understanding, int was initially supposed to be a "native" integer type with the additional guarantee that it should be at least 16 bits in size, something that was considered a "reasonable" size back then.

When 32-bit platforms became more common, we can say that the "reasonable" size changed to 32 bits:

Modern Windows uses 32-bit int on all platforms.
POSIX guarantees that int is at least 32 bits.
C# and Java have an int type which is guaranteed to be exactly 32 bits.

But when 64-bit platforms became the norm, no one expanded int to be a 64-bit integer because of:

Portability: a lot of code depends on int being 32 bits in size.
Memory consumption: doubling memory usage for every int might be unreasonable for most cases, as in most cases numbers in use are much smaller than 2 billion.

Now, why would you prefer uint32_t to uint_fast32_t? For the same reason that languages such as C# and Java always use fixed-size integers: programmers do not write code thinking about the possible sizes of different types; they write for one platform and test the code on that platform. Most code implicitly depends on specific sizes of data types. This is why uint32_t is a better choice in most cases: it does not allow any ambiguity regarding its behavior.

Moreover, is uint_fast32_t really the fastest type on a platform with a size equal to or greater than 32 bits? Not really. Consider this code compiled by GCC for x86_64 on Windows:

    #include <stdint.h>

    extern uint64_t get(void);   /* defined elsewhere; returns a 64-bit value */

    uint64_t sum(uint64_t value)
    {
        return value + get();
    }

Generated assembly looks like this:

    push   %rbx
    sub    $0x20,%rsp
    mov    %rcx,%rbx
    callq  d <sum+0xd>
    add    %rbx,%rax
    add    $0x20,%rsp
    pop    %rbx
    retq

Now if you change get()'s return value to uint_fast32_t (which is 4 bytes on Windows x86_64) you get this:

    push   %rbx
    sub    $0x20,%rsp
    mov    %rcx,%rbx
    callq  d <sum+0xd>
    mov    %eax,%eax        ; <-- additional instruction
    add    %rbx,%rax
    add    $0x20,%rsp
    pop    %rbx
    retq

Notice how the generated code is almost the same except for the additional mov %eax,%eax instruction after the function call, which expands the 32-bit value into a 64-bit value.
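
At the language level, that extra instruction corresponds to converting the narrower unsigned operand to 64 bits before the addition. A minimal sketch, with add_widened as a made-up name:

```cpp
#include <cstdint>

// Adding a 32-bit unsigned value to a 64-bit one: the narrow operand is
// converted to 64 bits with its upper half cleared (zero extension)
// before the addition takes place.
inline std::uint64_t add_widened(std::uint64_t value, std::uint32_t narrow) {
    return value + narrow;
}
```

The compiler cannot assume that the upper 32 bits of the register holding the narrow value are already zero, so it has to clear them; on x86-64, writing a 32-bit register does that as a side effect.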

There is no such issue if you only use 32-bit values, but you will probably be using those with size_t variables (array sizes probably?) and those are 64 bits on x86_64. On Linux uint_fast32_t is 8 bytes, so the situation is different.

Many programmers use int when they need to return a small value (let's say in the range [-32,32]). This would work perfectly if int were the platform's native integer size, but since it is not on 64-bit platforms, another type which matches the platform's native type is a better choice (unless it is frequently used with other integers of smaller size).

Basically, regardless of what the standard says, uint_fast32_t is broken on some implementations anyway. If you care about the additional instruction generated in some places, you should define your own "native" integer type. Or you can use size_t for this purpose, as it will usually match the native size (I am not including old and obscure platforms like the 8086, only platforms that can run Windows, Linux, etc.).

Another sign that int was supposed to be a native integer type is the "integer promotion rule". Most CPUs can only perform operations on native-size values, so a 32-bit CPU can usually only do 32-bit additions, subtractions, etc. (Intel CPUs are an exception here). Integer types of other sizes are supported only through load and store instructions. For example, an 8-bit value is loaded with the appropriate "load 8-bit signed" or "load 8-bit unsigned" instruction, which expands the value to 32 bits after the load. Without the integer promotion rule, C compilers would have to add a little more code for expressions that use types smaller than the native type. Unfortunately, this no longer holds on 64-bit architectures, as compilers now have to emit additional instructions in some cases (as was shown above).

Solution 9 - C++

For practical purposes, uint_fast32_t is completely useless. It's defined incorrectly on the most widespread platform (x86_64), and doesn't really offer any advantages anywhere unless you have a very low-quality compiler. Conceptually, it never makes sense to use the "fast" types in data structures/arrays - any savings you get from the type being more efficient to operate on will be dwarfed by the cost (cache misses, etc.) of increasing the size of your working data set. And for individual local variables (loop counters, temps, etc.) a non-toy compiler can usually just work with a larger type in the generated code if that's more efficient, and only truncate to the nominal size when necessary for correctness (and with signed types, it's never necessary).

The one variant that is theoretically useful is uint_least32_t, for when you need to be able to store any 32-bit value, but want to be portable to machines that lack an exact-size 32-bit type. Practically speaking, however, that's not something you need to worry about.

Solution 10 - C++

In many cases, when an algorithm works on an array of data, the best way to improve performance is to minimize the number of cache misses. The smaller each element, the more of them can fit into the cache. This is why a lot of code is still written to use 32-bit pointers on 64-bit machines: they don’t need anything close to 4 GiB of data, but the cost of making all pointers and offsets need eight bytes instead of four would be substantial.
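
A related trick that works without a special ABI is to replace pointers with 32-bit indices into a pool. That is a different technique from 32-bit pointers, swapped in here only because it can be shown in portable code; PtrNode, IdxNode, kNil and sum_list are all names invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Conventional node: the link is a full-width pointer.
struct PtrNode {
    PtrNode* next;
    std::uint32_t value;
};

// Compact node: the link is a 32-bit index into a std::vector pool.
struct IdxNode {
    std::uint32_t next;   // index of the next node in the pool, or kNil
    std::uint32_t value;
};

constexpr std::uint32_t kNil = 0xFFFFFFFFu;

// Sum the values of an index-linked list stored in `pool`, starting at `head`.
inline std::uint32_t sum_list(const std::vector<IdxNode>& pool,
                              std::uint32_t head) {
    std::uint32_t total = 0;
    for (std::uint32_t i = head; i != kNil; i = pool[i].next) {
        total += pool[i].value;
    }
    return total;
}
```

On a typical 64-bit target PtrNode takes 16 bytes and IdxNode takes 8, so twice as many nodes fit in each cache line.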

There are also some ABIs and protocols specified to need exactly 32 bits, for example, IPv4 addresses. That’s what uint32_t really means: use exactly 32 bits, regardless of whether that’s efficient on the CPU or not. These used to be declared as long or unsigned long, which caused a lot of problems during the 64-bit transition. If you just need an unsigned type that holds numbers up to at least 2³²-1, that’s been the definition of unsigned long since the first C standard came out. In practice, though, enough old code assumed that a long could hold any pointer or file offset or timestamp, and enough old code assumed that it was exactly 32 bits wide, that compilers can’t necessarily make long the same as int_fast32_t without breaking too much stuff.

In theory, it would be more future-proof for a program to use uint_least32_t, and maybe even load uint_least32_t elements into a uint_fast32_t variable for calculations. An implementation that had no uint32_t type at all could even declare itself in formal compliance with the standard! (It just wouldn’t be able to compile many existing programs.) In practice, there’s no architecture any more where int, uint32_t, and uint_least32_t are not the same, and no advantage, currently, to the performance of uint_fast32_t. So why overcomplicate things?

Yet look at the reason all the 32_t types needed to exist when we already had long, and you’ll see that those assumptions have blown up in our faces before. Your code might well end up running someday on a machine where exact-width 32-bit calculations are slower than the native word size, and you would have been better off using uint_least32_t for storage and uint_fast32_t for calculation religiously. Or if you’ll cross that bridge when you get to it and just want something simple, there’s unsigned long.
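
For completeness, the "uint_least32_t for storage, uint_fast32_t for calculation" style looks roughly like the sketch below. The function name sum_mod_2_32 is made up, and the explicit masks are the price of not assuming either type is exactly 32 bits wide.

```cpp
#include <cstdint>
#include <vector>

// Sum the elements modulo 2^32. Storage uses the smallest type with at
// least 32 bits; the accumulator uses the type the platform calls fast.
inline std::uint_least32_t sum_mod_2_32(
        const std::vector<std::uint_least32_t>& data) {
    const std::uint_fast32_t mask = 0xFFFFFFFFu;
    std::uint_fast32_t total = 0;
    for (std::uint_least32_t v : data) {
        // Mask both the input and the running total, since either type
        // may carry more than 32 bits.
        total = (total + (v & mask)) & mask;
    }
    return static_cast<std::uint_least32_t>(total);
}
```

With uint32_t throughout, both masks disappear, which is a fair summary of why most code takes the simpler route.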

Solution 11 - C++

To give a direct answer: I think the real reason why uint32_t is used over uint_fast32_t or uint_least32_t is simply that it is easier to type, and, due to being shorter, much nicer to read: If you make structs with some types, and some of them are uint_fast32_t or similar, then it's often hard to align them nicely with int or bool or other types in C, which are quite short (case in point: char vs. character). I of course cannot back this up with hard data, but the other answers can only guess at the reason as well.

As for technical reasons to prefer uint32_t, I don't think there are any - when you absolutely need an exact 32 bit unsigned int, then this type is your only standardised choice. In almost all other cases, the other variants are technically preferable - specifically, uint_fast32_t if you are concerned about speed, and uint_least32_t if you are concerned about storage space. Using uint32_t in either of these cases risks not being able to compile as the type is not required to exist.
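
If the "may not compile" risk worries you, it can be handled at compile time, because the UINT32_MAX macro is only defined when uint32_t itself is provided. The alias u32 and the flag kHasExact32 below are hypothetical names for this sketch.

```cpp
#include <cstdint>
#include <limits>

#ifdef UINT32_MAX
using u32 = std::uint32_t;          // exact-width type is available
constexpr bool kHasExact32 = true;
#else
using u32 = std::uint_least32_t;    // always available, may be wider
constexpr bool kHasExact32 = false;
#endif

// Either way, u32 can hold every 32-bit value.
static_assert(std::numeric_limits<u32>::digits >= 32,
              "u32 must hold 32-bit values");
```

Code that relies on wraparound at exactly 2^32 would still need to check kHasExact32 and mask when it is false.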

In practice, uint32_t and the related types exist on all current platforms, except some very rare (nowadays) DSPs or joke implementations, so there is little actual risk in using the exact type. Similarly, while you can run into speed penalties with the fixed-width types, they are (on modern CPUs) not crippling anymore.

Which is why, I think, the shorter type simply wins out in most cases, due to programmer laziness.

Content Type	Original Author	Original Content on Stackoverflow
Question	Joost	View Question on Stackoverflow
Solution 1 - C++	Yakk - Adam Nevraumont	View Answer on Stackoverflow
Solution 2 - C++	chux - Reinstate Monica	View Answer on Stackoverflow
Solution 3 - C++	chqrlie	View Answer on Stackoverflow
Solution 4 - C++	Damon	View Answer on Stackoverflow
Solution 5 - C++	Antti Haapala -- Слава Україні	View Answer on Stackoverflow
Solution 6 - C++	plugwash	View Answer on Stackoverflow
Solution 7 - C++	BeeOnRope	View Answer on Stackoverflow
Solution 8 - C++	StaceyGirl	View Answer on Stackoverflow
Solution 9 - C++	R.. GitHub STOP HELPING ICE	View Answer on Stackoverflow
Solution 10 - C++	Davislor	View Answer on Stackoverflow
Solution 11 - C++	Remember Monica	View Answer on Stackoverflow