Will a `char` always-always-always have 8 bits?

C, Memory

C Problem Overview


I've always assumed:

  1. that a char is represented by a byte,
  2. that a byte can always be counted upon to have 8 bits,
  3. that sizeof (char) is always 1,
  4. and that the maximum theoretical amount of memory I can allocate (counted in chars) is the number of bytes of RAM (+ swap space).

But now that I've read the Wikipedia entry on the byte I'm not so sure anymore.

Which one(s) of my assumptions is wrong? Which one(s) is dangerous?

C Solutions


Solution 1 - C

  1. Yes, char and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a char in C. char always has size 1.

From the spec, section 3.6 byte:

> byte
> addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

And section 3.7.1 character:

> character
> single-byte character
> <C> bit representation that fits in a byte

  2. A char has CHAR_BIT bits. It could be any number (well, 8 or greater according to the spec), but it is most often 8. There are real machines with 16- and 32-bit char types, though. CHAR_BIT is defined in limits.h.

From the spec, section 5.2.4.2.1 Sizes of integer types <limits.h>:

> The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
>
> — number of bits for smallest object that is not a bit-field (byte)
>
>     CHAR_BIT    8

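If you want to see the value on your own machine, a minimal check (not part of the quoted spec, just an illustration) is to print the macro:

```c
#include <limits.h>  /* defines CHAR_BIT */
#include <stdio.h>

int main(void)
{
    /* Prints 8 on most hosted platforms; the standard only guarantees >= 8. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    return 0;
}
```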
  3. sizeof(char) == 1. Always.

From the spec, section 6.5.3.4 The sizeof operator, paragraph 3:

> When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

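Since this is a hard guarantee, you can even state it as a compile-time assertion; the sketch below assumes a C11 compiler for `_Static_assert` and can never fail on a conforming implementation:

```c
/* Guaranteed by the sizeof rule quoted above (C11 6.5.3.4). */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
_Static_assert(sizeof(unsigned char) == 1, "sizeof(unsigned char) is always 1");
```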
  4. You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.

Here's the complete spec section 7.20.3.3 The malloc function:

> Synopsis
>
> 1 #include <stdlib.h>
>   void *malloc(size_t size);
>
> Description
>
> 2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.
>
> Returns
>
> 3 The malloc function returns either a null pointer or a pointer to the allocated space.

That's the entirety of the specification, so there's not really any limit you can rely on.
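Because the standard promises nothing about how much malloc will hand out, the only portable strategy is to request what you need and check the result. A small sketch (the 1 GiB figure is an arbitrary example, not anything the spec mentions, and it assumes a platform where size_t is at least 32 bits):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t request = (size_t)1 << 30;  /* arbitrary example: 1 GiB */
    void *p = malloc(request);

    if (p == NULL) {
        /* The implementation is always allowed to refuse. */
        printf("Allocation of %zu bytes failed.\n", request);
    } else {
        printf("Allocation of %zu bytes succeeded.\n", request);
        free(p);
    }
    return 0;
}
```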

Solution 2 - C

sizeof(char) is always 1, and a char is always exactly one byte. A byte is not always an octet, however: the Texas Instruments C55x, for example, is a DSP with a 16-bit byte.

Solution 3 - C

sizeof(char) is defined to always be 1. From C99:

>When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

It is not, however, guaranteed to be 8 bits. In practice, on the vast majority of platforms out there it will be, but no, you cannot technically count on that always being the case (nor should it matter, as you should be using sizeof anyway).
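If a piece of code genuinely depends on 8-bit bytes (a binary file format or network protocol, say), one common defensive pattern, shown here only as a sketch, is to make the build fail on exotic platforms rather than miscompute silently:

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes; review it before building for this platform."
#endif
```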

Solution 4 - C

Concretely, some architectures, especially in the DSP field, have chars larger than 8 bits. In practice, they sacrifice memory space for speed.

Solution 5 - C

> Traditionally, a byte is not necessarily 8 bits, but merely a smallish region of memory, usually suitable for storing one character. The C Standard follows this usage, so the bytes used by malloc and sizeof can be more than 8 bits. [footnote] (The Standard does not allow them to be less.)

But sizeof(char) is always 1.

Memorizing the C FAQ is a career-enhancing move.

Solution 6 - C

In C, a char is always one byte, so your first and third assumptions are correct.

A byte is not always 8 bits, though, so your second assumption doesn't always hold. That said, >= 99.99% of all systems in existence today have 8-bit characters, so lots of code implicitly assumes 8-bit characters and runs just fine on all the target platforms. Certainly Windows and Mac machines always use 8-bit characters, and AFAIK Linux does as well (Linux has been ported to so many platforms that I'm not 100% sure that somebody hasn't ported Linux to a platform where 9-bit characters make sense).

The maximum amount of memory that can be allocated is the size of virtual memory, minus space reserved for the operating system.
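If you are curious what a single malloc call will actually grant on a given system, a rough empirical probe like the one below can give you a number. This is purely illustrative: allocator behaviour is not guaranteed to be monotonic, and systems that overcommit memory may report far more than you could ever touch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Binary-search the largest single allocation malloc will grant right now.
   Capped at SIZE_MAX / 2, which is already beyond what real systems allow. */
static size_t probe_largest_alloc(void)
{
    size_t lo = 0;
    size_t hi = (size_t)-1 / 2;

    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;        /* mid bytes worked; try something larger */
        } else {
            hi = mid - 1;    /* mid bytes failed; try something smaller */
        }
    }
    return lo;
}

int main(void)
{
    printf("Largest single malloc right now: %zu bytes\n", probe_largest_alloc());
    return 0;
}
```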

Solution 7 - C

The unfortunate thing (or maybe fortunate, depending on how you view things) is that what a byte is commonly thought to be (8 bits) is not the same as what the C programming language considers a byte to be. As some of the previous answers show, a byte has an exact definition in the C programming language, and nowhere in that definition is a byte said to be 8 bits. It simply says that a byte is

> "an addressable unit of data storage large enough to hold any member of the basic character set of the execution environment."

So, to answer your question "Will a char always-always-always have 8 bits?": not always, but most often it will. If you are interested in finding out exactly how many bits your data types consume on your system, you can use the following line of code:

sizeof(type) * CHAR_BIT

where type is your data type. For example, to find out how many bits a char takes up on your system, you can use the following:

printf("The number of bits a 'char' has on my system: %zu\n", sizeof(char) * CHAR_BIT);

This is taken from the GNU C Library Reference Manual, which contains the following illuminating explanation on this topic:

> There is no operator in the C language that can give you the number of bits in an integer data type. But you can compute it from the macro CHAR_BIT, defined in the header file limits.h. CHAR_BIT — This is the number of bits in a char — eight, on most systems. The value has type int. You can compute the number of bits in any data type type like this:
>
> sizeof (type) * CHAR_BIT
>
> That expression includes padding bits as well as value and sign bits.
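As a self-contained illustration of that formula (just an example program, not from the manual), you could print the widths of the standard integer types like this:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof(type) * CHAR_BIT counts every bit, including any padding bits. */
    printf("char      : %zu bits\n", sizeof(char)      * CHAR_BIT);
    printf("short     : %zu bits\n", sizeof(short)     * CHAR_BIT);
    printf("int       : %zu bits\n", sizeof(int)       * CHAR_BIT);
    printf("long      : %zu bits\n", sizeof(long)      * CHAR_BIT);
    printf("long long : %zu bits\n", sizeof(long long) * CHAR_BIT);
    return 0;
}
```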

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | lindelof | View Question on Stackoverflow |
| Solution 1 - C | Carl Norum | View Answer on Stackoverflow |
| Solution 2 - C | Michael Foukarakis | View Answer on Stackoverflow |
| Solution 3 - C | Ed S. | View Answer on Stackoverflow |
| Solution 4 - C | Lindydancer | View Answer on Stackoverflow |
| Solution 5 - C | Mike Sherrill 'Cat Recall' | View Answer on Stackoverflow |
| Solution 6 - C | Adam Mihalcin | View Answer on Stackoverflow |
| Solution 7 - C | Adam Bak | View Answer on Stackoverflow |