When is it worthwhile to use bit fields?

C++ Problem Overview


Is it worthwhile using C's bit-field implementation? If so, when is it ever used?

I was looking through some emulator code and it looks like the registers for the chips are not being implemented using bit fields.

Is this something that is avoided for performance reasons (or some other reason)?

Are there still times when bit-fields are used (e.g. firmware to put on actual chips)?

C++ Solutions


Solution 1 - C++

Bit-fields are typically only used when there's a need to map structure fields to specific bit slices, where some hardware will be interpreting the raw bits. An example might be assembling an IP packet header. I can't see a compelling reason for an emulator to model a register using bit-fields, as it's never going to touch real hardware!

Whilst bit-fields can lead to neat syntax, their layout (bit order, padding, alignment) is implementation-defined, so they are not portable. A more portable, though more verbose, approach is direct bitwise manipulation using shifts and bit-masks.

If you use bit-fields for anything other than assembling (or disassembling) structures at some physical interface, performance may suffer. This is because every time you read or write from a bit-field, the compiler will have to generate code to do the masking and shifting, which will burn cycles.

Solution 2 - C++

One use for bitfields which hasn't yet been mentioned is that unsigned bitfields provide arithmetic modulo a power-of-two "for free". For example, given:

struct { unsigned x:10; } foo;

arithmetic on foo.x will be performed modulo 2^10 = 1024.

(The same can be achieved directly by using bitwise & operations, of course - but sometimes it might lead to clearer code to have the compiler do it for you).

Solution 3 - C++

FWIW, looking only at the relative performance question, here is a bodgy benchmark:

#include <time.h>
#include <cstdlib>   // exit, EXIT_FAILURE
#include <iostream>

struct A
{
    void a(unsigned n) { a_ = n; }
    void b(unsigned n) { b_ = n; }
    void c(unsigned n) { c_ = n; }
    void d(unsigned n) { d_ = n; }
    unsigned a() { return a_; }
    unsigned b() { return b_; }
    unsigned c() { return c_; }
    unsigned d() { return d_; }
    volatile unsigned a_:1,
                      b_:5,
                      c_:2,
                      d_:8;
};

struct B
{
    void a(unsigned n) { a_ = n; }
    void b(unsigned n) { b_ = n; }
    void c(unsigned n) { c_ = n; }
    void d(unsigned n) { d_ = n; }
    unsigned a() { return a_; }
    unsigned b() { return b_; }
    unsigned c() { return c_; }
    unsigned d() { return d_; }
    volatile unsigned a_, b_, c_, d_;
};

struct C
{
    void a(unsigned n) { x_ &= ~0x01; x_ |= n; }
    void b(unsigned n) { x_ &= ~0x3E; x_ |= n << 1; }
    void c(unsigned n) { x_ &= ~0xC0; x_ |= n << 6; }
    void d(unsigned n) { x_ &= ~0xFF00; x_ |= n << 8; }
    unsigned a() const { return x_ & 0x01; }
    unsigned b() const { return (x_ & 0x3E) >> 1; }
    unsigned c() const { return (x_ & 0xC0) >> 6; }
    unsigned d() const { return (x_ & 0xFF00) >> 8; }
    volatile unsigned x_;
};

struct Timer
{
    Timer() { get(&start_tp); }
    double elapsed() const {
        struct timespec end_tp;
        get(&end_tp);
        return (end_tp.tv_sec - start_tp.tv_sec) +
               (1E-9 * end_tp.tv_nsec - 1E-9 * start_tp.tv_nsec);
    }
  private:
    static void get(struct timespec* p_tp) {
        if (clock_gettime(CLOCK_REALTIME, p_tp) != 0)
        {
            std::cerr << "clock_gettime() error\n";
            exit(EXIT_FAILURE);
        }
    }
    struct timespec start_tp;
};

template <typename T>
unsigned f()
{
    int n = 0;
    Timer timer;
    T t;
    for (int i = 0; i < 10000000; ++i)
    {
        t.a(i & 0x01);
        t.b(i & 0x1F);
        t.c(i & 0x03);
        t.d(i & 0xFF);
        n += t.a() + t.b() + t.c() + t.d();
    }
    std::cout << timer.elapsed() << '\n';
    return n;
}

int main()
{
    std::cout << "bitfields: " << f<A>() << '\n';
    std::cout << "separate ints: " << f<B>() << '\n';
    std::cout << "explicit and/or/shift: " << f<C>() << '\n';
}

Output on my test machine (numbers vary by ~20% run to run):

bitfields: 0.140586
1449991808
separate ints: 0.039374
1449991808
explicit and/or/shift: 0.252723
1449991808

This suggests that with g++ -O3 on a fairly recent Athlon, bitfields are more than a few times slower than separate ints, and this particular and/or/bitshift implementation is roughly twice as slow again. ("More than" because the volatile qualifier forces memory reads and writes that inflate every variant's time, and there is loop overhead too, so the measured differences are understated.)

If you're dealing in hundreds of megabytes of structs that can be mainly bitfields or mainly distinct ints, the caching issues may become dominant, so benchmark on your system.

Update from 2021, with an AMD Ryzen 9 3900X and -O2 -march=native:

bitfields: 0.0224893
1449991808
separate ints: 0.0288447
1449991808
explicit and/or/shift: 0.0190325
1449991808

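Here everything has changed massively: the three approaches are now within roughly 50% of each other, and the explicit and/or/shift version is the fastest. The main implication: benchmark on the systems you care about.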


UPDATE: user2188211 attempted an edit which was rejected but usefully illustrated how bitfields become faster as the amount of data increases: "when iterating over a vector of a few million elements in [a modified version of] the above code, such that the variables do not reside in cache or registers, the bitfield code may be the fastest."

#include <vector>   // needed in addition to the headers above

template <typename T>
unsigned f()
{
    int n = 0;
    Timer timer;
    std::vector<T> ts(1024 * 1024 * 16);
    for (size_t i = 0, idx = 0; i < 10000000; ++i)
    {
        T& t = ts[idx];
        t.a(i & 0x01);
        t.b(i & 0x1F);
        t.c(i & 0x03);
        t.d(i & 0xFF);
        n += t.a() + t.b() + t.c() + t.d();
        idx++;
        if (idx >= ts.size()) {
            idx = 0;
        }
    }
    std::cout << timer.elapsed() << '\n';
    return n;
}

Results from an example run (g++ -O3, Core2Duo):

 0.19016
 bitfields: 1449991808
 0.342756
 separate ints: 1449991808
 0.215243
 explicit and/or/shift: 1449991808


Of course, timing's all relative and which way you implement these fields may not matter at all in the context of your system.

Solution 4 - C++

I've seen/used bit fields in two situations: computer games and hardware interfaces. The hardware use is pretty straightforward: the hardware expects data in a certain bit format, which you can define either manually or through pre-defined library structures. Whether those structures use bit fields or plain bit manipulation depends on the specific library.

In the "old days" computer games used bit fields frequently to make the most of limited computer/disk memory. For example, for an NPC definition in an RPG you might find (made-up example):

struct charinfo_t
{
     unsigned int Strength : 7;  // 0-100
     unsigned int Agility : 7;  
     unsigned int Endurance: 7;  
     unsigned int Speed : 7;  
     unsigned int Charisma : 7;  
     unsigned int HitPoints : 10;    //0-1000
     unsigned int MaxHitPoints : 10;  
     //etc...
};

You don't see it so much in more modern games/software, because the space savings have become proportionally less significant as computers get more memory. Saving 1MB of memory when your computer only has 16MB is a big deal, but not so much when you have 4GB.

Solution 5 - C++

The primary purpose of bit-fields is to provide a way to save memory in massively instantiated aggregate data structures by achieving tighter packing of data.

The whole idea is to take advantage of situations where you have several fields in some struct type which don't need the entire width (and range) of some standard data type. This gives you the opportunity to pack several such fields into one allocation unit, thus reducing the overall size of the struct type. An extreme example would be boolean fields, which can be represented by individual bits (with, say, 32 of them being packable into a single unsigned int allocation unit).

Obviously, this only makes sense in situations where the pros of the reduced memory consumption outweigh the cons of slower access to values stored in bit-fields. However, such situations arise quite often, which makes bit-fields an absolutely indispensable language feature. This should answer your question about the modern use of bit-fields: not only are they used, they are essentially mandatory in any practically meaningful code oriented toward processing large amounts of homogeneous data (like large graphs, for one example), because their memory-saving benefits greatly outweigh any individual-access performance penalties.

In a way, bit-fields are very similar in purpose to the "small" arithmetic types: signed/unsigned char, short, float. In actual data-processing code one would not normally use any types smaller than int or double (with few exceptions). Arithmetic types like signed/unsigned char, short and float exist just to serve as "storage" types: memory-saving compact members of struct types in situations where their range (or precision) is known to be sufficient. Bit-fields are just another step in the same direction, trading a bit more performance for much greater memory-saving benefits.

So, that gives us a rather clear set of conditions under which it is worthwhile to employ bit-fields:

  1. Struct type contains multiple fields that can be packed into a smaller number of bits.
  2. The program instantiates a large number of objects of that struct type.

If the conditions are met, you declare all bit-packable fields contiguously (typically at the end of the struct type), assign them their appropriate bit-widths (and, usually, take some steps to ensure that the bit-widths are appropriate). In most cases it makes sense to play around with ordering of these fields to achieve the best packing and/or performance.


There's also a weird secondary use of bit-fields: mapping bit groups in various externally-specified representations, such as hardware registers, floating-point formats, file formats etc. This was never intended as a proper use of bit-fields, even though for some unexplained reason this kind of bit-field abuse continues to pop up in real-life code. Just don't do this.

Solution 6 - C++

One use for bit fields used to be to mirror hardware registers when writing embedded code. However, since the bit order is platform-dependent, they don't work if the hardware orders its bits differently from the processor. That said, I can't think of a use for bit fields any more. You're better off implementing a bit manipulation library that can be ported across platforms.

Solution 7 - C++

Bit fields were used in the olden days to save program memory.

They degrade performance because registers cannot work with them directly, so they have to be converted to integers before anything can be done with them. They tend to lead to more complex code that is unportable and harder to understand (since you have to mask and unmask things all the time to actually use the values).

Check out the source for http://www.nethack.org/ to see pre-ANSI C in all its bitfield glory!

Solution 8 - C++

In the 70s I used bit fields to control hardware on a TRS-80. The display, keyboard, cassette and disks were all memory-mapped devices. Individual bits controlled various things.

  1. A bit controlled 32 column vs 64 column display.
  2. Bit 0 in that same memory cell was the cassette serial data in/out.

As I recall, the disk drive control had a number of them. There were 4 bytes in total. I think there was a 2-bit drive select. But it was a long time ago. It was kind of impressive back then that there were at least two different C compilers for the platform.

The other observation is that bit fields really are platform specific. There is no expectation that a program with bit fields should port to another platform.

Solution 9 - C++

In modern code, there's really only one reason to use bitfields: to control the space requirements of a bool or an enum type, within a struct/class. For instance (C++):

enum token_code { TK_a, TK_b, TK_c, ... /* less than 255 codes */ };
struct token {
    token_code code      : 8;
    bool number_unsigned : 1;
    bool is_keyword      : 1;
    /* etc */
};

IMO there's basically no reason not to use :1 bitfields for bool, as modern compilers will generate very efficient code for it. In C, though, make sure your bool typedef is either the C99 _Bool or failing that an unsigned int, because a signed 1-bit field can hold only the values 0 and -1 (unless you somehow have a non-twos-complement machine).

With enumeration types, always use a size that corresponds to the size of one of the primitive integer types (8/16/32/64 bits, on normal CPUs) to avoid inefficient code generation (repeated read-modify-write cycles, usually).

Using bitfields to line up a structure with some externally-defined data format (packet headers, memory-mapped I/O registers) is commonly suggested, but I actually consider it a bad practice, because C doesn't give you enough control over endianness, padding, and (for I/O regs) exactly what assembly sequences get emitted. Have a look at Ada's representation clauses sometime if you want to see how much C is missing in this area.

Solution 10 - C++

Boost.Thread uses bitfields in its shared_mutex, on Windows at least:

    struct state_data
    {
        unsigned shared_count:11,
        shared_waiting:11,
        exclusive:1,
        upgrade:1,
        exclusive_waiting:7,
        exclusive_waiting_blocked:1;
    };

Solution 11 - C++

An alternative to consider is to specify bit field structures with a dummy structure (never instantiated) where each byte represents a bit:

struct Bf_format
{
  char field1[5];
  char field2[9];
  char field3[18];
};

With this approach sizeof gives the width of the bit field, and offsetof gives the offset of the bit field. At least in the case of GCC, compiler optimization of bit-wise operations (with constant shifts and masks) seems to have reached rough parity with (base language) bit fields.

I have written a C++ header file (using this approach) which allows structures of bit fields to be defined and used in a performant, much more portable, much more flexible way: https://github.com/wkaras/C-plus-plus-library-bit-fields . So, unless you are stuck using C, I think there would rarely be a good reason to use the base language facility for bit fields.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Russel | View Question on Stack Overflow
Solution 1 - C++ | Oliver Charlesworth | View Answer on Stack Overflow
Solution 2 - C++ | caf | View Answer on Stack Overflow
Solution 3 - C++ | Tony Delroy | View Answer on Stack Overflow
Solution 4 - C++ | uesp | View Answer on Stack Overflow
Solution 5 - C++ | AnT | View Answer on Stack Overflow
Solution 6 - C++ | sizzzzlerz | View Answer on Stack Overflow
Solution 7 - C++ | nate c | View Answer on Stack Overflow
Solution 8 - C++ | EvilTeach | View Answer on Stack Overflow
Solution 9 - C++ | zwol | View Answer on Stack Overflow
Solution 10 - C++ | Steve Townsend | View Answer on Stack Overflow
Solution 11 - C++ | WaltK | View Answer on Stack Overflow