Bitfield manipulation in C

CBit Manipulation

C Problem Overview


The classic problem of testing and setting individual bits in an integer in C is perhaps one the most common intermediate-level programming skills. You set and test with simple bitmasks such as

unsigned int mask = 1<<11;

if (value & mask) {....} // Test for the bit
value |= mask;    // set the bit
value &= ~mask;   // clear the bit

An interesting blog post argues that this is error prone, difficult to maintain, and poor practice. The C language itself provides bit level access which is typesafe and portable:

typedef unsigned int boolean_t;
#define FALSE 0
#define TRUE !FALSE
typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        int raw;
} flags_t;

int
create_object(flags_t flags)
{
        boolean_t is_compat = flags.compat;

        if (is_compat)
                flags.force = FALSE;

        if (flags.force) {
                [...]
        }
        [...]
}

But this makes me cringe.

The interesting argument my coworker and I had about this is still unresolved. Both styles work, and I maintain the classic bitmask method is easy, safe, and clear. My coworker agrees it's common and easy, but the bitfield union method is worth the extra few lines to make it portable and safer.

Is there any more arguments for either side? In particular is there some possible failure, perhaps with endianness, that the bitmask method may miss but where the structure method is safe?

C Solutions


Solution 1 - C

Bitfields are not quite as portable as you think, as "C gives no guarantee of the ordering of fields within machine words" (The C book)

Ignoring that, used correctly, either method is safe. Both methods also allow symbolic access to integral variables. You can argue that the bitfield method is easier to write, but it also means more code to review.

Solution 2 - C

If the issue is that setting and clearing bits is error prone, then the right thing to do is to write functions or macros to make sure you do it right.

// off the top of my head
#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)
#define CLEAR_BIT(val, bitIndex) val &= ~(1 << bitIndex)
#define TOGGLE_BIT(val, bitIndex) val ^= (1 << bitIndex)
#define BIT_IS_SET(val, bitIndex) (val & (1 << bitIndex)) 

Which makes your code readable if you don't mind that val has to be an lvalue except for BIT_IS_SET. If that doesn't make you happy, then you take out assignment, parenthesize it and use it as val = SET_BIT(val, someIndex); which will be equivalent.

Really, the answer is to consider decoupling the what you want from how you want to do it.

Solution 3 - C

Bitfields are great and easy to read, but unfortunately the C language does not specify the layout of bitfields in memory, which means they are essentially useless for dealing with packed data in on-disk formats or binary wire protocols. If you ask me, this decision was a design error in C—Ritchie could have picked an order and stuck with it.

Solution 4 - C

You have to think about this from the perspective of a writer -- know your audience. So there are a couple of "audiences" to consider.

First there's the classic C programmer, who have bitmasked their whole lives and could do it in their sleep.

Second there's the newb, who has no idea what all this |, & stuff is. They were programming php at their last job and now they work for you. (I say this as a newb who does php)

If you write to satisfy the first audience (that is bitmask-all-day-long), you'll make them very happy, and they'll be able to maintain the code blindfolded. However, the newb will likely need to overcome a large learning curve before they are able to maintain your code. They will need to learn about binary operators, how you use these operations to set/clear bits, etc. You're almost certainly going to have bugs introduced by the newb as he/she all the tricks required to get this to work.

On the other hand, if you write to satisfy the second audience, the newbs will have an easier time maintaining the code. They'll have an easier time groking

 flags.force = 0;

than

 flags &= 0xFFFFFFFE;

and the first audience will just get grumpy, but its hard to imagine they wouldn't be able to grok and maintain the new syntax. It's just much harder to screw up. There won't be new bugs, because the newb will more easily maintain the code. You'll just get lectures about how "back in my day you needed a steady hand and a magnetized needle to set bits... we didn't even HAVE bitmasks!" (thanks XKCD).

So I would strongly recommend using the fields over the bitmasks to newb-safe your code.

Solution 5 - C

The union usage has undefined behavior according to the ANSI C standard, and thus, should not be used (or at least not be considered portable).

From the ISO/IEC 9899:1999 (C99) standard:

Annex J - Portability Issues:

> 1 The following are unspecified: > > — The value of padding bytes when storing values in structures or unions (6.2.6.1). > > — The value of a union member other than the last one stored into (6.2.6.1).

6.2.6.1 - Language Concepts - Representation of Types - General:

> 6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.[42]) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation. > > 7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So, if you want to keep the bitfield ↔ integer correspondence, and to keep portability, I strongly suggest you to use the bitmasking method, that contrary to the linked blog post, it is not poor practice.

Solution 6 - C

What it is about the bitfield approach that makes you cringe?

Both techniques have their place, and the only decision I have is which one to use:

For simple "one-off" bit fiddling, I use the bitwise operators directly.

For anything more complex - eg hardware register maps, the bitfield approach wins hands down.

  • Bitfields are more succinct to use (at the expense of /slightly/ more verbosity to write.

  • Bitfields are more robust (what size is "int", anyway)

  • Bitfields are usually just as fast as bitwise operators.

  • Bitfields are very powerful when you have a mix of single and multiple bit fields, and extracting the multiple-bit field involves loads of manual shifts.

  • Bitfields are effectively self-documenting. By defining the structure and therefore naming the elements, I know what it's meant to do.

  • Bitfields also seamlessly handle structures bigger than a single int.

  • With bitwise operators, typical (bad) practice is a slew of #defines for the bit masks.

  • The only caveat with bitfields is to make sure the compiler has really packed the object into the size you wanted. I can't remember if this is define by the standard, so a assert(sizeof(myStruct) == N) is a useful check.

Solution 7 - C

The blog post you are referring to mentions raw union field as alternative access method for bitfields.

The purposes blog post author used raw for are ok, however if you plan to use it for anything else (e.g. serialisation of bit fields, setting/checking individual bits), disaster is just waiting for you around the corner. The ordering of bits in memory is architecture dependent and memory padding rules vary from compiler to compiler (see wikipedia), so exact position of each bitfield may differs, in other words you never can be sure which bit of raw each bitfield corresponds to.

However if you don't plan to mix it you better take raw out and you will be safe.

Solution 8 - C

Well you can't go wrong with structure mapping since both fields are accessable they can be used interchangably.

One benefit for bit fields is that you can easily aggregate options:

mask = USER|FORCE|ZERO|COMPAT;

vs

flags.user = true;
flags.force = true;
flags.zero = true;
flags.compat = true;

In some environments such as dealing with protocol options it can get quite old having to individually set options or use multiple parameters to ferry intermediate states to effect a final outcome.

But sometimes setting flag.blah and having the list popup in your IDE is great especially if your like me and can't remember the name of the flag you want to set without constantly referencing the list.

I personally will sometimes shy away from declaring boolean types because at some point I'll end up with the mistaken impression that the field I just toggled was not dependent (Think multi-thread concurrency) on the r/w status of other "seemingly" unrelated fields which happen to share the same 32-bit word.

My vote is that it depends on the context of the situation and in some cases both approaches may work out great.

Solution 9 - C

Either way, bitfields have been used in GNU software for decades and it hasn't done them any harm. I like them as parameters to functions.

I would argue that bitfields are conventional as opposed to structs. Everyone knows how to AND the values to set various options off and the compiler boils this down to very efficient bitwise operations on the CPU.

Providing you use the masks and tests in the correct way, the abstractions the compiler provide should make it robust, simple, readable and clean.

When I need a set of on/off switches, Im going to continue using them in C.

Solution 10 - C

In C++, just use std::bitset<N>.

Solution 11 - C

It is error-prone, yes. I've seen lots of errors in this kind of code, mainly because some people feel that they should mess with it and the business logic in a totally disorganized way, creating maintenance nightmares. They think "real" programmers can write value |= mask; , value &= ~mask; or even worse things at any place, and that's just ok. Even better if there's some increment operator around, a couple of memcpy's, pointer casts and whatever obscure and error-prone syntax happens to come to their mind at that time. Of course there's no need to be consistent and you can flip bits in two or three different ways, distributed randomly.

My advice would be:

  1. Encapsulate this ---- in a class, with methods such as SetBit(...) and ClearBit(...). (If you don't have classes in C, in a module.) While you're at it, you can document all their behaviour.
  2. Unit test that class or module.

Solution 12 - C

Your first method is preferable, IMHO. Why obfuscate the issue? Bit fiddling is a really basic thing. C did it right. Endianess doesn't matter. The only thing the union solution does is name things. 11 might be mysterious, but #defined to a meaningful name or enum'ed should suffice.

Programmers who can't handle fundamentals like "|&^~" are probably in the wrong line of work.

Solution 13 - C

When I google for "c operators"

The first three pages are:

http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40F_HTML/AQTLTBTE/DOCU_059.HTM http://www.cs.mun.ca/~michael/c/op.html

..so I think that argument about people new to the language is a little silly.

Solution 14 - C

I nearly always use the logical operations with a bit mask, either directly or as a macro. e.g.

 #define  ASSERT_GPS_RESET()                    { P1OUT &= ~GPS_RESET ; }

incidentally your union definition in the original question would not work on my processor/compiler combination. The int type is only 16 bits wide and the bitfield definitions are 32. To make it slightly more portable then you would have to define a new 32 bit type that you could then map to the required base type on each target architecture as part of the porting exercise. In my case

typedef   unsigned long int     uint32_t

and in the original example

typedef unsigned int uint32_t

typedef union {
        struct {
                boolean_t user:1;
                boolean_t zero:1;
                boolean_t force:1;
                int :28;                /* unused */
                boolean_t compat:1;     /* bit 31 */
        };
        uint32_t raw;
} flags_t;

The overlaid int should also be made unsigned.

Solution 15 - C

Well, I suppose that's one way of doing it, but I would always prefer to keep it simple.

Once you're used to it, using masks is straightforward, unambiguous and portable.

Bitfields are straightforward, but they are not portable without having to do additional work.

If you ever have to write MISRA-compliant code, the MISRA guidelines frown on bitfields, unions, and many, many other aspects of C, in order to avoid undefined or implementation-dependent behaviour.

Solution 16 - C

Generally, the one that is easier to read and understand is the one that is also easier to maintain. If you have co-workers that are new to C, the "safer" approach will probably be the easier one for them to understand.

Solution 17 - C

Bitfields are great, except that the bit manipulation operations are not atomic, and can thus lead to problems in multi-threaded application.

For example one could assume that a macro:

#define SET_BIT(val, bitIndex) val |= (1 << bitIndex)

Defines an atomic operation, since |= is one statement. But the ordinary code generated by a compiler will not try to make |= atomic.

So if multiple threads execute different set bit operations one of the set bit operation could be spurious. Since both threads will execute:

  thread 1             thread 2
  LOAD field           LOAD field
  OR mask1             OR mask2
  STORE field          STORE field

The result can be field' = field OR mask1 OR mask2 (intented), or the result can be field' = field OR mask1 (not intented) or the result can be field' = field OR mask2 (not intended).

Solution 18 - C

I'm not adding much to what's already been said, except to emphasize two points:

The compiler is free to arrange bits within a bitfield any way it wants. This mean if you're trying to manipulate bits in a microcontroller register, or if you want to send the bits to another processor (or even the same processor with a different compiler), you MUST use bitmasks.

On the other hand, if you're trying to create a compact representation of bits and small integers for use within a single processor, bitfields are easier to maintain and thus less error prone, and -- with most compilers -- are at least as efficient as manually masking and shifting.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSPWorleyView Question on Stackoverflow
Solution 1 - CMatthew FlaschenView Answer on Stackoverflow
Solution 2 - CplinthView Answer on Stackoverflow
Solution 3 - CNorman RamseyView Answer on Stackoverflow
Solution 4 - CDoug T.View Answer on Stackoverflow
Solution 5 - CJulianoView Answer on Stackoverflow
Solution 6 - CRoddyView Answer on Stackoverflow
Solution 7 - CqrdlView Answer on Stackoverflow
Solution 8 - CEinsteinView Answer on Stackoverflow
Solution 9 - CAiden BellView Answer on Stackoverflow
Solution 10 - CSteve JessopView Answer on Stackoverflow
Solution 11 - CDaniel DaranasView Answer on Stackoverflow
Solution 12 - CxcrampsView Answer on Stackoverflow
Solution 13 - CSan JacintoView Answer on Stackoverflow
Solution 14 - CuɐɪView Answer on Stackoverflow
Solution 15 - CSteve MelnikoffView Answer on Stackoverflow
Solution 16 - CMikeView Answer on Stackoverflow
Solution 17 - Cuser502187View Answer on Stackoverflow
Solution 18 - Cfearless_foolView Answer on Stackoverflow