How to perform atomic operations on Linux that work on x86, arm, GCC and icc?

C++, C, Linux, Atomic

C++ Problem Overview


Every modern OS today provides some atomic operations:

  • Windows has Interlocked* API
  • FreeBSD has <machine/atomic.h>
  • Solaris has <atomic.h>
  • Mac OS X has <libkern/OSAtomic.h>

Anything like that for Linux?

  • I need it to work on most Linux supported platforms including: x86, x86_64 and arm.
  • I need it to work on at least GCC and Intel Compiler.
  • I must not use a third-party library like glib or Qt.
  • I need it to work in C++ (C is not required).

Issues:

  • GCC atomic builtins __sync_* are not supported on all platforms (ARM) and are not supported by the Intel compiler.
  • AFAIK <asm/atomic.h> should not be used in user space and I haven't successfully used it at all. Also, I'm not sure if it would work with Intel compiler.

Any suggestions?

I know that there are many related questions but some of them point to __sync* which is not feasible for me (ARM) and some point to asm/atomic.h.

Maybe there is an inline assembly library that does this for GCC (ICC supports gcc assembly)?

Edit:

There is a very partial solution for add operations only (it allows implementing an atomic counter, but not lock-free structures that require CAS):

If you use libstdc++ (the Intel Compiler uses libstdc++), then you can use __gnu_cxx::__exchange_and_add, which is defined in <ext/atomicity.h> or <bits/atomicity.h>, depending on the compiler version.
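As a minimal sketch of the counter described above, assuming a libstdc++ release that ships <ext/atomicity.h>: the AtomicCounter class and its member names are hypothetical, and only __gnu_cxx::__exchange_and_add and the _Atomic_word typedef come from libstdc++.

```cpp
#include <ext/atomicity.h>  // libstdc++ extension; older releases use <bits/atomicity.h>

// Hypothetical wrapper, not part of libstdc++ or the original post.
class AtomicCounter {
public:
    explicit AtomicCounter(int initial = 0) : value_(initial) {}

    // Atomically adds delta and returns the value held before the addition.
    int fetch_add(int delta) {
        return __gnu_cxx::__exchange_and_add(&value_, delta);
    }

    // Adding zero yields an atomic read of the current value.
    int load() {
        return __gnu_cxx::__exchange_and_add(&value_, 0);
    }

private:
    volatile _Atomic_word value_;  // _Atomic_word is typically a typedef for int
};
```

This covers counters only; there is no compare-and-swap in this header, which is why it remains a partial solution.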

However I'd still like to see something that supports CAS.

C++ Solutions


Solution 1 - C++

Projects are using this:

http://packages.debian.org/source/sid/libatomic-ops

If you want simple operations such as CAS, can't you just use the arch-specific implementations out of the kernel, and do arch checks in user space with autotools/cmake? As far as licensing goes, although the kernel is GPL, I think it's arguable that the inline assembly for these operations is provided by Intel/AMD, not that the kernel has a license on them. They just happen to be in an easily accessible form in the kernel source.

Solution 2 - C++

Recent standards (from 2011) of C and C++ now specify atomic operations: <stdatomic.h> in C11 and <atomic> in C++11.

Regardless, your platform or compiler may not support these newer headers & features.
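Where a C++11 compiler is available, the CAS the question asks for is a member of std::atomic. The following is a small sketch; the helper name atomic_fetch_max is made up for illustration, while load and compare_exchange_weak are standard.

```cpp
#include <atomic>

// Hypothetical helper: atomically raise `target` to at least `candidate`.
// Returns the value observed before any update took place.
inline int atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    // compare_exchange_weak stores candidate only if target still equals
    // observed; on failure it refreshes observed with the current value.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // retry with the refreshed value
    }
    return observed;
}
```

The same retry loop shape works for any read-modify-write operation that the standard does not provide directly.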

Solution 3 - C++

Darn. I was going to suggest the GCC primitives, then you said they were off limits. :-)

In that case, I would do an #ifdef for each architecture/compiler combination you care about and code up the inline asm. And maybe check for __GNUC__ or some similar macro and use the GCC primitives if they are available, because it feels so much more right to use those. :-)

You are going to have a lot of duplication and it might be difficult to verify correctness, but this seems to be the way a lot of projects do this, and I've had good results with it.

Some gotchas that have bitten me in the past: when using GCC, don't forget "asm volatile" and the clobbers for "memory" and "cc", etc.
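A sketch of what such an #ifdef'd primitive might look like, assuming GCC-style inline assembly (which ICC also accepts). The function name cas_int is hypothetical, only the x86 branch is hand-written, and every other target falls back to the GCC builtin here rather than to real per-architecture asm.

```cpp
// Hypothetical name; returns true if *ptr was equal to expected and was
// replaced by desired.
inline bool cas_int(volatile int* ptr, int expected, int desired) {
#if defined(__i386__) || defined(__x86_64__)
    int prev;
    __asm__ __volatile__(
        "lock; cmpxchgl %2, %1"
        : "=a"(prev), "+m"(*ptr)          // eax receives the old value of *ptr
        : "r"(desired), "0"(expected)     // eax starts out holding expected
        : "memory", "cc");                // the clobbers mentioned above
    return prev == expected;
#elif defined(__GNUC__)
    // Assumption: other targets use the builtin instead of hand-written asm.
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#else
#error "No compare-and-swap implementation for this compiler/architecture"
#endif
}
```

The "memory" clobber keeps the compiler from caching or reordering memory accesses across the instruction, and "cc" tells it that the flags register is modified.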

Solution 4 - C++

Boost, which has a non-intrusive license, and other frameworks already offer portable atomic counters, as long as they are supported on the target platform.

Third-party libraries are good for us. And if, for strange reasons, your company forbids you from using them, you can still have a look at how they proceed (as long as the licence permits it for your use) to implement what you are looking for.

Solution 5 - C++

I recently did an implementation of such a thing and I was confronted with the same difficulties as you are. My solution was basically the following:

  • try to detect the GCC builtins with the feature macro
  • if they are not available, just implement something like cmpxchg with __asm__ for the other architectures (ARM is a bit more complicated than that). Just do that for one possible size, e.g. sizeof(int).
  • implement all other functionality on top of those one or two primitives with inline functions
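The three steps above can be sketched roughly as follows. The function names cas32, fetch_add32, exchange32 and load32 are hypothetical; __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is the macro GCC predefines when a native 4-byte CAS is available, and the sketch assumes sizeof(int) == 4. The __asm__ fallback is left as an #error placeholder rather than guessed at.

```cpp
// Step 1: detect the builtin through the feature macro.
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
inline bool cas32(volatile int* p, int expected, int desired) {
    return __sync_bool_compare_and_swap(p, expected, desired);
}
#else
// Step 2 would go here: an __asm__ version of cas32 for this target.
#error "No built-in 4-byte CAS; supply an __asm__ implementation"
#endif

// Step 3: everything else is layered on the single primitive.

// Atomically adds delta; returns the previous value.
inline int fetch_add32(volatile int* p, int delta) {
    int old_val;
    do {
        old_val = *p;
    } while (!cas32(p, old_val, old_val + delta));
    return old_val;
}

// Atomically stores desired; returns the previous value.
inline int exchange32(volatile int* p, int desired) {
    int old_val;
    do {
        old_val = *p;
    } while (!cas32(p, old_val, desired));
    return old_val;
}

// Atomic read expressed through the same primitive.
inline int load32(volatile int* p) {
    return fetch_add32(p, 0);
}
```

Keeping a single primitive means only cas32 has to be ported and verified per architecture, at the cost of a retry loop where a dedicated instruction might exist.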

Solution 6 - C++

There is a patch for GCC here to support ARM atomic operations. It will not help you on Intel, but you could examine the code: there is recent kernel support for older ARM architectures, and newer ones have the instructions built in, so you should be able to build something that works.

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00050.html

Solution 7 - C++

__sync* certainly is (and has been) supported by the Intel compiler, because GCC adopted these built-ins from there. Read the first paragraph on this page. Also see "Intel® C++ Compiler for Linux* Intrinsics Reference", page 198. It's from 2006 and describes exactly those built-ins.

Regarding ARM support, for older ARM CPUs: it cannot be done entirely in userspace, but it can be done in kernelspace (by disabling interrupts during the operation), and I think I read somewhere that it has been supported for quite a while now.

According to this PHP bug, dated 2011-10-08, __sync_* will only fail on

  • PA-RISC with anything other than Linux
  • SPARCv7 and lower
  • ARM with GCC < 4.3
  • ARMv5 and lower with anything other than Linux
  • MIPS1

So with GCC >= 4.3 (and 4.7 is the current one), you shouldn't have a problem with ARMv6 and newer. You shouldn't have any problem with ARMv5 either, as long as you are compiling for Linux.
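For illustration, here is the kind of CAS-based code the __sync_* built-ins allow: a lock-free push onto a singly linked list. The Node type and the push function are hypothetical names; only __sync_bool_compare_and_swap is a real built-in, and it acts as a full memory barrier.

```cpp
// Hypothetical intrusive list node.
struct Node {
    int value;
    Node* next;
};

// Pushes node onto the list whose head pointer is *head, retrying if
// another thread changed the head in the meantime.
inline void push(Node* volatile* head, Node* node) {
    Node* old_head;
    do {
        old_head = *head;
        node->next = old_head;  // link to the head we observed
    } while (!__sync_bool_compare_and_swap(head, old_head, node));
}
```

A matching pop is harder to get right because of the ABA problem, so this sketch deliberately shows only the push side.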

Solution 8 - C++

On Debian/Ubuntu I recommend:

sudo apt-get install libatomic-ops-dev

examples: http://www.hpl.hp.com/research/linux/atomic_ops/example.php4

GCC & ICC compatible.

Compared to Intel Threading Building Blocks (TBB) using atomic<T>, libatomic-ops-dev was over twice as fast (with the Intel compiler).

Tested on Ubuntu on an i7: producer-consumer threads piped 10 million ints through a ring buffer in 0.5 s, as opposed to 1.2 s for TBB.

And it is easy to use, e.g.:

#include <atomic_ops.h>

volatile AO_t head;

AO_fetch_and_add1(&head);  /* atomically increments head, returns the old value */

Solution 9 - C++

See kernel_user_helpers.txt or entry-armv.S and look for __kuser_cmpxchg. As described in the comments of the ARM Linux kernel sources:

kuser_cmpxchg

Location:       0xffff0fc0

Reference prototype:

int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);

Input:

r0 = oldval
r1 = newval
r2 = ptr
lr = return address

Output:

r0 = success code (zero or non-zero)
C flag = set if r0 == 0, clear if r0 != 0

Clobbered registers:

r3, ip, flags

Definition:

Atomically store newval in *ptr only if *ptr is equal to oldval. Return zero if *ptr was changed or non-zero if no exchange happened. The C flag is also set if *ptr was changed to allow for assembly optimization in the calling code.

Usage example:

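 typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
 #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)

 /* Atomically add val to *ptr and return the new value. */
 int atomic_add(volatile int *ptr, int val)
 {
         int oldval, newval;  /* "new" is a C++ keyword, so it is avoided here */

         do {
                 oldval = *ptr;
                 newval = oldval + val;
         } while (__kuser_cmpxchg(oldval, newval, ptr));  /* non-zero: retry */

         return newval;
 }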

Notes:

  • This routine already includes memory barriers as needed.
  • Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).

This is for use with Linux with ARMv3 using the swp primitive. You must have a very ancient ARM not to support this. Only a data abort or interrupt can cause the spinning to fail, so the kernel monitors for this address ~0xffff0fc0 and performs a user space PC fix-up when either a data abort or an interrupt occurs. All user-space libraries that support ARMv5 and lower will use this facility.

For instance, QtConcurrent uses this.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Artyom | View Question on Stackoverflow
Solution 1 - C++ | Noah Watkins | View Answer on Stackoverflow
Solution 2 - C++ | kevinarpe | View Answer on Stackoverflow
Solution 3 - C++ | asveikau | View Answer on Stackoverflow
Solution 4 - C++ | Luc Hermitte | View Answer on Stackoverflow
Solution 5 - C++ | Jens Gustedt | View Answer on Stackoverflow
Solution 6 - C++ | Justin Cormack | View Answer on Stackoverflow
Solution 7 - C++ | Mecki | View Answer on Stackoverflow
Solution 8 - C++ | user1408985 | View Answer on Stackoverflow
Solution 9 - C++ | artless noise | View Answer on Stackoverflow