When should one use a spinlock instead of a mutex?

Synchronization, Mutex, Spinlock

Synchronization Problem Overview


I think both do the same job. How do you decide which one to use for synchronization?

Synchronization Solutions


Solution 1 - Synchronization

The Theory

In theory, when a thread tries to lock a mutex and does not succeed because the mutex is already locked, it goes to sleep, immediately allowing another thread to run. It continues to sleep until it is woken up, which happens once the mutex is unlocked by whichever thread was holding it. When a thread tries to lock a spinlock and does not succeed, it continuously retries until it finally succeeds; thus it does not allow another thread to take its place (although the operating system will, of course, forcefully switch to another thread once the current thread's CPU runtime quantum has been exceeded).
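The difference can be sketched in a few lines of C++. This is only a minimal illustration, not a production lock: `SpinLock` and `count_with` are hypothetical names introduced here, and `std::mutex` stands in for whatever sleeping mutex your platform provides.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock: a waiter keeps retrying and never sleeps.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value; true means the lock is held.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait: burn CPU until the holder calls unlock()
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Runs `threads` workers that each increment a shared counter
// `iterations` times while holding a lock of type Lock.
template <typename Lock>
long count_with(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);  // lock() now, unlock() at scope exit
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `count_with<SpinLock>(4, 10000)` and `count_with<std::mutex>(4, 10000)` should both return 40000; the only difference is what a waiting thread does while the lock is taken.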

The Problem

The problem with mutexes is that putting threads to sleep and waking them up again are both rather expensive operations: they need quite a lot of CPU instructions and thus also take some time. If the mutex is only locked for a very short amount of time, the time spent putting a thread to sleep and waking it up again might far exceed the time the thread has actually slept, and it might even exceed the time the thread would have wasted by constantly polling on a spinlock. On the other hand, polling on a spinlock constantly wastes CPU time, and if the lock is held for a longer amount of time, this wastes a lot more CPU time; it would have been much better if the thread had been sleeping instead.

The Solution

Using spinlocks on a single-core/single-CPU system usually makes no sense: as long as the spinlock polling is blocking the only available CPU core, no other thread can run, and since no other thread can run, the lock won't be unlocked either. IOW, a spinlock only wastes CPU time on those systems for no real benefit. If the thread had been put to sleep instead, another thread could have run at once, possibly unlocking the lock and then allowing the first thread to continue processing once it woke up again.

On multi-core/multi-CPU systems, with plenty of locks that are held for only a very short amount of time, the time wasted on constantly putting threads to sleep and waking them up again might decrease runtime performance noticeably. When using spinlocks instead, threads get the chance to take advantage of their full runtime quantum (always blocking for only a very short time period, then immediately continuing their work), leading to much higher processing throughput.

The Practice

Since very often programmers cannot know in advance if mutexes or spinlocks will be better (e.g. because the number of CPU cores of the target architecture is unknown), nor can operating systems know if a certain piece of code has been optimized for single-core or multi-core environments, most systems don't strictly distinguish between mutexes and spinlocks. In fact, most modern operating systems have hybrid mutexes and hybrid spinlocks. What does that actually mean?

A hybrid mutex behaves like a spinlock at first on a multi-core system. If a thread cannot lock the mutex, it won't be put to sleep immediately, since the mutex might get unlocked pretty soon, so instead the mutex will first behave exactly like a spinlock. Only if the lock has still not been obtained after a certain amount of time (or retries, or any other measuring factor) is the thread really put to sleep. If the same code runs on a system with only a single core, the mutex will not spin, though, since, as explained above, that would not be beneficial.
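A spin-then-block mutex of this kind might look roughly like the following sketch. Real operating systems implement this inside their mutex primitives; `SpinThenBlockMutex` and its `spin_limit` parameter are hypothetical names used only for illustration, and the fixed retry count is an assumption rather than what any particular OS does.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical hybrid lock: spin briefly with try_lock(), then fall back
// to a blocking lock() that lets the OS put the thread to sleep.
class SpinThenBlockMutex {
public:
    // Spinning is disabled when fewer than two hardware threads are reported
    // (hardware_concurrency() may also return 0 when the count is unknown).
    explicit SpinThenBlockMutex(int spin_limit = 100)
        : spin_limit_(std::thread::hardware_concurrency() > 1 ? spin_limit : 0) {
        assert(spin_limit >= 0);
    }

    void lock() {
        for (int i = 0; i < spin_limit_; ++i) {
            if (inner_.try_lock()) return;  // acquired while spinning
        }
        inner_.lock();  // give up spinning; may sleep until the lock is released
    }

    void unlock() { inner_.unlock(); }

private:
    std::mutex inner_;
    int spin_limit_;
};
```

Because it exposes `lock()` and `unlock()`, the class can be used with `std::lock_guard<SpinThenBlockMutex>` like any other lock.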

A hybrid spinlock behaves like a normal spinlock at first, but to avoid wasting too much CPU time, it may have a back-off strategy. It will usually not put the thread to sleep (since you don't want that to happen when using a spinlock), but it may decide to stop the thread (either immediately or after a certain amount of time; this is called "yielding") and allow another thread to run, thus increasing chances that the spinlock is unlocked (you still have the costs of a thread switch but not the costs of putting a thread to sleep and waking it up again).
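As a rough sketch of such a back-off strategy, the spinlock below yields after a fixed number of failed attempts. `YieldingSpinLock` and `spins_before_yield` are hypothetical names, and the threshold is an arbitrary assumption, not a tuned value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical spinlock with a simple back-off: after a number of failed
// attempts it yields the CPU instead of continuing to spin.
class YieldingSpinLock {
public:
    explicit YieldingSpinLock(int spins_before_yield = 64)
        : spins_before_yield_(spins_before_yield) {
        assert(spins_before_yield > 0);
    }

    void lock() {
        int spins = 0;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins >= spins_before_yield_) {
                // Let another runnable thread use the core; this thread stays
                // runnable and is not put to sleep waiting for the lock.
                std::this_thread::yield();
                spins = 0;
            }
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    int spins_before_yield_;
};
```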

Summary

If in doubt, use mutexes. They are usually the better choice, and most modern systems will allow them to spin for a very short amount of time if this seems beneficial. Using spinlocks can sometimes improve performance, but only under certain conditions, and the fact that you are in doubt suggests that you are not currently working on a project where a spinlock might be beneficial. You might consider using your own "lock object" that can use either a spinlock or a mutex internally (e.g. this behavior could be configurable when creating such an object). Initially use mutexes everywhere, and if you think that using a spinlock somewhere might really help, give it a try and compare the results (e.g. using a profiler). Be sure to test both cases, a single-core and a multi-core system, before you jump to conclusions (and possibly different operating systems, if your code will be cross-platform).
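One possible shape for such a configurable "lock object", together with a crude timing helper for comparing both modes, is sketched below. `LockKind`, `ConfigurableLock`, and `time_increments` are hypothetical names, and wall-clock timing of a tight loop is only a starting point, not a substitute for profiling your real workload.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

enum class LockKind { Mutex, Spin };

// Hypothetical lock object whose strategy is chosen at construction time.
class ConfigurableLock {
public:
    explicit ConfigurableLock(LockKind kind = LockKind::Mutex) : kind_(kind) {}

    void lock() {
        if (kind_ == LockKind::Spin) {
            while (flag_.test_and_set(std::memory_order_acquire)) {
                // busy-wait
            }
        } else {
            mutex_.lock();
        }
    }

    void unlock() {
        if (kind_ == LockKind::Spin) {
            flag_.clear(std::memory_order_release);
        } else {
            mutex_.unlock();
        }
    }

private:
    const LockKind kind_;
    std::mutex mutex_;
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Runs `threads` workers doing `iterations` locked increments each and
// returns the elapsed wall-clock time in milliseconds.
double time_increments(LockKind kind, int threads, int iterations, long& counter_out) {
    assert(threads > 0 && iterations >= 0);
    ConfigurableLock lock(kind);
    long counter = 0;
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<ConfigurableLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    const auto end = std::chrono::steady_clock::now();
    counter_out = counter;
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Running `time_increments` once with `LockKind::Mutex` and once with `LockKind::Spin` on each target machine gives a first impression of which mode suits that workload.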

Update: A Warning for iOS

Actually not iOS specific but iOS is the platform where most developers may face that problem: If your system has a thread scheduler, that does not guarantee that any thread, no matter how low its priority may be, will eventually get a chance to run, then spinlocks can lead to permanent deadlocks. The iOS scheduler distinguishes different classes of threads and threads on a lower class will only run if no thread in a higher class wants to run as well. There is no back-off strategy for this, so if you permanently have high class threads available, low class threads will never get any CPU time and thus never any chance to perform any work.

The problem appears as follows: Your code obtains a spinlock in a low prio class thread, and while it is holding that lock, its time quantum is exceeded and the thread stops running. The only way this spinlock can be released again is if that low prio class thread gets CPU time again, but this is not guaranteed to happen. You may have a couple of high prio class threads that constantly want to run, and the task scheduler will always prioritize those. One of them may run across the spinlock and try to obtain it, which isn't possible of course, and the system will make it yield. The problem is: A thread that yielded is immediately available for running again! Because it has a higher prio than the thread holding the lock, the thread holding the lock has no chance to get CPU runtime. Either some other thread will get runtime or the thread that just yielded.

Why does this problem not occur with mutexes? When the high prio thread cannot obtain the mutex, it won't yield; it may spin a bit but will eventually be sent to sleep. A sleeping thread is not available for running until it is woken up by an event, e.g. the mutex it has been waiting for being unlocked. Apple is aware of that problem and has deprecated OSSpinLock as a result. The new lock is called os_unfair_lock. This lock avoids the situation mentioned above as it is aware of the different thread priority classes. If you are sure that using spinlocks is a good idea in your iOS project, use that one. Stay away from OSSpinLock! And under no circumstances implement your own spinlocks in iOS! If in doubt, use a mutex. macOS is not affected by this issue, as it has a different thread scheduler that won't allow any thread (even low prio threads) to "run dry" on CPU time. Still, the same situation can arise there and will then lead to very poor performance, so OSSpinLock is deprecated on macOS as well.

Solution 2 - Synchronization

Continuing with Mecki's suggestion, the article "pthread mutex vs pthread spinlock" on Alexander Sandler's blog, Alex on Linux, shows how spinlocks and mutexes can be implemented side by side using #ifdef in order to test their behavior.

However, be sure to make the final call based on your own observations and understanding: the example given is an isolated case, and your project requirements and environment may be entirely different.
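The following is not the code from that article, only a minimal sketch of the same idea: one source file whose lock type is selected at compile time. The macro `USE_SPINLOCK` and the names `bench_lock_t`, `lock_init`, `lock_acquire`, `lock_release`, `lock_destroy`, and `run_workers` are assumptions made for this example. It also assumes a platform that provides the optional POSIX spinlock functions (Linux with glibc does).

```cpp
#include <cassert>
#include <pthread.h>

// Build with -DUSE_SPINLOCK to select the spinlock variant.
#ifdef USE_SPINLOCK
typedef pthread_spinlock_t bench_lock_t;
static int lock_init(bench_lock_t* l) { return pthread_spin_init(l, PTHREAD_PROCESS_PRIVATE); }
static int lock_acquire(bench_lock_t* l) { return pthread_spin_lock(l); }
static int lock_release(bench_lock_t* l) { return pthread_spin_unlock(l); }
static int lock_destroy(bench_lock_t* l) { return pthread_spin_destroy(l); }
#else
typedef pthread_mutex_t bench_lock_t;
static int lock_init(bench_lock_t* l) { return pthread_mutex_init(l, nullptr); }
static int lock_acquire(bench_lock_t* l) { return pthread_mutex_lock(l); }
static int lock_release(bench_lock_t* l) { return pthread_mutex_unlock(l); }
static int lock_destroy(bench_lock_t* l) { return pthread_mutex_destroy(l); }
#endif

struct Shared {
    bench_lock_t lock;
    long counter;
    int iterations;
};

// Each worker increments the shared counter under the selected lock.
static void* worker(void* arg) {
    Shared* s = static_cast<Shared*>(arg);
    for (int i = 0; i < s->iterations; ++i) {
        lock_acquire(&s->lock);
        ++s->counter;
        lock_release(&s->lock);
    }
    return nullptr;
}

// Starts up to 16 workers and returns the final counter value.
long run_workers(int threads, int iterations) {
    assert(threads > 0 && threads <= 16);
    Shared s;
    s.counter = 0;
    s.iterations = iterations;
    lock_init(&s.lock);
    pthread_t ids[16];
    for (int i = 0; i < threads; ++i) pthread_create(&ids[i], nullptr, worker, &s);
    for (int i = 0; i < threads; ++i) pthread_join(ids[i], nullptr);
    lock_destroy(&s.lock);
    return s.counter;
}
```

Building the file twice, with and without `-DUSE_SPINLOCK`, and timing `run_workers` under each build gives two directly comparable binaries.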

Solution 3 - Synchronization

Mecki's answer pretty well nails it. However, on a single processor, using a spinlock might make sense when the task is waiting on the lock to be given by an Interrupt Service Routine. The interrupt would transfer control to the ISR, which would ready the resource for use by the waiting task. It would end by releasing the lock before giving control back to the interrupted task. The spinning task would find the spinlock available and proceed.

Solution 4 - Synchronization

Please also note that in certain environments and conditions (such as running on Windows at IRQL >= DISPATCH_LEVEL), you cannot use a mutex and must use a spinlock instead. The same applies on Unix.

Here is an equivalent question on the competitor Stack Exchange Unix site: https://unix.stackexchange.com/questions/5107/why-are-spin-locks-good-choices-in-linux-kernel-design-instead-of-something-more

Info on dispatching on windows systems: http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/IRQL_thread.doc

Solution 5 - Synchronization

Spinlock and mutex synchronization mechanisms are very commonly seen today.

Let's think about Spinlock first.

Basically, it is a busy-waiting action, which means that we have to wait for a specified lock to be released before we can proceed with the next action. Conceptually it is very simple, but implementing it is not. For example: if the lock has not been released and the thread is swapped out and goes to sleep, how should we deal with it? How should synchronization locks be handled when two threads request access simultaneously?

Generally, the most intuitive idea is to handle synchronization via a variable that protects the critical section. The concept of a mutex is similar, but the two are still different. The key point is CPU utilization: a spinlock consumes CPU time while it waits to act. We can therefore sum up the difference between the two as follows:

In homogeneous multi-core environments, if the time spent in the critical section is small, then use a spinlock, because it reduces context-switch time. (The single-core comparison is not important, because some system implementations perform a switch in the middle of the spinlock.)

On Windows, using a spinlock raises the thread to DISPATCH_LEVEL, which in some cases may not be allowed; in those cases we have to use a mutex (APC_LEVEL).

Solution 6 - Synchronization

The rule for using spinlocks is simple: use a spinlock if and only if the real time the lock is held is bounded and sufficiently small.

Note that usually user implemented spinlocks DO NOT satisfy this requirement because they do not disable interrupts. Unless pre-emptions are disabled, a pre-emption whilst a spinlock is held violates the bounded time requirement.

Sufficiently small is a judgement call and depends on the context.

Exception: some kernel programming must use a spinlock even when the time is not bounded. In particular if a CPU has no work to do, it has no choice but to spin until some more work turns up.

Special danger: in low-level programming, take great care when multiple interrupt priorities exist (usually there is at least one non-maskable interrupt). In this case, higher-priority pre-emptions can run even if interrupts at the thread priority are disabled (such as priority hardware services, often related to virtual memory management). Provided a strict priority separation is maintained, the condition for bounded real time must be relaxed and replaced with bounded system time at that priority level. Note that in this case not only can the lock holder be pre-empted, but the spinner can also be interrupted; this is generally not a problem because there's nothing you can do about it.

Solution 7 - Synchronization

Spinlocks can actually perform very poorly on NUMA machines. The issue is easy to understand and very hard to fix (without switching to a mutex, that is). Consider a spinlock that lives in DRAM "near" core A, and threads on A and B contending for that lock. Assume that B is remote from this DRAM. As we all know, this means that memory accesses by A will be 5x or so faster than memory accesses by B, because B's accesses will need to traverse the bus of the NUMA chip, while A's accesses are local and hence avoid that bus traversal.

In effect, A's spin logic will run 5x or more faster than B's. Yes, they contend, and B disrupts A, but the impact is asymmetric: when A wins the race to access the lock next, it will be getting local loads and stores, and hence will be spinning at a much higher instruction rate. When B is spinning, those remote loads and stores will be slow, so B spins in slow motion.

The upshot, and we have observed this in our work on Derecho, is that we obtain a very unfair spinlock. A is strongly favored over B, and locking by B will take a very long time.

How would you observe this? In our case, we use LibFabrics, and that library has a few threads that get scattered over multiple cores. Within the LibFabric logic, A and B are spinning to lock and then check a completion queue associated with RDMA hardware. So the effect is that A gets to check this queue 5x more often than B. In cases where an action by B is needed (the completed operation at the head of that queue is owned by B), A effectively starves B for access -- slowing down LibFabrics in an extreme way, that snowballed to greatly impact our Derecho code. We've seen cases where A's access is so strongly favored that B might wait as long as 10ms for the lock -- even though under uncontended situations, B would grab this lock in 0.2us. So, the effect can be quite extreme.
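A rough way to look for this kind of unfairness in your own environment is to count how often each contending thread wins the lock over a fixed interval. The sketch below is not how the measurement was done in Derecho or LibFabric; `FairnessReport` and `measure_spinlock_fairness` are hypothetical names, and the code does not pin threads to particular cores or NUMA nodes, which you would need to do with platform-specific APIs to reproduce the scenario described above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

struct FairnessReport {
    std::vector<long> per_thread;  // acquisitions won by each thread
    long total;                    // acquisitions counted under the lock
};

// Lets `threads` workers fight over one test-and-set spinlock for `duration`
// and reports how many times each of them acquired it.
FairnessReport measure_spinlock_fairness(int threads, std::chrono::milliseconds duration) {
    assert(threads > 0);
    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    std::atomic<bool> stop{false};
    long total = 0;  // only modified while holding the spinlock
    std::vector<long> per_thread(threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            long mine = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                while (lock.test_and_set(std::memory_order_acquire)) {
                    // spin
                }
                ++total;
                lock.clear(std::memory_order_release);
                ++mine;
            }
            per_thread[t] = mine;  // each thread writes only its own slot
        });
    }
    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& w : workers) w.join();
    return FairnessReport{per_thread, total};
}
```

A strongly skewed `per_thread` distribution between threads placed on different NUMA nodes would be consistent with the effect described here, though other factors such as scheduling can also skew the counts.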

Conclusion? Don't even consider using a spinlock on a NUMA system where your threads might be (1) on different NUMA cores, (2) with different locality to the DRAM where the spinlock was allocated. You will see massive performance issues! (3) When using a third-party library that has multiple threads, keep in mind that they may not have read this discussion and might have it wrong!

Solution 8 - Synchronization

>Using spinlocks on a single-core/single-CPU system makes usually no sense, since as long as the spinlock polling is blocking the only available CPU core, no other thread can run and since no other thread can run, the lock won't be unlocked either. IOW, a spinlock wastes only CPU time on those systems for no real benefit

This is wrong. There is no wastage of CPU cycles in using spinlocks on uniprocessor systems, because once a process takes a spinlock, preemption is disabled, so there can be no one else spinning! It's just that using it doesn't make any sense. Hence, spinlocks on uniprocessor systems are replaced by preempt_disable at compile time by the kernel!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | compile-fan | View Question on Stackoverflow
Solution 1 - Synchronization | Mecki | View Answer on Stackoverflow
Solution 2 - Synchronization | TheCottonSilk | View Answer on Stackoverflow
Solution 3 - Synchronization | AlanC | View Answer on Stackoverflow
Solution 4 - Synchronization | Dan Jobs | View Answer on Stackoverflow
Solution 5 - Synchronization | Marcus Thornton | View Answer on Stackoverflow
Solution 6 - Synchronization | Yttrill | View Answer on Stackoverflow
Solution 7 - Synchronization | Ken Birman | View Answer on Stackoverflow
Solution 8 - Synchronization | Neelansh Mittal | View Answer on Stackoverflow