How can I distinguish between high- and low-performance cores/threads in C++?

Tags: C++, Multithreading, Performance, Intel, Apple M1

C++ Problem Overview


When talking about multi-threading, it often seems like threads are treated as equal - just the same as the main thread, but running next to it.

However, on some new processors, such as the Apple "M" series and the upcoming Intel Alder Lake series, not all threads are equally performant, because these chips feature separate high-performance cores and slower, high-efficiency cores.

That's not to say that there weren't already things such as hyper-threading, but this seems to have a much larger performance implication.

Is there a way to query std::thread's properties and enforce on which cores they'll run in C++?

C++ Solutions


Solution 1 - C++

> How to distinguish between high- and low-performance cores/threads in C++?

Please understand that "thread" is an abstraction of the hardware's capabilities and that something beyond your control (the OS, the kernel's scheduler) is responsible for creating and managing this abstraction. "Importance" and performance hints are part of that abstraction (typically presented in the form of a thread priority).

Any attempt to break the "thread" abstraction (e.g. to determine whether the core is low-performance or high-performance) is misguided. For example, the OS could move your thread to a low-performance core immediately after you find out that you were running on a high-performance core, leaving you assuming that you're on a high-performance core when you are not.

Even pinning your thread to a specific core (in the hope that it'll always be using a high-performance core) can/will backfire (cause you to get less work done because you've prevented yourself from using a "faster than nothing" low-performance core when high-performance core/s are busy doing other work).

The biggest problem is that C++ creates a worse abstraction (std::thread) on top of the "likely better" abstraction provided by the OS. Specifically, there's no way to set, modify or obtain the thread priority using std::thread; so you're left without any control over the "performance hints" that are necessary (for the OS, scheduler) to make good "load vs. performance vs. power management" decisions.

> When talking about multi-threading, it often seems like threads are treated as equal

Often people think we're still using time-sharing systems from the 1960s. Stop listening to these fools. Modern systems do not allow CPU time to be wasted on unimportant work while more important work waits. Effective use of thread priorities is a fundamental performance requirement. Everything else ("load vs. performance vs. power management" decisions) is, by necessity, beyond your control (on the other side of the "thread" abstraction you're using).

Solution 2 - C++

> Is there any way to query std::thread's properties and enforce on which cores they'll run in C++?

No. There is no standard API for this in C++.

Platform-specific APIs do have the ability to specify a specific logical core (or a set of such cores) for a software thread. For example, GNU has pthread_setaffinity_np.

Note that this allows you to specify "core 1" for your thread, but that doesn't necessarily help with getting the "performance" core unless you know which core that is. To figure that out, you may need to go below the OS level and into CPU-specific assembly programming. In the case of Intel, to my understanding, you would use the Enhanced Hardware Feedback Interface.
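As a rough illustration of the affinity API mentioned above, here is a minimal sketch, assuming Linux with glibc (where pthread_setaffinity_np and pthread_getaffinity_np are available as GNU extensions). The helper names first_allowed_cpu and pin_current_thread_to_cpu are made up for this example. It only pins the calling thread to a logical CPU number; it does not tell you whether that CPU is a performance or an efficiency core.

```cpp
// Linux/glibc only: the *_np functions are non-portable GNU extensions.
#include <pthread.h>
#include <sched.h>
#include <cerrno>

// Hypothetical helper: lowest-numbered logical CPU the calling thread is
// currently allowed to run on, or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Hypothetical helper: restrict the calling thread to a single logical CPU.
// Returns 0 on success or an errno-style code (e.g. EINVAL) on failure.
int pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return EINVAL;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Calling pin_current_thread_to_cpu(first_allowed_cpu()) should succeed on a typical Linux system. Picking a CPU from the current mask rather than hard-coding 0 matters in containers, where CPU 0 may not be available to the process.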

Solution 3 - C++

No, the C++ standard library has no direct way to query the sub-type of a CPU, or to request that a thread run on a specific CPU.

But std::thread (and jthread) does have .native_handle(), which on most platforms will let you do this.

If you know the threading library implementation of your std::thread, you can use native_handle() to get at the underlying primitives, then use the underlying threading library to do this kind of low-level work.

This will be completely non-portable, of course.
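To make the native_handle() route concrete, here is a minimal sketch, assuming Linux with libstdc++ or libc++, where native_handle() returns a pthread_t. The helper names lowest_allowed_cpu and run_worker_pinned_to are invented for this example. The worker waits until the affinity has been set, then checks its own mask.

```cpp
// Assumes Linux, where std::thread::native_handle() yields a pthread_t.
// On other platforms the handle type and the affinity API differ.
#include <pthread.h>
#include <sched.h>
#include <future>
#include <thread>

// Hypothetical helper: lowest-numbered CPU in the calling thread's mask,
// or -1 if the mask cannot be read.
int lowest_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Hypothetical helper: start a worker, pin it to `cpu` through
// native_handle(), and return true if the worker then observes an
// affinity mask that contains only `cpu`.
bool run_worker_pinned_to(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;

    std::promise<void> pinned;  // fulfilled once the affinity call has returned
    std::future<void> start = pinned.get_future();
    bool mask_ok = false;

    std::thread worker([&mask_ok, &start, cpu] {
        start.wait();  // do no real work before the pin attempt has been made
        cpu_set_t seen;
        CPU_ZERO(&seen);
        if (pthread_getaffinity_np(pthread_self(), sizeof(seen), &seen) == 0) {
            mask_ok = CPU_ISSET(cpu, &seen) && CPU_COUNT(&seen) == 1;
        }
    });

    cpu_set_t wanted;
    CPU_ZERO(&wanted);
    CPU_SET(cpu, &wanted);
    const int rc = pthread_setaffinity_np(worker.native_handle(),
                                          sizeof(wanted), &wanted);

    pinned.set_value();  // release the worker whether or not pinning worked
    worker.join();       // join also makes the write to mask_ok visible here
    return rc == 0 && mask_ok;
}
```

The promise/future pair is there only so the worker does not start its work on some other core before the affinity call has been made. In real code, the thread's actual work would go where the mask check is.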

Solution 4 - C++

iPhones, iPads, and newer Macs have high- and low-performance cores for a reason. The low-performance cores allow some reasonable amount of work to be done while using the smallest possible amount of energy, making the battery of the device last longer. These additional cores are not there just for fun; if you try to get around them, you can end up with a much worse experience for the user.

If you use the C++ standard library for running multiple threads, the operating system will detect what you are doing, and act accordingly. If your task only takes 10ms on a high-performance core, it will be moved to a low-performance core; it's fast enough and saves battery life. If you have multiple threads using 100% of the CPU time, the high-performance cores will be used automatically (plus the low-performance cores as well). If your battery runs low, the device can switch to all low-performance cores which will get more work done with the battery charge you have.

You should really think about what you want to do. You should put the needs of the user ahead of your perceived needs. Apart from that, Apple recommends assigning OS-specific priorities to your threads, which improves behaviour if you do it right. Giving a thread the highest priority so you can get better benchmark results is usually not "doing it right".
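As a sketch of what an OS-specific priority hint can look like: on Apple platforms, pthread_set_qos_class_self_np from <pthread/qos.h> assigns a quality-of-service class to the calling thread. The Linux branch below is only a stand-in for this example (it raises the thread's nice value) and is not equivalent to Apple's QoS classes. The helper names are invented here.

```cpp
#include <functional>
#include <thread>
#if defined(__APPLE__)
#include <pthread/qos.h>
#elif defined(__linux__)
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cerrno>
#endif

// Hypothetical helper: tell the OS that the calling thread does background
// work. Returns true if the OS accepted the hint.
bool mark_current_thread_as_background() {
#if defined(__APPLE__)
    // QoS classes are Apple's mechanism for describing thread importance.
    return pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0) == 0;
#elif defined(__linux__)
    // Assumption: with Linux/NPTL, a nice value set for a thread ID applies
    // to that thread only.
    const id_t tid = static_cast<id_t>(syscall(SYS_gettid));
    errno = 0;
    const int current = getpriority(PRIO_PROCESS, tid);
    if (current == -1 && errno != 0) return false;
    // Only ever lower the priority; raising it may need extra privileges.
    const int target = current > 10 ? current : 10;
    return setpriority(PRIO_PROCESS, tid, target) == 0;
#else
    return false;  // no hint available in this sketch
#endif
}

// Hypothetical helper: run `work` on a new std::thread that first marks
// itself as background. Returns whether the hint was accepted; `work`
// runs either way.
bool run_in_background_thread(const std::function<void()>& work) {
    bool accepted = false;
    std::thread t([&accepted, &work] {
        accepted = mark_current_thread_as_background();
        work();
    });
    t.join();
    return accepted;
}
```

The point of the sketch is that the thread describes how important its work is and leaves the choice of core to the scheduler, which is the approach this answer argues for.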

Solution 5 - C++

You can't select the core that a thread will be physically scheduled to run on using std::thread. I'd suggest using a framework like OpenMP or MPI; otherwise you will have to dig into the native macOS APIs to select the core for your thread to execute on.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | janekb04 | View Question on Stack Overflow |
| Solution 1 - C++ | Brendan | View Answer on Stack Overflow |
| Solution 2 - C++ | eerorika | View Answer on Stack Overflow |
| Solution 3 - C++ | Yakk - Adam Nevraumont | View Answer on Stack Overflow |
| Solution 4 - C++ | gnasher729 | View Answer on Stack Overflow |
| Solution 5 - C++ | lmeninato | View Answer on Stack Overflow |