How to control which core a process runs on?

Tags: Multicore, Intel, System Calls, Instruction Set

Multicore Problem Overview


I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms.

I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there) and thereby we get the appearance of running multiple processes simultaneously.

And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores.

My question is about the last scenario: how does the kernel control which core a process runs on? Which system calls (in Linux, or even Windows) schedule a process on a specific core?

The reason I'm asking: I'm working on a project for school where we are to explore a recent topic in computing - and I chose multi-core architectures. There seems to be a lot of material on how to program in that kind of environment (how to watch for deadlock or race conditions) but not much on controlling the individual cores themselves. I would love to be able to write a few demonstration programs and present some assembly instructions or C code to the effect of "See, I am running an infinite loop on the 2nd core, look at the spike in CPU utilization for that specific core".

Any code examples? Or tutorials?

edit: For clarification - many people have said that this is the purpose of the OS, and that one should let the OS take care of this. I completely agree! But then what I'm asking (or trying to get a feel for) is what the operating system actually does to do this. Not the scheduling algorithm, but more "once a core is chosen, what instructions must be executed to have that core start fetching instructions?"

Multicore Solutions


Solution 1 - Multicore

As others have mentioned, processor affinity is Operating System specific. If you want to do this outside the confines of the operating system, you're in for a lot of fun, and by that I mean pain.

That said, others have mentioned SetProcessAffinityMask for Win32. Nobody has mentioned the Linux kernel way to set processor affinity, and so I shall: you need to use the sched_setaffinity(2) system call.

The command-line wrapper for this system call is taskset(1), e.g.

taskset -c 2,3 perf stat awk 'BEGIN{for(i=0;i<100000000;i++){}}'

restricts that perf stat of a busy loop to running on core 2 or 3 (still allowing it to migrate, but only between those two cores).
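For completeness, here is a minimal hedged sketch (mine, not from the original answer) of doing the same pinning from C with sched_setaffinity(2); it assumes CPUs 2 and 3 exist, and Solution 6 below has a fuller, tested example.

#define _GNU_SOURCE           /* needed for cpu_set_t, CPU_* macros and sched_setaffinity */
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);         /* allow CPU 2 */
    CPU_SET(3, &set);         /* allow CPU 3 */
    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    /* From here on the kernel will only schedule this process on CPUs 2 or 3. */
    for (;;) { }              /* busy loop: watch those two cores in top/htop */
}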

Solution 2 - Multicore

Normally the decision about which core an app will run on is made by the system. However, you can set the "affinity" for an application to a specific core to tell the OS to only run the app on that core. Normally this isn't a good idea, but there are some rare cases where it might make sense.

To do this in Windows, use Task Manager, right-click on the process, and choose "Set Affinity". You can do it programmatically in Windows using functions like SetThreadAffinityMask, SetProcessAffinityMask or SetThreadIdealProcessor.
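For illustration, here is a hedged Win32 sketch of the programmatic route (my addition, not part of the original answer; it assumes a machine where logical CPU 1 exists):

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* Bit N of the mask corresponds to logical processor N. */
    DWORD_PTR mask = 1 << 1;   /* allow only CPU 1 */

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    /* A hint rather than a restriction: the scheduler prefers this CPU. */
    SetThreadIdealProcessor(GetCurrentThread(), 1);

    for (;;) { }               /* spin so Task Manager shows the load on CPU 1 */
}

SetProcessAffinityMask is a hard restriction on every thread in the process, while SetThreadIdealProcessor is only a scheduling hint.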

ETA:

If you are interested in how the OS actually does the scheduling, you might want to check out these links:

Wikipedia article on context switching

Wikipedia article on scheduling

Scheduling in the Linux kernel

With most modern OSes, the OS schedules a thread to execute on a core for a short slice of time. When the time slice expires, or the thread does an I/O operation that causes it to voluntarily yield the core, the OS will schedule another thread to run on the core (if there are any threads ready to run). Exactly which thread is scheduled depends on the OS's scheduling algorithm.

The implementation details of exactly how the context switch occurs are CPU & OS dependent. It generally will involve a switch to kernel mode, the OS saving the state of the previous thread, loading the state of the new thread, then switching back to user mode and resuming the newly loaded thread. The context switching article I linked to above has a bit more detail about this.
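As a rough user-space analogy of that save-state/load-state/resume sequence (not what the kernel actually runs, and using glibc's old ucontext API purely for illustration), the following compiles and runs on Linux:

#include <stdio.h>
#include <ucontext.h>

/* Two "contexts": think of them as the saved register state of two threads. */
static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];

static void task(void) {
    puts("task: running");
    /* Save our state and load main's state -- a voluntary yield. */
    swapcontext(&task_ctx, &main_ctx);
    puts("task: resumed exactly where it was suspended");
}

int main(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof(task_stack);
    task_ctx.uc_link = &main_ctx;           /* where to go when task() returns */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);      /* "schedule" the task */
    puts("main: task yielded, scheduling it again");
    swapcontext(&main_ctx, &task_ctx);      /* restore its saved state */
    puts("main: task finished");
    return 0;
}

The kernel does the same kind of swap, but with privileged register state, per-CPU run queues, and a return to user mode at the end.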

Solution 3 - Multicore

Nothing tells a core "now start running this process".

A core does not see processes; it only knows about executable code, the various privilege levels, and the associated restrictions on which instructions can be executed.

When the computer boots, for simplicity's sake only one core/processor is active and actually runs any code. Then, if the OS is multiprocessor-capable, it activates the other cores with some system-specific instruction; the other cores most likely pick up from exactly the same spot as the first core and run from there.

So what the scheduler does is look through the OS's internal structures (the task/process/thread queue), pick one, and mark it as running on its core. The scheduler instances running on the other cores then won't touch it until the task is in a waiting state again (and not marked as pinned to a specific core). After the task is marked as running, the scheduler executes the switch to userland, with the task resuming at the point where it was previously suspended.

Technically there is nothing whatsoever stopping cores from running exactly the same code at exactly the same time (and many unlocked functions do), but unless the code is written to expect that, it will probably piss all over itself.

The scenario gets weirder with more exotic memory models (the above assumes a "usual" single linear working memory space), where cores don't necessarily all see the same memory and there may be requirements on fetching code out of another core's clutches, but that is much more easily handled by simply keeping the task pinned to a core (AFAIK the Sony PS3 architecture with its SPUs is like that).

Solution 4 - Multicore

To find out the number of processors, instead of using /proc/cpuinfo, just run:

nproc

To run a process on a group of specific processors:

taskset --cpu-list 1,2 my_command 

will say that my_command can only run on CPU 1 or 2.

To run a program on 4 processors doing 4 different things, use parameterization. The argument to the program tells it to do something different:

for i in `seq 0 1 3`;
do 
  taskset --cpu-list $i my_command $i;
done

A good example of this is dealing with 8 million operations in an array, so that elements 0 to (2mil-1) go to the first processor, 2mil to (4mil-1) to the second, and so on.
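A hypothetical my_command for that slicing scheme might look like the sketch below; the program name, the 8-million total, and the per-element "work" are all placeholders of mine, and each instance picks its slice from the index argument passed by the shell loop above.

/* Hypothetical my_command: each instance handles one 2-million-element
 * slice of an 8-million-element job, selected by its argv[1] index. */
#include <stdio.h>
#include <stdlib.h>

#define TOTAL  8000000L
#define SLICES 4

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <slice 0-3>\n", argv[0]);
        return 1;
    }
    long slice = strtol(argv[1], NULL, 10);
    long chunk = TOTAL / SLICES;
    long begin = slice * chunk, end = begin + chunk;
    double sum = 0.0;
    for (long i = begin; i < end; i++)
        sum += (double)i * 0.5;              /* stand-in for the real work */
    printf("slice %ld: processed [%ld, %ld), sum = %f\n", slice, begin, end, sum);
    return 0;
}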

You can look at the load on each processor by installing htop (via apt-get or yum) and running it at the command line:

 htop

Solution 5 - Multicore

The OpenMPI project has a library to set the processor affinity on Linux in a portable way.

A while back, I used this in a project and it worked fine.

Caveat: I dimly remember that there were some issues in finding out how the operating system numbers the cores. I used this in a system with 2 Xeon CPUs of 4 cores each.

A look at cat /proc/cpuinfo might help. On the box I used, it was pretty weird. The boiled-down output is at the end.

Evidently, the even-numbered cores are on the first CPU and the odd-numbered cores are on the second CPU. However, if I remember correctly, there was an issue with the caches. On these Intel Xeon processors, two cores on each CPU share their L2 caches (I do not remember whether the processor has an L3 cache). I think that virtual processors 0 and 2 shared one L2 cache, 1 and 3 shared one, 4 and 6 shared one, and 5 and 7 shared one.

Because of this weirdness (1.5 years back I could not find any documentation on the processor numbering in Linux), I would be careful about doing this kind of low-level tuning. However, there clearly are some uses. If your code runs on only a few kinds of machines, then it might be worth doing this kind of tuning. Another application would be in a domain-specific language like StreamIt, where the compiler could do this dirty work and compute a smart schedule.

processor       : 0
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 1
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 2
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 3
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 4
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 5
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 6
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

processor       : 7
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
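If you would rather discover this numbering programmatically than eyeball /proc/cpuinfo, the standard Linux sysfs topology files expose the same mapping. The sketch below is my addition, not part of the original answer; it assumes the usual /sys/devices/system/cpu/cpuN/topology layout is present on your kernel.

#include <stdio.h>
#include <unistd.h>

/* Read a single integer from a sysfs file; returns -1 if it cannot be read. */
static int read_int(const char *path) {
    FILE *f = fopen(path, "r");
    int value = -1;
    if (f) {
        if (fscanf(f, "%d", &value) != 1)
            value = -1;
        fclose(f);
    }
    return value;
}

int main(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    for (long cpu = 0; cpu < n; cpu++) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/topology/physical_package_id", cpu);
        int package = read_int(path);
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        int core = read_int(path);
        printf("logical cpu %ld: package %d, core %d\n", cpu, package, core);
    }
    return 0;
}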

Solution 6 - Multicore

Linux sched_setaffinity C minimal runnable example

In this example, we get the affinity, modify it, and check if it has taken effect with sched_getcpu().

main.c

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void print_affinity() {
    cpu_set_t mask;
    long nproc, i;

    /* Read the affinity mask of the calling process (pid 0 == self). */
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        /* Print 1 if CPU i is allowed for this process, 0 otherwise. */
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
}

int main(void) {
    cpu_set_t mask;

    print_affinity();
    printf("sched_getcpu = %d\n", sched_getcpu());
    /* Restrict the calling process to CPU 0 only. */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_setaffinity");
        assert(false);
    }
    print_affinity();
    /* TODO is it guaranteed to have taken effect already? Always worked on my tests. */
    printf("sched_getcpu = %d\n", sched_getcpu());
    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Sample output:

sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
sched_getcpu = 9
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

Which means that:

  • initially, all of my 16 cores were enabled, and the process was randomly running on core 9 (the 10th one)
  • after we set the affinity to only the first core, the process was necessarily moved to core 0 (the first one)

It is also fun to run this program through taskset:

taskset -c 1,3 ./main.out

Which gives output of form:

sched_getaffinity = 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 1
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

and so we see that it limited the affinity from the start.

This works because the affinity is inherited by child processes, which taskset forks (see also: https://stackoverflow.com/questions/8336191/how-to-prevent-inheriting-cpu-affinity-by-child-forked-process). A small sketch of that inheritance follows below.

Tested in Ubuntu 16.04.
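To see that inheritance in isolation, here is a hedged minimal sketch of my own (not part of the answer above); it assumes CPU 0 exists and skips some error checking.

/* Affinity set in the parent is inherited across fork(). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);   /* pin the parent to CPU 0 */

    pid_t pid = fork();
    if (pid == 0) {
        cpu_set_t child;
        sched_getaffinity(0, sizeof(child), &child);
        /* The child never called sched_setaffinity, yet it is pinned too. */
        printf("child: CPU 0 allowed = %d, CPU 1 allowed = %d\n",
               CPU_ISSET(0, &child), CPU_ISSET(1, &child));
        return 0;
    }
    wait(NULL);
    return 0;
}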

x86 bare metal

If you are that hardcore: https://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like/33651438#33651438

How Linux implements it

https://stackoverflow.com/questions/766395/how-does-sched-setaffinity-work

Python: os.sched_getaffinity and os.sched_setaffinity

See: https://stackoverflow.com/questions/1006289/how-to-find-out-the-number-of-cpus-using-python/55423170#55423170

Solution 7 - Multicore

As others have mentioned, it's controlled by the operating system. Depending on the OS, it may or may not provide you with system calls that allow you to affect what core a given process executes on. However, you should usually just let the OS do the default behavior. If you have a 4-core system with 37 processes running, and 34 of those processes are sleeping, it's going to schedule the remaining 3 active processes onto separate cores.

You'll likely only see a speed boost from playing with core affinities in very specialized multithreaded applications. For example, suppose you have a system with 2 dual-core processors. Suppose you have an application with 3 threads, where two of the threads operate heavily on the same set of data and the third thread uses a different set of data. In this case, you would benefit most by placing the two threads that share data on the same processor, and the third thread on the other processor, since the first two can then share a cache. The OS has no idea what memory each thread needs to access, so it may not assign threads to cores appropriately.
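A hedged sketch of that placement on Linux, using the GNU extension pthread_attr_setaffinity_np and assuming (purely for illustration) that CPUs 0 and 1 share a cache while CPU 2 does not:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg) {
    /* Placeholder for the real work on the shared or private data set. */
    printf("thread %s running on cpu %d\n", (const char *)arg, sched_getcpu());
    return NULL;
}

/* Create a thread already pinned to a single CPU via its attributes. */
static void spawn_on(pthread_t *t, int cpu, const char *name) {
    cpu_set_t set;
    pthread_attr_t attr;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    pthread_create(t, &attr, worker, (void *)name);
    pthread_attr_destroy(&attr);
}

int main(void) {
    pthread_t a, b, c;
    spawn_on(&a, 0, "sharer-1");    /* the two data-sharing threads go on ...  */
    spawn_on(&b, 1, "sharer-2");    /* ... CPUs assumed to share a cache       */
    spawn_on(&c, 2, "independent"); /* the third thread goes elsewhere         */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    return 0;
}

Compile with gcc -pthread. The CPU numbers 0, 1 and 2 are pure assumptions about your machine's topology; check /proc/cpuinfo or the sysfs topology files first.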

If you're interested in how the operating system does this, read up on scheduling. The nitty-gritty details of multiprocessing on x86 can be found in the Intel 64 and IA-32 Architectures Software Developer's Manuals. Volume 3A, Chapters 7 and 8 contain relevant information, but bear in mind these manuals are extremely technical.

Solution 8 - Multicore

The OS knows how to do this; you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down. Let the OS figure it out; you just need to start the new thread.

For example, if you told a process to start on core x, but core x was already under a heavy load, you would be worse off than if you had just let the OS handle it.

Solution 9 - Multicore

I don't know the assembly instructions, but the Windows API function is SetProcessAffinityMask. You can see an example of something I cobbled together a while ago to run Picasa on only one core.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | poundifdef | View Question on Stackoverflow
Solution 1 - Multicore | Randolpho | View Answer on Stackoverflow
Solution 2 - Multicore | Eric Petroelje | View Answer on Stackoverflow
Solution 3 - Multicore | Pasi Savolainen | View Answer on Stackoverflow
Solution 4 - Multicore | Eamonn Kenny | View Answer on Stackoverflow
Solution 5 - Multicore | Manuel | View Answer on Stackoverflow
Solution 6 - Multicore | Ciro Santilli Путлер Капут 六四事 | View Answer on Stackoverflow
Solution 7 - Multicore | Adam Rosenfield | View Answer on Stackoverflow
Solution 8 - Multicore | Ed S. | View Answer on Stackoverflow
Solution 9 - Multicore | Will Rickards | View Answer on Stackoverflow