How does Linux determine the next PID?


Linux Problem Overview


How does Linux determine the next PID it will use for a process? The purpose of this question is to better understand the Linux kernel. Don't be afraid to post kernel source code. If PIDs are allocated sequentially, how does Linux fill in the gaps? What happens when it hits the end?

For example, if I run a PHP script from Apache that does a <?php print(getmypid());?>, the same PID will be printed for a few minutes while I hit refresh. How long this lasts depends on how many requests Apache is receiving. Even if there is only one client, the PID will eventually change.

When the PID changes, it will be a close number, but how close? The number does not appear to be entirely sequential. If I do a ps aux | grep apache I get a fair number of processes:

[image: output of ps aux | grep apache]

How does Linux choose this next number? The previous few PIDs are still running, as well as the most recent PID that was printed. How does Apache choose to reuse these PIDs?

Linux Solutions


Solution 1 - Linux

The kernel allocates PIDs in the range (RESERVED_PIDS, PID_MAX_DEFAULT). It does so sequentially within each namespace (tasks in different namespaces can have the same IDs). When the end of the range is reached, PID assignment wraps around to RESERVED_PIDS and skips any PIDs that are still in use.

Some relevant code:

Inside alloc_pid(...)

for (i = ns->level; i >= 0; i--) {
    nr = alloc_pidmap(tmp);
    if (nr < 0)
        goto out_free;
    pid->numbers[i].nr = nr;
    pid->numbers[i].ns = tmp;
    tmp = tmp->parent;
}

alloc_pidmap()

static int alloc_pidmap(struct pid_namespace *pid_ns)
{
        int i, offset, max_scan, pid, last = pid_ns->last_pid;
        struct pidmap *map;

        pid = last + 1;
        if (pid >= pid_max)
                pid = RESERVED_PIDS;
        /* and later on... */
        pid_ns->last_pid = pid;
        return pid;
}

Do note that PIDs in the context of the kernel are more than just int identifiers; the relevant structure can be found in /include/linux/pid.h. Besides the id, it contains a list of tasks with that id, a reference counter and a hashed list node for fast access.

PIDs often do not appear sequential from user space because the scheduler may run other processes, which call fork() themselves, in between your process's fork() calls. This is very common, in fact.

Solution 2 - Linux

I would rather assume the behavior you watch stems from another source:

Good web servers usually have several process instances to balance the load of the requests. These processes are managed in a pool, and one of them is assigned to each request as it comes in. To optimize performance, Apache probably assigns the same process to a bunch of sequential requests from the same client. After a certain number of requests, that process is terminated and a new one is created.

I don't believe Linux assigns the same PID to more than one process in sequence.

As you say that the new PID is going to be close to the last one, I guess Linux simply assigns each process the last PID + 1. But processes are being started and terminated in the background all the time by applications and system programs, so you cannot predict the exact PID of the next Apache process to be started.

Apart from this, you should not use any assumption about PID assignment as a basis for something you implement. (See also sanmai's comment.)

Solution 3 - Linux

PIDs are sequential on most systems. You can see that by starting several processes yourself on an idle machine.

For example, use up-arrow history recall to repeatedly run a command that prints its own PID:

$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 Mar 15 19:32 /proc/self -> 21491
$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 Mar 15 19:32 /proc/self -> 21492
$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 Mar 15 19:32 /proc/self -> 21493
$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 Mar 15 19:32 /proc/self -> 21494

Do not depend on this: for security reasons, some people run kernels that spend extra CPU time to randomly choose new PIDs.

Solution 4 - Linux

PIDs can be allocated randomly. There are a number of ways to accomplish that.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | rook | View Question on Stackoverflow
Solution 1 - Linux | Michael Foukarakis | View Answer on Stackoverflow
Solution 2 - Linux | chiccodoro | View Answer on Stackoverflow
Solution 3 - Linux | Vladislav Rastrusny | View Answer on Stackoverflow
Solution 4 - Linux | sanmai | View Answer on Stackoverflow