Forking vs Threading


Linux Problem Overview


I have used threading before in my applications and know its concepts well, but recently in my operating systems lecture I came across fork(), which is something similar to threading.

I searched Google for the difference between them and learned that:

  1. fork() creates a new process that looks exactly like the old (parent) process, but it is still a different process, with a different process ID and its own memory.
  2. Threads are lightweight processes, which have less overhead.

But I still have some questions:

  1. When should you prefer fork() over threading, and vice versa?
  2. If I want to call an external application as a child, should I use fork() or threads to do it?
  3. While searching on Google, I found people saying it is a bad thing to call fork() inside a thread. Why would people want to call fork() inside a thread when the two do similar things?
  4. Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?

Linux Solutions


Solution 1 - Linux

The main difference between the forking and threading approaches is one of operating system architecture. Back when Unix was designed, forking was an easy, simple mechanism that best answered mainframe and server-type requirements, and as such it was popularized on Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As a result, there is still a notable difference today: Unix systems are efficient at forking, and Windows is more efficient with threads. You can see this most notably in Apache, which uses the prefork strategy on Unix and thread pooling on Windows.

Specifically to your questions:

> When should you prefer fork() over threading, and vice versa?

Prefer fork() on a Unix system when you're doing a far more complex task than just instantiating a worker, or when you want the implicit security sandboxing of separate processes.

> If I want to call an external application as a child, should I use fork() or threads to do it?

If the child will do an identical task to the parent, with identical code, use fork(). For smaller subtasks, use threads. For separate external processes, use neither; just start them with the proper API call (on POSIX systems, for example, posix_spawn()).

> While searching on Google, I found people saying it is a bad thing to call fork() inside a thread. Why would people want to call fork() inside a thread when the two do similar things?

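Not entirely sure, but I don't think the cost of duplicating many subthreads is the issue: POSIX fork() copies only the calling thread into the child. The usual problem is that a lock held by another thread at the moment of the fork (for example, a lock inside malloc()) stays locked forever in the child, because the thread that would release it does not exist there. That is why, after forking a multithreaded process, the child should call only async-signal-safe functions until it calls exec().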

> Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?

This is false. fork() creates a new process, which then takes advantage of all the features available to processes in the OS task scheduler, including running on another CPU at the same time as its parent.

Solution 2 - Linux

A forked process is called a heavyweight process, whereas a thread is called a lightweight process.

The following are the differences between them:

  1. A forked process is considered a child process, whereas a thread is called a sibling.
  2. A forked process does not share its data, heap, or stack with the parent; it gets its own copy (on Linux, lazily via copy-on-write). Threads, by contrast, share the code, data, and heap of their process, but each thread has its own stack.
  3. Switching between processes requires the OS, including a switch of address space. Switching between threads of the same process avoids the address-space switch, and for purely user-level threads it needs no OS involvement at all.
  4. Creating multiple processes is a resource-intensive task, whereas creating multiple threads is less resource-intensive.
  5. Each process can run independently, whereas one thread can read and write another thread's data.

[Image: thread and process lecture]

Solution 3 - Linux

fork() spawns a new copy of the process, as you've noted. What isn't mentioned above is the exec() call that often follows. It replaces the current process image with a new program (a new executable) while keeping the same process ID, and as such, fork()/exec() is the standard means of spawning a new process from an old one.

For example, that's how your shell invokes a program from the command line: you specify your program (ls, say), and the shell forks and then execs ls.

Note that this operates at a very different level from threading. Threading runs multiple lines of execution within a single process; forking is a means of creating new processes.

Solution 4 - Linux

As @2431234123412341234123 said, on Linux, thanks to COW, processes are not much heavier than threads, and the choice boils down to how you use them. COW (copy-on-write) means that a memory page of the forked process is copied only when one of the processes writes to it; until then, parent and child keep sharing the same physical page.

From a programming use case, let's say that on the heap you have a big data structure, a 2D array[2000000][100] of 1-byte elements (200 MB), and the kernel's page size is 4 KB. When the process is forked, no new memory is allocated for this array. If one particular row (100 bytes) is changed (in either the parent or the child), only the corresponding page (4 KB, or 8 KB if the row straddles two pages) is copied and updated for the process that made the change.

Other portions of memory work the same in forked processes as in threads (the code is the same; registers and the call stack are separate).

On Windows, as @Niels Keurentjes said, threads might be better from a performance point of view, but on Linux it is more a matter of use case.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type | Original author
Question | Rushabh RajeshKumar Padalia
Solution 1 - Linux | Niels Keurentjes
Solution 2 - Linux | Selvaperumal
Solution 3 - Linux | Brian Agnew
Solution 4 - Linux | Varun Garg