Does multithreading make sense for IO-bound operations?

Multithreading, Optimization

Multithreading Problem Overview


When performing many disk operations, does multithreading help, hinder, or make no difference?

For example, when copying many files from one folder to another.

Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task was to open an image file, convert to another format, and then save, disk operations can be performed concurrently with the image manipulation. My question is when the only operations performed are disk operations, whether concurrently queuing and responding to disk operations is better.

Multithreading Solutions


Solution 1 - Multithreading

Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?

Even if you parallelize with multiple threads, I/O to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to access the disk. In this case, multiple threads are probably useless, and may even lead to contention problems.

However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.

I don't think the issue has as much to do with the OS scheduler as it does with the physical characteristics of the disk(s) you're writing to.

Solution 2 - Multithreading

That depends on your definition of "I/O bound" but generally multithreading has two effects:

  • Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])

  • Use a CPU (with another thread) even while one thread is blocked (e.g. waiting for I/O completion)

I'm not sure that Konrad's answer is always right, however. As a counter-example: if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean "we've hit the system's I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance, by enabling more than one concurrent I/O operation.

Solution 3 - Multithreading

I would think it depends on a number of factors, like the kind of application you are running, the number of concurrent users, etc.

I am currently working on a project that has a high degree of linear (reading files from start to finish) operations. We use a NAS for storage, and were concerned about what happens if we run multiple threads. Our initial thought was that it would slow us down because it would increase head seeks. So we ran some tests and found out that the ideal number of threads is the same as the number of cores in the computer.

But your mileage may vary.

Solution 4 - Multithreading

It can, simply because whenever there is more work for a thread to do (e.g. identifying the next file to copy), the OS wakes it up. So threads are a simple way to hook into the OS scheduler while still writing code in a traditional sequential way, instead of having to break it up into a state machine with callbacks.

This is mainly an assistance with clear programming rather than performance.

Solution 5 - Multithreading

In most cases, using multiple threads for disk I/O will not improve efficiency. Let's imagine two circumstances:

  1. Lock-free file: We can split the file across threads by giving each a different I/O offset. For instance, a 1024-byte file is split into n pieces, and each thread writes its 1024/n-byte chunk at its respective offset. This will cause a lot of excessive disk head movement because of the different offsets.
  2. Locked file: Actually lock the I/O operation in each critical section. This will cause a lot of excessive thread switching, and it turns out that only one thread can write the file at a time anyway.

Correct me if I'm wrong.

Solution 6 - Multithreading

No, it makes no sense. At some point, the operations have to be serialized (by the OS). On the other hand, since modern OSes have to cope with multiple processes anyway, I doubt that there's any added overhead.

Solution 7 - Multithreading

I'd think it would hinder the operations: you only have one controller and one drive.

You could use a second thread to do the operation, and a main thread that shows an updated UI.

Solution 8 - Multithreading

I think it could worsen performance, because multiple threads will compete for the same resources.

You can test the impact of doing concurrent I/O operations on the same device by copying a set of files from one place to another and measuring the time, then splitting the set in two parts and making the two copies in parallel. The second option will be noticeably slower.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Aidan Ryan
Solution 1: jrista
Solution 2: ChrisW
Solution 3: Robert Harvey
Solution 4: Daniel Earwicker
Solution 5: asap diablo
Solution 6: Konrad Rudolph
Solution 7: Osama Al-Maadeed
Solution 8: fortran