Gzip with all cores

Linux · Bash · Gzip

Linux Problem Overview


I have a set of servers, each filled with a bunch of files that can be gzipped. The servers all have different numbers of cores. How can I write a bash script to launch a gzip for each core and make sure the gzips are not zipping the same file?

Linux Solutions


Solution 1 - Linux

There is a multithreaded implementation of gzip: pigz. Because it compresses a single file across multiple threads, it can read from disk more efficiently than running several single-threaded compressions at once.
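A minimal sketch of using pigz on one file, assuming the pigz package is installed (the file path is a placeholder):

```shell
# Compress one file with maximum compression across all cores.
# pigz uses every online core by default; -p overrides the thread count.
pigz -9 -p "$(nproc)" /source/largefile.log

# The output is standard gzip format, so it can be decompressed
# with either pigz -d or plain gunzip.
```

Like gzip, pigz replaces the input file with a `.gz` file, so the invocation is a drop-in substitute in most scripts.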

Solution 2 - Linux

If you are on Linux, you can use GNU's xargs to launch as many processes as you have cores.

CORES=$(grep -c '^processor' /proc/cpuinfo)
find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip -9
  • find -print0 / xargs -0 protects you from whitespace in filenames
  • xargs -n 1 means one gzip process per file
  • xargs -P sets the maximum number of processes to run in parallel
  • gzip -9 means maximum compression

Solution 3 - Linux

You might want to consider checking out GNU parallel. I also found this video on YouTube which seems to do what you are looking for.
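A minimal sketch of the same job with GNU parallel, assuming the parallel package is installed (the /source path is a placeholder):

```shell
# GNU parallel defaults to one job per CPU core, so no core count
# needs to be computed by hand; -j can override the default.
# -0 pairs with find -print0 to handle whitespace in filenames.
find /source -type f -print0 | parallel -0 gzip -9
```

Compared to the xargs approach, parallel also keeps the output of concurrent jobs from interleaving, which matters if the command prints anything.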

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author   Original Content on Stackoverflow
Question             User1             View Question on Stackoverflow
Solution 1 - Linux   David Yaw         View Answer on Stackoverflow
Solution 2 - Linux   Demosthenex       View Answer on Stackoverflow
Solution 3 - Linux   Gangadhar         View Answer on Stackoverflow