Split files using tar, gz, zip, or bzip2

Linux, Bash, File IO, Compression

Linux Problem Overview


I need to compress a large file of about 17-20 GB and split it into several files of around 1 GB each.

I searched for a solution via Google and found ways using the split and cat commands, but they did not work for large files at all. Also, they won't work on Windows; I need to extract the archive on a Windows machine.

Linux Solutions


Solution 1 - Linux

You can use the split command with the -b option:

split -b 1024m file.tar.gz

The parts can be reassembled on a Windows machine using @Joshua's answer:

copy /b file1 + file2 + file3 + file4 filetogether

Edit: As @Charlie stated in the comment below, you might want to set a prefix explicitly, because split will otherwise use x as the prefix, which can be confusing.

split -b 1024m "file.tar.gz" "file.tar.gz.part-"

# Creates: file.tar.gz.part-aa, file.tar.gz.part-ab, file.tar.gz.part-ac, ...
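On Linux, the parts can later be rejoined in a single step; the shell's glob expansion sorts the alphabetic suffixes in the correct order:

cat file.tar.gz.part-* > file.tar.gz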

Edit: Since the question is closed and the most effective solution is very close to the content of this answer, I am adding it here:

# create archives
$ tar cz my_large_file_1 my_large_file_2 | split -b 1024MiB - myfiles_split.tgz_
# uncompress
$ cat myfiles_split.tgz_* | tar xz

This solution avoids the need for an intermediate large file when (de)compressing. Use the tar -C option to extract the resulting files into a different directory. By the way, if the archive consists of only a single file, tar can be avoided and gzip used alone:

# create archives
$ gzip -c my_large_file | split -b 1024MiB - myfile_split.gz_
# uncompress
$ cat myfile_split.gz_* | gunzip -c > my_large_file
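To extract the split tar archive from earlier into a different directory, -C can be added to the same pipeline (the target path here is illustrative):

$ cat myfiles_split.tgz_* | tar xz -C /path/to/output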

For Windows, you can download ported versions of the same commands or use Cygwin.

Solution 2 - Linux

If you are splitting from Linux, you can still reassemble in Windows:

copy /b file1 + file2 + file3 + file4 filetogether

Solution 3 - Linux

Use tar to split into multiple archives.

There are plenty of programs that will work with tar files on Windows, including Cygwin.
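As a minimal sketch, GNU tar can split while archiving using its multi-volume mode (-M); note that multi-volume archives cannot be compressed with -z, and the file names below are illustrative:

# -L takes the volume size in units of 1024 bytes (1 GiB = 1048576)
tar -c -M -L 1048576 -f big.tar.part1 -f big.tar.part2 -f big.tar.part3 my_large_file
# extract by listing the volumes in the same order
tar -x -M -f big.tar.part1 -f big.tar.part2 -f big.tar.part3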

Solution 4 - Linux

Tested code that first creates a single archive file, then splits it:

gzip -c file.orig > file.gz
CHUNKSIZE=1073741824   # 1 GiB
PARTCNT=$(( $(stat -c%s file.gz) / CHUNKSIZE ))

# The remainder is taken care of: for example, for a file of
# 1 GiB + 1 byte, PARTCNT is 1, and seq 0 $PARTCNT covers
# all of the file.
for n in $(seq 0 $PARTCNT)
do
    dd if=file.gz of=part.$n bs=$CHUNKSIZE skip=$n count=1
done
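When reassembling, note that a plain cat part.* would sort part.10 before part.2 once there are more than ten parts, so it is safer to iterate in numeric order (a sketch using the part.$n names from above):

for n in $(seq 0 $PARTCNT)
do
    cat part.$n
done | gunzip -c > file.orig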

This variant omits creating a single archive file and goes straight to creating parts:

gzip -c file.orig |
    ( CHUNKSIZE=1073741824;
        i=0;
        while true; do
            i=$((i+1));
            head -c "$CHUNKSIZE" > "part.$i";
            [ "$CHUNKSIZE" -eq "$(stat -c%s "part.$i")" ] || break;
        done; )

In this variant, if the archive's size is an exact multiple of $CHUNKSIZE, the last part file will be 0 bytes.
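If the empty trailing part is unwanted, one possible cleanup (assuming GNU find, which supports -empty and -delete) is:

find . -maxdepth 1 -name 'part.*' -empty -delete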

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type       | Original Author  | Original Content on Stackoverflow
Question           | Aka              | View Question on Stackoverflow
Solution 1 - Linux | matpie           | View Answer on Stackoverflow
Solution 2 - Linux | Joshua           | View Answer on Stackoverflow
Solution 3 - Linux | Tim Hoolihan     | View Answer on Stackoverflow
Solution 4 - Linux | Adrian Panasiuk  | View Answer on Stackoverflow