How do I split a file into n parts?

FileSplit

File Problem Overview


I have a file containing some number of lines. I want to split the file into n files with particular names. It doesn't matter how many lines are in each file; I just want a particular number of files (say 5). The problem is that the number of lines in the original file keeps changing, so I need to count the lines and then split the file into 5 parts. If possible, each part should be sent to a different directory.

File Solutions


Solution 1 - File

In bash, you can use the split command to split a file by a desired number of lines per output file, and the wc command to work out what that number should be. Here's wc combined with split in one line.

For example, to split onepiece.log into 5 parts:

    split -l$((`wc -l < onepiece.log`/5)) onepiece.log onepiece.split.log -da 4

This will create files like onepiece.split.log0000 ...

Note: bash division rounds down, so if there is a remainder there will be a 6th part file.
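To make the rounding note concrete, here is a small sketch that runs the same kind of command on a throwaway 12-line file. The file name sample.log and the scratch directory are made up for illustration, and the sketch assumes GNU split. Since 12/5 rounds down to 2 lines per file, split ends up writing 6 files rather than 5.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 12-line input file.
seq 1 12 > sample.log

# Integer division rounds down: 12 / 5 = 2 lines per part.
lines_per_part=$(( $(wc -l < sample.log) / 5 ))

# -d uses numeric suffixes, -a 4 makes them 4 digits wide.
split -l "$lines_per_part" -da 4 sample.log sample.split.log

# Count the parts that were produced.
part_count=$(ls sample.split.log* | wc -l)
echo "lines per part: $lines_per_part, parts: $part_count"
```

The last two lines of the input land in sample.split.log0005, which is the extra sixth part.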

Solution 2 - File

On Linux, there is a split command:

split --lines=1000000 /path/to/large/file /path/to/output/file/prefix

> Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.

> ...

> -l, --lines=NUMBER put NUMBER lines per output file

> ...

You would have to calculate the actual size of the splits beforehand, though.

Solution 3 - File

split has an option "--number=CHUNKS" that lets you divide a file into a number of chunks. This is from the (trimmed) output of "split --help":

  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below

...

CHUNKS may be:
N       split into N files based on size of input
K/N     output Kth of N to stdout
l/N     split into N files without splitting lines
l/K/N   output Kth of N to stdout without splitting lines
r/N     like 'l' but use round robin distribution
r/K/N   likewise but only output Kth of N to stdout

In the case of splitting it into 5 parts, the command would be: split --number=l/5 inputfile outputprefix

This might not result in them having the same number of lines, though.
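As a quick check of that behaviour, the sketch below splits a made-up 12-line file with --number=l/5 and prints the line count of each part. It assumes GNU split (the --number option is not available everywhere), and the names inputfile and part_ are placeholders. You should always get 5 files with no line cut in half, but the per-file counts can differ because the chunks are sized by bytes.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 12-line input file.
seq 1 12 > inputfile

# Split into 5 chunks by size, without breaking lines.
split --number=l/5 inputfile part_

# Number of output files, then the line count of each one.
part_count=$(ls part_* | wc -l)
echo "parts: $part_count"
wc -l part_*
```

Concatenating the parts in name order (cat part_*) gives back the original file.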

If you want them all to have the same number of lines up until the last one, you can use the following command:

split -l $(( ($(cat "inputfile" | wc -l) + 5 - 1)/5 )) inputfile outputprefix

Both 5s here can be replaced with any other number (making sure they're the same).

Here's an explanation of this command piece by piece:

$( ) returns the output of whatever command you put into it. cat is used here to make sure wc only returns the number of lines without also outputting the input filename.

$(( )) evaluates whatever you put between the parentheses as a mathematical expression (using only integers) and returns the result.

($(cat "inputfile" | wc -l) + 5 - 1)/5 takes the line count of the input file, adds 5, subtracts 1, and divides the result by 5. Adding 4 before the integer division rounds the result up, so you never get more parts than you asked for (5 in this case). With few lines you can get fewer: 11 lines gives 3 lines per file and therefore only 4 files.
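The rounding-up arithmetic can be tried on small inputs with the sketch below. The helper names ceil_div and count_parts are made up for this example, and the input files are generated with seq. It shows that rounding up prevents an extra file, but that a short input can produce fewer files than requested.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ceil_div A B: integer division of A by B, rounded up (hypothetical helper).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# count_parts LINES PARTS: build a LINES-line file, split it with the
# rounded-up line count, and print how many files split produced.
count_parts() {
  rm -f inputfile outputprefix*
  seq 1 "$1" > inputfile
  split -l "$(ceil_div "$1" "$2")" inputfile outputprefix
  ls outputprefix* | wc -l
}

echo "13 lines, 5 requested: $(count_parts 13 5) files"
echo "11 lines, 5 requested: $(count_parts 11 5) files"
```

With 13 lines the result is 5 files (3, 3, 3, 3 and 1 lines). With 11 lines the same 3 lines per file fills only 4 files.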

You can also use split --number=r/5 to split it into five files, with the lines dealt out to them in turn, as in the following example:

inputfile.txt:
1
2
3
4
5
6
7
8
9

outputfile1:
1
6

outputfile2:
2
7

outputfile3:
3
8

outputfile4:
4
9

outputfile5:
5

This doesn't preserve the file order, but it can be useful in cases where that isn't important.
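The example above can be reproduced with the sketch below, assuming GNU split. The option --numeric-suffixes=1 together with -a 1 is used only so that the output names come out as outputfile1 to outputfile5, matching the listing. Without them the default names would end in aa, ab, and so on.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The nine-line input from the example.
seq 1 9 > inputfile.txt

# Round-robin split into 5 files named outputfile1 .. outputfile5.
split --number=r/5 --numeric-suffixes=1 -a 1 inputfile.txt outputfile

# Show each output file on one line.
for f in outputfile1 outputfile2 outputfile3 outputfile4 outputfile5; do
  echo "$f: $(tr '\n' ' ' < "$f")"
done
```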

Solution 4 - File

Assuming you are processing a text file, use wc -l to determine the total number of lines and split -l to split it into pieces of a specified number of lines (total / 5 in your case). This works on UNIX/Mac and on Windows (if you have Cygwin installed).

Solution 5 - File

This is building on the original answers given by @sketchytechky and @grasshopper. If you would like to deal with remainders differently and want a fixed number of files as output but with a round robin distribution of lines, then the split command should be written as:

split -da 4 -n r/1024 filename filename_split --additional-suffix=".log"

Replace 1024 with the number of files you want as output.

Solution 6 - File

On macOS you can simply do:

split -n <number_of_parts> <filename>

For example, you can do

split -n 5 file.txt

It will be split into 5 files of similar size. Note that -n with a plain number divides the file by bytes, so a line can end up cut across two files.

Solution 7 - File

I can think of a few ways to do it. Which you would use depends a lot on the data.

  1. Lines are fixed length: Find the size of the file by reading its directory entry and divide by the line length to get the number of lines. Use this to determine how many lines per file.

  2. The files only need to have approximately the same number of lines. Again read the file size from the directory entry. Read the first N lines (N should be small but some reasonable fraction of the file) to calculate an average line length. Calculate the approximate number of lines based on the file size and predicted average line length. This assumes that the line length follows a normal distribution. If not, adjust your method to randomly sample lines (using seek() or something similar). Rewind the file after you have your average, then split it based on the predicted line length.

  3. Read the file twice. The first time count the number of lines. The second time splitting the file into the requisite pieces.
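A minimal sketch of the third approach follows, using wc for the first pass and awk for the second. It is one possible way to do it, not the only one. The names inputfile and part_ are placeholders, and it assumes the file has at least as many lines as the number of parts. The remainder is spread over the first parts, so the result is always exactly n files.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 12 > inputfile          # made-up 12-line input
n=5                           # number of parts wanted

# Pass 1: count the lines.
total=$(wc -l < inputfile)
base=$(( total / n ))         # every part gets at least this many lines
extra=$(( total % n ))        # the first $extra parts get one more

# Pass 2: stream the file once, moving on when the current part is full.
awk -v base="$base" -v extra="$extra" '
  BEGIN { part = 1; limit = base + (part <= extra ? 1 : 0) }
  {
    if (written == limit) {
      close("part_" part)
      part++
      written = 0
      limit = base + (part <= extra ? 1 : 0)
    }
    print > ("part_" part)
    written++
  }
' inputfile

wc -l part_*
```

For the 12-line input this writes part_1 and part_2 with 3 lines each and part_3 to part_5 with 2 lines each.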

EDIT: Using a shell script (according to your comments), the randomized version of #2 would be hard unless you wrote a small program to do that for you. You should be able to use ls -l to get the file size, wc -l to count the exact number of lines, and head -nNNN | wc -c to calculate the average line length.
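The non-randomized estimate could be sketched in shell roughly as follows. The sample size of 50 lines, the file name inputfile and the prefix part_ are arbitrary choices for this example. The sketch reads the size with wc -c rather than parsing ls -l, which reports the same byte count, and it assumes the sampled lines are not all empty (otherwise the average would be zero).

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1000 1999 > inputfile       # made-up input: 1000 lines of 5 bytes each
n=5                             # number of parts wanted
sample_lines=50                 # hypothetical sample size

size=$(wc -c < inputfile)                                   # total bytes
sample_bytes=$(head -n "$sample_lines" inputfile | wc -c)   # bytes in the sample
avg=$(( sample_bytes / sample_lines ))     # average bytes per line, rounded down
est_lines=$(( size / avg ))                # estimated number of lines
per_file=$(( (est_lines + n - 1) / n ))    # lines per part, rounded up

echo "estimated lines: $est_lines, lines per part: $per_file"
split -l "$per_file" inputfile part_
```

Because every line in this input has the same length, the estimate is exact. With varying line lengths the estimate is only approximate, so the number of output files can be off by one or more.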

Solution 8 - File

Here's a one-liner with variables:

file=onepiece.log; nsplit=5; len=$(wc -l < "$file"); split -l$(($len/$nsplit)) "$file" "$file.split" -da 4
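The question also asks for each part to go into its own directory. One way that might be done is sketched below as a function built on the same variables. The function name split_into_dirs and the directory names part1, part2, and so on are invented for this example. Unlike the one-liner, it rounds the line count up so that no extra file is produced, and it assumes a non-empty input file and GNU split.

```shell
# split_into_dirs FILE N DESTROOT (hypothetical helper):
# split FILE into at most N parts and move each into DESTROOT/part<i>/.
split_into_dirs() {
  file=$1; nsplit=$2; destroot=$3
  len=$(wc -l < "$file")
  per_file=$(( (len + nsplit - 1) / nsplit ))   # rounded up, unlike the one-liner
  tmp=$(mktemp -d) || return 1
  split -l "$per_file" -da 4 "$file" "$tmp/$(basename "$file").split"
  i=1
  for part in "$tmp"/*; do
    mkdir -p "$destroot/part$i"
    mv "$part" "$destroot/part$i/"
    i=$(( i + 1 ))
  done
  rmdir "$tmp"
}

# Usage on a made-up 13-line file in a scratch directory.
workdir=$(mktemp -d)
seq 1 13 > "$workdir/onepiece.log"
split_into_dirs "$workdir/onepiece.log" 5 "$workdir/out"
ls "$workdir"/out/*
```

The parts are written to a temporary directory first, so the glob in the loop only ever sees the newly created files.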

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | new person | View Question on Stackoverflow |
| Solution 1 - File | sketchytechky | View Answer on Stackoverflow |
| Solution 2 - File | miku | View Answer on Stackoverflow |
| Solution 3 - File | Harold | View Answer on Stackoverflow |
| Solution 4 - File | bjg | View Answer on Stackoverflow |
| Solution 5 - File | Vishnu | View Answer on Stackoverflow |
| Solution 6 - File | Luca Di Liello | View Answer on Stackoverflow |
| Solution 7 - File | tvanfosson | View Answer on Stackoverflow |
| Solution 8 - File | FarisHijazi | View Answer on Stackoverflow |