How do I extract a single chunk of bytes from within a file?
Linux Problem Overview
On a Linux desktop (RHEL4) I want to extract a range of bytes (typically fewer than 1000) from within a large file (> 1 GB). I know the offset into the file and the size of the chunk.
I can write code to do this, but is there a command-line solution?
Ideally, something like:
magicprogram --offset 102567 --size 253 < input.binary > output.binary
Linux Solutions
Solution 1 - Linux
Try dd:
dd skip=102567 count=253 if=input.binary of=output.binary bs=1
The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes. The value of bs also affects the behavior of skip and count, since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
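To make the block-unit behavior concrete, here is a small sketch on a made-up 20-byte file (the sample data and variable names are only for illustration). The same skip and count values select different bytes once bs changes:

```shell
# Build a 20-byte sample file so the effect of bs is easy to see.
sample=$(mktemp)
printf '0123456789ABCDEFGHIJ' > "$sample"

# bs=1: skip and count are plain byte counts.
# Skip 2 bytes, then copy 3 bytes.
bytewise=$(dd if="$sample" bs=1 skip=2 count=3 2>/dev/null)
echo "$bytewise"    # prints 234

# bs=4: the same skip and count now refer to 4-byte blocks.
# Skip 2 blocks (8 bytes), then copy 3 blocks (12 bytes).
blockwise=$(dd if="$sample" bs=4 skip=2 count=3 2>/dev/null)
echo "$blockwise"   # prints 89ABCDEFGHIJ

rm -f "$sample"
```

dd reports its transfer statistics on stderr, which is why the sketch redirects stderr to /dev/null.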
Solution 2 - Linux
This is an old question, but I'd like to add another version of the dd command that is better suited to large chunks of bytes:
dd if=input.binary of=output.binary skip=$offset count=$bytes iflag=skip_bytes,count_bytes
where $offset and $bytes are numbers in byte units.
The difference from Thomas's accepted answer is that bs=1 does not appear here. bs=1 sets the input and output block size to 1 byte, which makes dd terribly slow when the number of bytes to extract is large.
This means we leave the block size (bs) at its default of 512 bytes. Using iflag=skip_bytes,count_bytes, we tell dd to treat the values after skip and count as byte counts instead of block counts.
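As a quick sanity check, the sketch below compares the byte-flag variant against the bs=1 variant on generated data. It assumes GNU dd, since skip_bytes and count_bytes are GNU extensions that an older or non-GNU dd may not accept; the sample data, offset, and size are arbitrary.

```shell
# Assumes GNU dd with iflag=skip_bytes,count_bytes support.
sample=$(mktemp)
fast=$(mktemp)
slow=$(mktemp)

# About 100 KB of varied stand-in data (the numbers 1..20000, one per line).
seq 1 20000 > "$sample"

offset=12345
bytes=50000

# Byte-addressed extraction with the default 512-byte block size.
dd if="$sample" of="$fast" skip="$offset" count="$bytes" \
   iflag=skip_bytes,count_bytes 2>/dev/null

# Reference result: the one-byte-at-a-time variant.
dd if="$sample" of="$slow" skip="$offset" count="$bytes" bs=1 2>/dev/null

# cmp -s exits 0 only when both outputs are byte-for-byte identical.
if cmp -s "$fast" "$slow"; then result=identical; else result=different; fi
echo "$result"

extracted_size=$(wc -c < "$fast")
rm -f "$sample" "$fast" "$slow"
```

Both commands should produce the same output file; only the number of read and write calls differs.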
Solution 3 - Linux
head -c + tail -c
Not sure how it compares to dd in efficiency, but it is fun:
printf "123456789" | tail -c+2 | head -c3
picks 3 bytes, starting at the 2nd one:
234
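The same pipeline can be wrapped so that it takes a 0-based offset and a size, as in the question. This is only a sketch: extract_bytes is a made-up helper name, and it assumes a head that supports -c.

```shell
# extract_bytes FILE OFFSET SIZE  (hypothetical helper)
# Writes SIZE bytes of FILE, starting at 0-based byte OFFSET, to stdout.
extract_bytes() {
    # tail -c +N starts output at byte N counting from 1, hence OFFSET + 1.
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}

sample=$(mktemp)
printf '123456789' > "$sample"

picked=$(extract_bytes "$sample" 1 3)
echo "$picked"   # prints 234

rm -f "$sample"
```

Note the off-by-one: offset 1 in 0-based terms is the 2nd byte, which matches tail -c+2 above.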
Solution 4 - Linux
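Even faster, provided the offset is a whole multiple of the length. skip counts blocks of bs bytes, so it must be given the offset divided by the length:
dd bs=<req len> count=1 skip=<req offset / req len> if=input.binary of=output.binary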
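Since skip is measured in blocks of bs bytes, the single-block form lands on the requested offset only when that offset is a whole multiple of the length. A minimal sketch that checks this before calling dd follows; extract_aligned is a hypothetical helper name, not an existing tool.

```shell
# extract_aligned FILE OFFSET LENGTH  (hypothetical helper)
# Reads LENGTH bytes at byte OFFSET as one block; OFFSET must be a
# whole multiple of LENGTH because skip is counted in blocks of bs bytes.
extract_aligned() {
    if [ $(( $2 % $3 )) -ne 0 ]; then
        echo "offset must be a multiple of length" >&2
        return 1
    fi
    dd if="$1" bs="$3" skip=$(( $2 / $3 )) count=1 2>/dev/null
}

sample=$(mktemp)
printf '0123456789ABCDEFGHIJ' > "$sample"

# Offset 8 is a multiple of 4, so this is block 2 of size 4.
aligned=$(extract_aligned "$sample" 8 4)
echo "$aligned"   # prints 89AB

# Offset 6 is not a multiple of 4, so the helper refuses.
if extract_aligned "$sample" 6 4 2>/dev/null; then
    misaligned=accepted
else
    misaligned=rejected
fi
echo "$misaligned"   # prints rejected

rm -f "$sample"
```

For offsets that are not aligned, the byte-oriented variants in the other solutions avoid the restriction.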
Solution 5 - Linux
The dd command can do all of this. Look at the skip parameter (offset into the input) and, if needed, the seek parameter (offset into the output) as part of the call.
Solution 6 - Linux
I had the same problem when trying to cut parts out of a raw disk image. dd with bs=1 is unusably slow, so I wrote a simple C program for the task.
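// usage:
//   ./cutfile srcfile destfile offset length
//   ./cutfile my.image movie.avi 4524 20412452
// compile, presuming it is saved as cutfile.c:
//   gcc cutfile.c -o cutfile -std=c11 -pedantic -W -Wall -Werror
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE (16 * 512) // can adjust

int main(int argc, char *argv[])
{
    if (argc != 5) {
        fprintf(stderr, "error, need 4 arguments!\n");
        return 1;
    }
    long offset = atol(argv[3]);
    long length = atol(argv[4]);
    if (offset < 0 || length < 0) {
        fprintf(stderr, "error, offset and length must be non-negative\n");
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror("cannot open source file");
        return 1;
    }
    FILE *fout = fopen(argv[2], "wb");
    if (fout == NULL) {
        perror("cannot open destination file");
        fclose(f);
        return 1;
    }
    // fseek takes a long, so the offset is limited to LONG_MAX
    if (fseek(f, offset, SEEK_SET) != 0) {
        perror("cannot seek");
        fclose(fout);
        fclose(f);
        return 1;
    }
    unsigned char buffer[BLOCKSIZE];
    int status = 0;
    while (length > 0) {
        // copy at most one block per iteration
        size_t want = (size_t)length < sizeof buffer ? (size_t)length : sizeof buffer;
        size_t got = fread(buffer, 1, want, f);
        if (got == 0) { // end of file or read error: stop early
            if (ferror(f)) {
                perror("read failed");
                status = 1;
            }
            break;
        }
        if (fwrite(buffer, 1, got, fout) != got) {
            perror("write failed");
            status = 1;
            break;
        }
        length -= (long)got;
    }
    if (fclose(fout) != 0) {
        perror("close failed");
        status = 1;
    }
    fclose(f);
    return status;
}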