How do I extract a single chunk of bytes from within a file?

Tags: Linux, File, Split

Linux Problem Overview


On a Linux desktop (RHEL4) I want to extract a range of bytes (typically less than 1000) from within a large file (>1 Gig). I know the offset into the file and the size of the chunk.

I can write code to do this but is there a command line solution?

Ideally, something like:

magicprogram --offset 102567 --size 253 < input.binary > output.binary

Linux Solutions


Solution 1 - Linux

Try dd:

dd skip=102567 count=253 if=input.binary of=output.binary bs=1

The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes.

The value of bs also affects the behavior of skip and count since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
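To make the interaction between bs, skip and count concrete, here is a small sketch on a throwaway 26-byte file. The file contents, the temporary directory and the variable names are made up for illustration; only the dd options come from the answer above.

```shell
#!/bin/sh
# Hypothetical sample data: 26 letters, so byte offsets are easy to read off.
tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$tmp/input.binary"

# bs=1: skip=10 means "skip 10 bytes", count=5 means "copy 5 bytes".
chunk_bs1=$(dd if="$tmp/input.binary" bs=1 skip=10 count=5 2>/dev/null)
echo "$chunk_bs1"

# bs=5: the same options now count blocks of 5 bytes, so skip=2 skips
# 10 bytes and count=1 copies 5 bytes.
chunk_bs5=$(dd if="$tmp/input.binary" bs=5 skip=2 count=1 2>/dev/null)
echo "$chunk_bs5"

rm -rf "$tmp"
```

Both commands should print klmno, the five bytes starting at byte offset 10.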

Solution 2 - Linux

This is an old question, but I'd like to add another version of the dd command that is better suited to large chunks of bytes:

dd if=input.binary of=output.binary skip=$offset count=$bytes iflag=skip_bytes,count_bytes

where $offset and $bytes are numbers in byte units.

The difference from Thomas's accepted answer (Solution 1) is that bs=1 does not appear here. bs=1 sets the input and output block size to 1 byte, which makes dd terribly slow when the number of bytes to extract is large.

Here we leave the block size (bs) at its default of 512 bytes. With iflag=skip_bytes,count_bytes, we tell dd to treat the values after skip and count as byte counts rather than block counts. Note that these flags are GNU dd extensions, so other dd implementations may not accept them.
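As a rough check that the byte-based flags select the same bytes as the bs=1 form, the sketch below runs both on a throwaway file and compares the results. It assumes GNU dd; the sample file, the offset and length values, and the bs=64K setting are illustrative choices rather than part of the answer.

```shell
#!/bin/sh
# Assumes GNU dd (coreutils); other dd implementations may reject these iflags.
tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$tmp/input.binary"

offset=10   # byte offset of the chunk (hypothetical value)
bytes=5     # chunk length in bytes (hypothetical value)

# skip and count are read as byte amounts, so bs can be large for speed.
dd if="$tmp/input.binary" of="$tmp/fast.binary" bs=64K \
   skip="$offset" count="$bytes" iflag=skip_bytes,count_bytes 2>/dev/null

# Reference result from the byte-at-a-time form.
dd if="$tmp/input.binary" of="$tmp/slow.binary" bs=1 \
   skip="$offset" count="$bytes" 2>/dev/null

if cmp -s "$tmp/fast.binary" "$tmp/slow.binary"; then same=yes; else same=no; fi
fast_chunk=$(cat "$tmp/fast.binary")
echo "$same $fast_chunk"
rm -rf "$tmp"
```

On a system with GNU dd this should report that the two output files are identical.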

Solution 3 - Linux

head -c + tail -c

Not sure how it compares to dd in efficiency, but it is fun:

printf "123456789" | tail -c+2 | head -c3

picks 3 bytes, starting at the 2nd one:

234
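One detail worth spelling out: tail -c +N is 1-based, so a 0-based byte offset has to be incremented by one. A minimal sketch of a wrapper follows; the function name extract_chunk and the sample file are hypothetical.

```shell
#!/bin/sh
# extract_chunk FILE OFFSET SIZE  (hypothetical helper name)
# OFFSET is 0-based; tail -c +N is 1-based, hence the +1.
extract_chunk() {
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}

tmp=$(mktemp -d)
printf '123456789' > "$tmp/input.binary"

# Offset 1, size 3: the same bytes as tail -c+2 | head -c3 above.
result=$(extract_chunk "$tmp/input.binary" 1 3)
echo "$result"   # prints 234
rm -rf "$tmp"
```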

Solution 4 - Linux

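Even faster, provided the offset is an exact multiple of the chunk length:

dd bs=<req len> count=1 skip=<req offset / req len> if=input.binary of=output.binary

Keep in mind that skip is counted in blocks of bs bytes, not in bytes, so passing the raw byte offset as skip would land far past the intended position.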
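Since skip counts blocks of bs bytes, the single-block form only lands on the right position when the offset is a multiple of the length. The sketch below wraps that check in a hypothetical helper, fast_chunk, which falls back to the bs=1 form when the offset is not aligned. It assumes a regular file and a length greater than zero.

```shell
#!/bin/sh
# fast_chunk FILE OFFSET LENGTH  (hypothetical helper name; LENGTH must be > 0)
fast_chunk() {
    if [ $(( $2 % $3 )) -eq 0 ]; then
        # Aligned: one read of LENGTH bytes after skipping OFFSET/LENGTH blocks.
        dd if="$1" bs="$3" skip=$(( $2 / $3 )) count=1 2>/dev/null
    else
        # Not aligned: fall back to the byte-at-a-time form.
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
    fi
}

tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$tmp/input.binary"
aligned=$(fast_chunk "$tmp/input.binary" 10 5)     # 10 is a multiple of 5
unaligned=$(fast_chunk "$tmp/input.binary" 11 5)   # 11 is not
echo "$aligned $unaligned"   # prints klmno lmnop
rm -rf "$tmp"
```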

Solution 5 - Linux

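The dd command can do all of this. Look at the skip parameter (which skips blocks at the start of the input) and, if you also need to position the output, the seek parameter (which skips blocks at the start of the output).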

Solution 6 - Linux

I had the same problem when trying to cut parts out of a raw disk image. dd with bs=1 was unusably slow, so I wrote a simple C program for the task.

// usage:
//  ./cutfile srcfile destfile offset length
//  ./cutfile my.image movie.avi 4524 20412452
// compile, presuming it is saved as cutfile.c:
//  gcc cutfile.c -o cutfile -std=c11 -pedantic -W -Wall -Werror
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE (16 * 512)  // can adjust

int main(int argc, char *argv[])
{
  if (argc != 5) {
      fprintf(stderr, "usage: %s srcfile destfile offset length\n", argv[0]);
      return 1;
  }

  static unsigned char buffer[BLOCKSIZE];
  long offset = atol(argv[3]);  // note: long may be 32 bits on some systems
  long length = atol(argv[4]);

  FILE *f = fopen(argv[1], "rb");
  if (f == NULL) {
      perror("cannot open source file");
      return 1;
  }
  FILE *fout = fopen(argv[2], "wb");
  if (fout == NULL) {
      perror("cannot open destination file");
      fclose(f);
      return 1;
  }
  if (fseek(f, offset, SEEK_SET) != 0) {
      perror("cannot seek to offset");
      fclose(fout);
      fclose(f);
      return 1;
  }

  // copy block by block; stop early at end of file or on a read error
  while (length > 0) {
      size_t want = length > BLOCKSIZE ? BLOCKSIZE : (size_t)length;
      size_t got = fread(buffer, 1, want, f);
      if (got == 0)
          break;
      if (fwrite(buffer, 1, got, fout) != got) {
          perror("write failed");
          fclose(fout);
          fclose(f);
          return 1;
      }
      length -= (long)got;
  }

  fclose(fout);
  fclose(f);
  return 0;
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author                       Original Content on Stackoverflow
Question            DanM                                  View Question on Stackoverflow
Solution 1 - Linux  Thomas Padron-McCarthy                View Answer on Stackoverflow
Solution 2 - Linux  ChronoTrigger                         View Answer on Stackoverflow
Solution 3 - Linux  Ciro Santilli Путлер Капут 六四事     View Answer on Stackoverflow
Solution 4 - Linux  Albert Burbea                         View Answer on Stackoverflow
Solution 5 - Linux  Joe                                   View Answer on Stackoverflow
Solution 6 - Linux  karna7                                View Answer on Stackoverflow