How to get only the first ten bytes of a binary file

Bash Problem Overview

I am writing a bash script that needs to get the header (first 10 bytes) of a file and then in another section get everything except the first 10 bytes. These are binary files and will likely have \0's and \n's throughout the first 10 bytes. It seems like most utilities work with ASCII files. What is a good way to achieve this task?

Bash Solutions

Solution 1 - Bash

To get the first 10 bytes, as noted already:

head -c 10

To get all but the first 10 bytes (at least with GNU tail):

tail -c+11
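
As a quick sanity check, the two commands can be tried on a small throwaway file that contains NUL and newline bytes. This is only a sketch: the sample bytes, the temporary directory and the names header and payload are made up for illustration, and head -c / tail -c +N are assumed to behave as in GNU coreutils.

```bash
# Build a 16-byte sample containing NUL (\000) and newline bytes.
tmpdir=$(mktemp -d)
printf 'AB\000CD\nEF\000\nGHIJKL' >"$tmpdir/infile"

head -c 10 "$tmpdir/infile" >"$tmpdir/header"     # bytes 1 to 10
tail -c +11 "$tmpdir/infile" >"$tmpdir/payload"   # byte 11 onward

# Concatenating both parts should reproduce the original byte for byte.
cat "$tmpdir/header" "$tmpdir/payload" | cmp - "$tmpdir/infile" && echo "round trip OK"
```

cmp compares raw bytes, so it is a safer check here than diff or a string comparison in the shell.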

Solution 2 - Bash

head -c 10 does the right thing here.

Solution 3 - Bash

You can use the `dd` command (http://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html) to copy an arbitrary number of bytes from a binary file.

dd if=infile of=outfile1 bs=10 count=1
dd if=infile of=outfile2 bs=10 skip=1
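
A minimal way to try the two dd calls is on a scratch file, as sketched below. The file contents and the temporary directory are invented for the example. The input is a regular file, so one 10-byte block is read in full. On a pipe a single read may return fewer bytes, and GNU dd offers iflag=fullblock for that case; it is not used here.

```bash
tmpdir=$(mktemp -d)
printf '0123456789remaining bytes\n' >"$tmpdir/infile"

# First block of 10 bytes only (dd's statistics go to stderr, silenced here).
dd if="$tmpdir/infile" of="$tmpdir/outfile1" bs=10 count=1 2>/dev/null
# Skip one 10-byte block, copy everything after it.
dd if="$tmpdir/infile" of="$tmpdir/outfile2" bs=10 skip=1 2>/dev/null

cat "$tmpdir/outfile1" "$tmpdir/outfile2" | cmp - "$tmpdir/infile" && echo "round trip OK"
```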

Solution 4 - Bash

How to split a stream (or a file) under bash

Reading SO request:

> get the header (first 10 bytes) of a file and then in another section get everything except the first 10 bytes.

I understand:

> How to split a file at specific point

Since all the other answers here access the same file twice instead of splitting it as a stream, here are my two cents:

The interesting thing about Un*x is that the whole job can be treated as a filter, so it is easy to split a stream using unbuffered I/O. Most standard Un*x tools (cat, grep, awk, sed, python, perl ...) work as filters.

Using `head` but in a single pass

{ head -c 10 >head_part; cat >tail_part;} <file

This is more efficient, as your file is read only once: the first 10 bytes go to head_part and the rest goes to tail_part.

Note: the second redirection, >tail_part, could be placed outside the whole list ({ ...;}) as well.
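
The single-pass split can be checked on a scratch file as sketched here. The sample content and temporary directory are made up, and the sketch assumes a head that consumes exactly 10 bytes of the shared input (GNU head does; other implementations may read ahead and leave less for cat).

```bash
tmpdir=$(mktemp -d)
printf 'HEADER0123 payload with \000 and \n inside' >"$tmpdir/file"

# head and cat share one open file: head takes 10 bytes, cat gets the rest.
{ head -c 10 >"$tmpdir/head_part"; cat >"$tmpdir/tail_part"; } <"$tmpdir/file"

# Same split, with the second redirection placed outside the list.
{ head -c 10 >"$tmpdir/head_part2"; cat; } <"$tmpdir/file" >"$tmpdir/tail_part2"

cat "$tmpdir/head_part" "$tmpdir/tail_part" | cmp - "$tmpdir/file" && echo "round trip OK"
```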

You could do the same using `dd`:

{ dd count=1 bs=10 of=head_part; cat;} <file >tail_part

This remains more efficient than running two dd processes that open the same file twice, and cat still uses a standard block size for the rest of the file.

Another sample, based on reading line by line:

Split an HTTP (or mail) stream at the nearly empty line (a line containing only a carriage return, \r):

nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
    { sed -u '/^\r$/q' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw

or, to drop the empty last line of the header:

nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
    { sed -nu '/^\r$/q;p' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw

This will produce two files:

ls -l so_*.raw
-rw-r--r-- 1 root    root           307 Apr 25 11:40  so_head.raw
-rw-r--r-- 1 root    root           219 Apr 25 11:40  so_body.raw

grep www so_*.raw
so_body.raw:<A HREF="http://www.google.com/">here</A>.
so_head.raw:Location: http://www.google.com/
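
The same split can be tried without a network connection. In this sketch printf stands in for nc, and the response text and temporary directory are invented for the example. It relies on GNU sed: the -u option and the \r escape are GNU extensions, and the split only works because sed -u does not read past the line it quits on.

```bash
tmpdir=$(mktemp -d)

# A fake response: two header lines, a line holding only \r, then the body.
printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nbody line 1\nbody line 2\n' |
    { sed -u '/^\r$/q' >"$tmpdir/so_head.raw"; cat >"$tmpdir/so_body.raw"; }

wc -l "$tmpdir/so_head.raw" "$tmpdir/so_body.raw"
```

The header file keeps the terminating \r line, as in the first variant above; the 'sed -nu ... ;p' variant would drop it.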

Pure bash way:

If the goal is to get the values of the first 10 bytes into a usable bash variable, here is a nice and efficient way:

Because ten bytes are few, the fork to head can be avoided. From "Read a file by bytes in BASH":

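# Read one byte from stdin and store its value as two hex digits in the
# variable named by $1 (default: OUTBIN). Returns 1 at end of input.
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    # -d '' makes NUL the delimiter, so a NUL byte is read as an empty string
    read -r -d '' -n 1 _r8_car || { printf -v "$_r8_var" '';return 1;}
    # a leading ' makes printf output the character's numeric code
    printf -v "$_r8_var" %02X "'$_r8_car"
}
{
    first10=()
    for i in {0..9};do
        read8 "first10[i]" || break
    done
    cat    # pass the rest of the input through unchanged
} < "$infile" >"$outfile"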

This creates an array ${first10[@]} containing the hexadecimal values of the first ten bytes of $infile and stores the rest of the data in $outfile.

declare -p first10

declare -a first10=([0]="25" [1]="50" [2]="44" [3]="46" [4]="2D" [5]="31" [6]="2E"
[7]="34" [8]="0A" [9]="25")

This was a PDF (%PDF -> 25 50 44 46)... Here's another sample:

{
    first10=()
    for i in {0..9};do
        read8 first10[i] || break
    done
    cat
} <<<"Hello world!"
d!

As I didn't redirect the output, the string d! is printed on the terminal.

echo ${first10[@]}
48 65 6C 6C 6F 20 77 6F 72 6C

printf '%b%b%b%b%b%b%b%b%b%b\n' ${first10[@]/#/\\x}
Hello worl
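
For comparison, the same hex values can be produced with external tools. This is a different technique from read8 (it forks head, od and tr), shown only as a cross-check; the variable name hex is arbitrary.

```bash
# od -An drops the offset column, -v disables "*" folding, -tx1 prints one byte per field.
hex=$(printf 'Hello world!\n' | head -c 10 | od -An -v -tx1 | tr 'a-f' 'A-F')
echo $hex   # unquoted on purpose, to collapse od's padding into single spaces
```

This should print the same ten values as the echo ${first10[@]} line above.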

About binary

You said:

> These are binary files and will likely have \0's and \n's throughout the first 10 bytes.

{
    first10=()
    for i in {0..9};do
        read8 first10[i] || break
    done
    cat
} < <(gzip <<<"Hello world!") >/dev/null 

echo ${first10[@]}
1F 8B 08 00 00 00 00 00 00 03

(There is a sample with a \n at the bottom of this answer.)
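
A smaller check of the NUL and newline handling, as a sketch: the five input bytes and the array name bytes are made up, read8 is repeated (with quoting added) so the snippet runs on its own, and bash is required.

```bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car || { printf -v "$_r8_var" ''; return 1; }
    printf -v "$_r8_var" %02X "'$_r8_car"
}

bytes=()
for i in {0..4}; do
    read8 "bytes[i]" || break
done < <(printf 'A\000B\nC')   # input: A, NUL, B, newline, C

echo "${bytes[@]}"   # the NUL should show up as 00 and the newline as 0A
```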

As a function

read8() { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car || { printf -v "$_r8_var" '';return 1;}
    printf -v "$_r8_var" %02X "'$_r8_car" ;}
get10() {
    local -n result=${1:-first10}     # 1st arg is array name
    local -i _i
    result=()
    for ((_i=0;_i<${2:-10};_i++));do  # 2nd arg is number of bytes
        read8 "result[_i]" || { unset "result[_i]" ; return 1 ;}
    done
    cat                               # copy the remaining input to stdout
}

Then (here I use the special character ⛶ to mean that there was no trailing newline):

get10 pdf 4 <"$infile" >"$outfile"
printf %b ${pdf[@]/#/\\x}
%PDF⛶

echo $(( $(stat -c %s $infile) - $(stat -c %s $outfile) ))
4

get10 test 8 <<<'Hello World!'
rld!

printf %b ${test[@]/#/\\x}
Hello Wo⛶

get10 test 24 <<<'Hello World!'
printf %b ${test[@]/#/\\x}
Hello World!

( And the last character printed is a \n! ;)

Final binary demo:

get10 test 256 < <(gzip <<<'Hello world!')

printf '%b' ${test[@]/#/\\x} | gunzip 
Hello world!

printf "  %s %s %s %s  %s %s %s %s    %s %s %s %s  %s %s %s %s\n" ${test[@]}
  1F 8B 08 00  00 00 00 00    00 03 F3 48  CD C9 C9 57
  28 CF 2F CA  49 51 E4 02    00 41 E4 A9  B2 0D 00 00
  00

Note: this works fine and is very quick as long as the number of bytes to read stays low, even when processing large files. It could be used for file recognition, for example. But for splitting files into larger parts, you have to use split, head, tail and/or dd.
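
As an illustration of the file-recognition idea, here is a rough sketch that tests for the %PDF signature (25 50 44 46) mentioned earlier. The helper is_pdf is hypothetical and not part of the original answer; read8 is repeated (with quoting added) so the snippet is self-contained, and bash is required.

```bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car || { printf -v "$_r8_var" ''; return 1; }
    printf -v "$_r8_var" %02X "'$_r8_car"
}

# Hypothetical helper: succeeds if stdin starts with the four bytes "%PDF".
is_pdf() {
    local b magic=
    local -i n
    for ((n = 0; n < 4; n++)); do
        read8 b || return 1      # input shorter than 4 bytes
        magic+=$b
    done
    [[ $magic == 25504446 ]]
}

is_pdf <<<'%PDF-1.4'     && echo "looks like a PDF"
is_pdf <<<'Hello world!' || echo "not a PDF"
```

Only four bytes are consumed per check, so the cost does not depend on the file size.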

Content Type	Original Author	Original Content on Stackoverflow
Question	User1	View Question on Stackoverflow
Solution 1 - Bash	psmears	View Answer on Stackoverflow
Solution 2 - Bash	moonshadow	View Answer on Stackoverflow
Solution 3 - Bash	Mark Ransom	View Answer on Stackoverflow
Solution 4 - Bash	F. Hauri	View Answer on Stackoverflow
