Identifying and removing null characters in UNIX

Unix Problem Overview


I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^@ symbols, interleaved in normal text. How can I:

  1. Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.

  2. Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?

Unix Solutions


Solution 1 - Unix

I’d use tr:

tr < file-with-nulls -d '\000' > file-without-nulls

If you are wondering whether input redirection in the middle of the command's arguments works, it does: most shells recognize and handle I/O redirections (<, >, …) anywhere on the command line.
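To confirm that the cleanup worked, it can help to count the NUL bytes before and after running tr. A minimal sketch, assuming a POSIX shell; the function name count_nuls is only illustrative and not a standard utility:

```shell
# count_nuls FILE: print how many NUL bytes FILE contains.
# (count_nuls is a hypothetical helper name used for illustration.)
count_nuls() {
    # -c complements the set and -d deletes, so every byte except NUL is
    # removed; wc -c then counts the bytes that remain. The final tr strips
    # the padding that some wc implementations put before the number.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

Running count_nuls file-with-nulls before the tr command and count_nuls file-without-nulls after it should show the count dropping to 0.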

Solution 2 - Unix

Use the following sed command to remove the null characters from a file:

sed -i 's/\x0//g' null.txt

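This edits the file in place; GNU sed does so by writing a temporary file and renaming it over the original, so a process that already has the file open keeps seeing the old contents. Passing -i'ext' keeps a backup of the original file with the 'ext' suffix appended.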
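Where sed's -i option or the \x0 escape is unavailable (neither is specified by POSIX), a similar result can be had with tr and a temporary file. This is a rough sketch, assuming mktemp is installed; the function name and the .bak suffix are made up for illustration:

```shell
# strip_nuls_inplace FILE: remove NUL bytes from FILE, keeping the original
# as FILE.bak. (Hypothetical helper; name and suffix are illustrative.)
strip_nuls_inplace() {
    file=$1
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Build the cleaned copy and the backup first; the original is only
    # overwritten once both steps have succeeded.
    if tr -d '\000' < "$file" > "$tmp" && cp -p "$file" "${file}.bak"; then
        # Writing through the existing file keeps its inode and permissions.
        cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
    rm -f "$tmp"
    return 1
}
```

For example, strip_nuls_inplace null.txt would leave the cleaned text in null.txt and the untouched original in null.txt.bak.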

Solution 3 - Unix

A large number of unwanted NUL characters, say one in every other byte, suggests that the file is encoded in UTF-16. In that case, use iconv to convert it to UTF-8 rather than stripping the NULs.
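One way to test that theory is to look at the first two bytes for a UTF-16 byte order mark and then convert. A minimal sketch, assuming an iconv that accepts the encoding names UTF-16 and UTF-8; both function names are illustrative:

```shell
# first_two_bytes FILE: print the first two bytes of FILE in hex.
# "fffe" is a little-endian UTF-16 byte order mark, "feff" a big-endian one.
first_two_bytes() {
    od -An -tx1 -N2 "$1" | tr -d ' \n'
}

# utf16_to_utf8 IN OUT: convert a UTF-16 file to UTF-8.
# With the generic name UTF-16, iconv uses the byte order mark to pick the
# byte order. For files without one, naming UTF-16LE or UTF-16BE explicitly
# is the safer choice, since the default differs between implementations.
utf16_to_utf8() {
    iconv -f UTF-16 -t UTF-8 < "$1" > "$2"
}
```

If first_two_bytes prints neither fffe nor feff, the file may still be UTF-16 without a byte order mark, or the NULs may have another cause.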

Solution 4 - Unix

I discovered the following, which prints out which lines, if any, have null characters:

perl -ne '/\000/ and print;' file-with-nulls

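Also, an octal dump can tell you whether there are nulls. Use -b so that od prints one byte per column; by default it prints two-byte words, and the pattern below would then match the wrong thing:

od -b file-with-nulls | grep ' 000'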

Solution 5 - Unix

If the lines in the file end with \r\n\000, one approach that works is to delete every \n and \000 and then replace each \r with \n:

tr -d '\n\000' <infile | tr '\r' '\n' >outfile
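To see what the pipeline does, it can be run on a few bytes generated with printf. A sketch only; the function name and the sample text are made up for illustration:

```shell
# fix_crlf_nul: read lines terminated by \r\n\000 on stdin and write them
# to stdout with plain \n line endings. (Hypothetical helper name.)
fix_crlf_nul() {
    # The first tr drops every newline and NUL, leaving \r as the only
    # line separator; the second tr turns each \r into \n.
    tr -d '\n\000' | tr '\r' '\n'
}

# Sample input: two lines, each terminated by CR LF NUL.
printf 'first\r\n\000second\r\n\000' | fix_crlf_nul
```

The sample prints first and second on separate lines. Note that the first tr removes every NUL and newline in the input, not only those at line ends, so this fits files where they occur nowhere else.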

Solution 6 - Unix

Here is an example of how to remove NUL characters in place using ex:

ex -s +"%s/\%x00//g" -cwq nulls.txt

and for multiple files:

ex -s +'bufdo!%s/\%x00//g' -cxa *.txt

To recurse into subdirectories, use the glob **/*.txt if your shell supports it (in bash, enable it with shopt -s globstar).

This is useful for scripting, since sed's -i option is a non-standard extension whose syntax differs between the GNU and BSD implementations.

See also: How to check if the file is a binary file and read all the files which are not?

Solution 7 - Unix

I used:

recode UTF-16..UTF-8 <filename>

to get rid of the zero bytes in the file.

Solution 8 - Unix

I ran into the same problem in Python when opening the file with:

import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')

I solved the problem by changing the encoding to utf-16:

f=cd.open(filePath,'r','utf-16')

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | dogbane | View Question on Stackoverflow
Solution 1 - Unix | Pointy | View Answer on Stackoverflow
Solution 2 - Unix | rekha_sri | View Answer on Stackoverflow
Solution 3 - Unix | Ignacio Vazquez-Abrams | View Answer on Stackoverflow
Solution 4 - Unix | dogbane | View Answer on Stackoverflow
Solution 5 - Unix | wwmbes | View Answer on Stackoverflow
Solution 6 - Unix | kenorb | View Answer on Stackoverflow
Solution 7 - Unix | logisec | View Answer on Stackoverflow
Solution 8 - Unix | Ming Young | View Answer on Stackoverflow