"Line contains NULL byte" in CSV reader (Python)

Python Problem Overview


I'm trying to write a program that looks at a .CSV file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).

This is what my program looks like right now:

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Unfortunately, I keep getting this error, and I have no clue what it's about.

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte

Credit to all the people here (https://stackoverflow.com/questions/7853606/whats-wrong-with-this-python-program-working-on-csv/7853688#7853688) for even getting me to this point.

Python Solutions


Solution 1 - Python

I've solved a similar problem with an easier solution:

import codecs
import csv

# Plain 'r' mode: the 'U' flag is rejected by recent Python 3 releases.
csvReader = csv.reader(codecs.open('file.csv', 'r', 'utf-16'))

The key was using the codecs module to open the file with the UTF-16 encoding. Many other encodings are available; check the codecs documentation.
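As a rough sketch of the same idea on Python 3, the built-in open() accepts an encoding directly, so the encoding can be chosen from the file's byte order mark. The helper names guess_encoding and read_rows and the UTF-8 fallback are assumptions made for illustration. The check only looks at the BOM, so a UTF-16 file without one is not detected, and UTF-32 is not handled.

```python
import codecs
import csv


def guess_encoding(path, default='utf-8'):
    """Guess a text encoding from the byte order mark (hypothetical helper)."""
    with open(path, 'rb') as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # The 'utf-16' codec reads the BOM itself to pick the byte order.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return default


def read_rows(path):
    """Read all CSV rows using the guessed encoding."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=guess_encoding(path), newline='') as f:
        return list(csv.reader(f))
```

Calling read_rows('file.csv') would then decode a BOM-marked UTF-16 export before the csv module ever sees a NUL byte.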

Solution 2 - Python

I'm guessing you have a NUL byte in input.csv. You can test that with

if b'\0' in open('input.csv', 'rb').read():
    print("you have null bytes in your input file")
else:
    print("you don't")

If you do,

reader = csv.reader(x.replace('\0', '') for x in mycsv)

may get you around that. Or it may indicate that the .csv file is UTF-16 or some other 'interesting' encoding.
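To tell a few stray NUL bytes apart from a UTF-16 file, it can help to count them. The sketch below reads the file in binary chunks. The function names and the 0.3 threshold are assumptions chosen for illustration: mostly-ASCII text stored as UTF-16 is roughly half NUL bytes, but this is a heuristic, not a rule.

```python
def nul_report(path, chunk_size=1 << 20):
    """Return (nul_count, total_bytes) for a file, read in binary chunks."""
    nul_count = 0
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            nul_count += chunk.count(b'\x00')
            total += len(chunk)
    return nul_count, total


def looks_like_utf16(path, threshold=0.3):
    """Heuristic only: a high share of NUL bytes suggests UTF-16 text."""
    nul_count, total = nul_report(path)
    return total > 0 and nul_count / total > threshold
```

If looks_like_utf16('input.csv') is true, decoding the file with the right encoding is probably a better fix than stripping the NUL bytes.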

Solution 3 - Python

If you want to replace the nulls with something you can do this:

def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))
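For a quick check of how this behaves, here is a small self-contained run on Python 3. The in-memory sample is made up for illustration and stands in for a real file opened in text mode.

```python
import csv
import io


def fix_nulls(lines):
    """Yield each line with NUL characters replaced by a space."""
    for line in lines:
        yield line.replace('\0', ' ')


# In-memory stand-in for a real file opened in text mode (made-up data).
sample = io.StringIO('id,name\r\n1,Al\0ice\r\n')
rows = list(csv.reader(fix_nulls(sample)))
print(rows)
```

Because the generator cleans each line before csv.reader parses it, the reader never sees a NUL character.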

Solution 4 - Python

You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

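See the (line.replace('\0','') for line in mycsv) expression below. On Python 2 you'll probably also want to open the file using mode rb; on Python 3 the csv module needs text, so the file is opened in text mode with newline='' instead.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

# newline='' stops the csv module's own line endings from being translated again.
with open('corrected.csv','w', newline='') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    # Text mode: on Python 3, csv.reader needs str, not the bytes that 'rb' yields.
    with open('input.csv', 'r', newline='') as mycsv:
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )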
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Solution 5 - Python

This will tell you what line is the problem.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
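        try:
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)
        except csv.Error:
            # reader.line_num counts the source lines read so far, so it is
            # valid even when the very first row fails.
            print('csv choked on line %s' % reader.line_num)
            raise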

Perhaps this from daniweb would be helpful:

> I'm getting this error when reading from a csv file: "Runtime Error! line contains NULL byte". Any idea about the root cause of this error?

...

> Ok, I got it and thought I'd post the solution. Simply yet caused me grief... Used file was saved in a .xls format instead of a .csv. Didn't catch this because the file name itself had the .csv extension while the type was still .xls

Solution 6 - Python

A tricky way:

If you develop under Linux, you can use all the power of sed:

from subprocess import check_call, CalledProcessError

PATH_TO_FILE = '/home/user/some/path/to/file.csv'

try:
    # Pass the arguments as a list so the path is never interpreted by a shell.
    check_call(['sed', '-i', '-e', r's|\x00||g', PATH_TO_FILE])
except CalledProcessError as err:
    print(err)

sed edits the file in place without loading it into Python, which makes this an efficient option for huge files.

Tested with Python 3 on Kubuntu.
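Where sed is not available (for example on Windows), a similar effect can be sketched in pure Python by streaming the file in binary chunks. The function name strip_nul_bytes and the chunk size are assumptions for illustration. Like sed -i, it rewrites the file, so keep a backup if the NUL bytes might be part of a real encoding.

```python
import os
import tempfile


def strip_nul_bytes(path, chunk_size=1 << 20):
    """Rewrite `path` without NUL bytes, streaming in fixed-size chunks."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as dst, open(path, 'rb') as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                # Removing a single byte value is safe across chunk boundaries.
                dst.write(chunk.replace(b'\x00', b''))
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Memory use stays bounded by chunk_size regardless of how large the file is.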

Solution 7 - Python

I recently fixed this issue, and in my case the file I was trying to read was compressed. Check the file format first, then check that the contents match what the extension claims.
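One way to check that the contents match the extension is to look at the first few bytes of the file. The sketch below compares them against three well-known signatures: the OLE2 container used by legacy .xls files, ZIP (which .xlsx files use), and gzip. The names SIGNATURES and sniff_container are assumptions for illustration, and the list is far from exhaustive.

```python
# Well-known leading byte signatures; this list is deliberately short.
SIGNATURES = [
    (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', 'OLE2 container (e.g. legacy .xls)'),
    (b'PK\x03\x04', 'ZIP archive (e.g. .xlsx or .zip)'),
    (b'\x1f\x8b', 'gzip'),
]


def sniff_container(path):
    """Return a description if the file starts with a known signature, else None."""
    with open(path, 'rb') as f:
        head = f.read(8)
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return None
```

A result other than None for a file named something.csv suggests it needs to be exported or decompressed before the csv module can read it.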

Solution 8 - Python

Turning my Linux environment into a clean, complete UTF-8 environment did the trick for me. Try the following on your command line:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
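The locale matters because Python's open() has traditionally taken its default text encoding from it. As a sketch, the snippet below shows how to inspect that default and how to avoid depending on it by passing the encoding explicitly. The helper names are assumptions for illustration, and UTF-8 is assumed to be the file's actual encoding.

```python
import csv
import locale


def default_text_encoding():
    """Encoding derived from the locale settings of the current environment."""
    return locale.getpreferredencoding(False)


def read_csv_utf8(path):
    """Read rows with an explicit encoding instead of relying on the environment."""
    with open(path, 'r', encoding='utf-8', newline='') as f:
        return list(csv.reader(f))
```

Passing encoding= explicitly makes the script behave the same way on machines whose locale variables differ.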

Solution 9 - Python

This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.

In my case, the issue was much simpler, and is worth being conscious of. The data written to the CSV wasn't consistent, so some columns were missing entirely, which seems to end up throwing this error as well.

The lesson: If you're seeing this error, verify that your data looks the way that you think it does!
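One quick way to verify the shape of the data is to count the fields in every row. This is a minimal sketch; the function names and the UTF-8 default are assumptions for illustration.

```python
import csv
from collections import Counter


def field_count_report(path, encoding='utf-8'):
    """Return a Counter mapping field count to the number of rows with that count."""
    with open(path, 'r', encoding=encoding, newline='') as f:
        return Counter(len(row) for row in csv.reader(f))


def inconsistent_rows(path, expected, encoding='utf-8'):
    """Return 1-based row numbers whose field count differs from `expected`."""
    with open(path, 'r', encoding=encoding, newline='') as f:
        return [n for n, row in enumerate(csv.reader(f), start=1)
                if len(row) != expected]
```

A report with more than one key means the rows do not all have the same number of columns, and inconsistent_rows points at the ones to inspect.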

Solution 10 - Python

To skip the rows that contain NULL bytes:

import csv

with open('sample.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    while True:
        try:
            row = next(reader)
            print(row)
        except csv.Error:
            continue
        except StopIteration:
            break

Solution 11 - Python

pandas.read_csv accepts an encoding argument, so it can directly read a UTF-16 file, where the null bytes are part of the encoding:

import pandas as pd

# file is the path to your CSV
data = pd.read_csv(file, encoding='utf-16')

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Solution 12 - Python

It is very simple.

Don't create the CSV file with "create new Excel file" or by saving as ".csv" from Windows.

Instead, import the csv module, write a dummy CSV file with it, and then paste your data into that file.

A CSV file written by Python's csv module itself will no longer give you encoding or blank-line errors.
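A minimal sketch of writing the file with the csv module on Python 3 follows. The function name write_rows and the choice of UTF-8 are assumptions for illustration.

```python
import csv


def write_rows(path, rows):
    """Write rows as UTF-8 CSV using the csv module's default dialect."""
    # newline='' prevents the extra blank line between rows on Windows.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows)
```

For example, write_rows('dummy.csv', [['id', 'name'], ['1', 'Alice']]) produces a file the csv reader can parse back without encoding surprises.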

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | James Roseman | View Question on Stackoverflow
Solution 1 - Python | K. David C. | View Answer on Stackoverflow
Solution 2 - Python | retracile | View Answer on Stackoverflow
Solution 3 - Python | Claudiu | View Answer on Stackoverflow
Solution 4 - Python | woot | View Answer on Stackoverflow
Solution 5 - Python | Steven Rumbalski | View Answer on Stackoverflow
Solution 6 - Python | SergO | View Answer on Stackoverflow
Solution 7 - Python | Daniel Lee | View Answer on Stackoverflow
Solution 8 - Python | Philippe Oger | View Answer on Stackoverflow
Solution 9 - Python | David Hoelzer | View Answer on Stackoverflow
Solution 10 - Python | shrhawk | View Answer on Stackoverflow
Solution 11 - Python | Sébastien Wieckowski | View Answer on Stackoverflow
Solution 12 - Python | nitish gupta | View Answer on Stackoverflow