How to check encoding of a CSV file

Csv Problem Overview


I have a CSV file and I want to find out its encoding. Is there a menu option in Microsoft Excel that can help me detect it?

Or do I need to use a programming language such as C# or PHP to deduce it?

Csv Solutions


Solution 1 - Csv

You can use Notepad++ to check a file's encoding without writing any code. The detected encoding of the open file is shown in the status bar at the bottom, on the far right. To see which encodings are supported, go to Settings -> Preferences -> New Document/Default Directory and look in the drop-down.

Solution 2 - Csv

On Linux systems, you can use the file command. It inspects the file's bytes and reports its best guess at the encoding.

Sample:

file blah.csv

Output:

blah.csv: ISO-8859 text, with very long lines

Solution 3 - Csv

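If you use Python, printing the file object shows the encoding Python will use to read the file. Note that this is Python's default for your platform, not something detected from the file's contents. For example: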

with open('file_name.csv') as f:
    print(f)

The output is something like this:

<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
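
A quick way to see what this value reflects: the sketch below writes the same text in two different encodings and opens both files without an encoding argument. The sample text and the helper names reported_encoding and write_sample are made up for illustration. Both files should report the same encoding, which suggests the value comes from Python's platform default rather than from the file's bytes.

```python
import os
import tempfile


def reported_encoding(path):
    """Return the encoding attribute of a file opened without an explicit encoding."""
    with open(path) as f:
        return f.encoding


def write_sample(text, encoding):
    """Write text to a temporary CSV file in the given encoding and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", encoding=encoding, newline="") as f:
        f.write(text)
    return path


# Hypothetical sample files: same text, two different on-disk encodings.
utf16_path = write_sample("name,city\nJosé,Kraków\n", "utf-16")
cp1250_path = write_sample("name,city\nJosé,Kraków\n", "cp1250")

# Opening a file does not read it, so no decoding happens at this point.
same = reported_encoding(utf16_path) == reported_encoding(cp1250_path)
print(reported_encoding(utf16_path), reported_encoding(cp1250_path), same)

# Clean up the temporary files.
os.remove(utf16_path)
os.remove(cp1250_path)
```

If the two values match, the printed encoding cannot be used to tell the files apart; one of the byte-based approaches in the other answers is needed for that.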

Solution 4 - Csv

Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).

Install Python, then run pip install chardet, and finally use the chardetect command-line tool that comes with it.

I tested it on GB2312 text and it was pretty accurate. (Make sure the file has at least a few characters; a sample with only one character can easily fail.)

The file command is not always reliable, as the screenshot in the original answer showed.

Solution 5 - Csv

You can also use the Python chardet library:

# install the chardet library (the leading ! is Jupyter notebook syntax; omit it in a shell)
!pip install chardet

# import the chardet library
import chardet 

# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
    print(chardet.detect(file.read()))

Solution 6 - Csv

Alternatively, run this in a Python console or a Jupyter Notebook:

# no encoding argument is given, so Python picks its platform default
data = open("file.csv", "r")
data

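You will see information about the data object like this:

<_io.TextIOWrapper name='file.csv' mode='r' encoding='cp1250'>

As you can see, it contains encoding information. Keep in mind that this is the default encoding Python chose for your system, not one detected from the file itself. Call data.close() when you are done.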

Solution 7 - Csv

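In Python, you can try every encoding Python knows and see which ones read the file without an error. More than one encoding will usually succeed, so treat the results as candidates rather than as the answer:

import pandas as pd
from encodings.aliases import aliases

# every distinct codec name known to Python
alias_values = set(aliases.values())

for encoding in alias_values:
    try:
        df = pd.read_csv("test.csv", encoding=encoding)
        print('successful', encoding)
    except Exception:
        # this encoding could not read the file; try the next one
        pass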
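
A caveat with trial decoding: a successful decode does not prove the encoding is right, because single-byte codecs such as latin-1 accept any byte sequence. The following is a minimal standard-library sketch of that effect, not part of the original answer. The helper name decodable_encodings, the candidate list, and the sample bytes are assumptions chosen for illustration.

```python
def decodable_encodings(data, candidates):
    """Return the candidate encodings that decode data without raising an error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Either the bytes are invalid in this encoding or the codec is unknown.
            continue
        ok.append(name)
    return ok


# Sample bytes: a CSV row containing an emoji, encoded as UTF-8.
raw = "ice cream,🍦\r\n".encode("utf-8")

matches = decodable_encodings(raw, ["ascii", "utf-8", "cp1254", "latin-1"])
print(matches)

# latin-1 decodes without error, but the emoji comes out as four unrelated characters.
print(ascii(raw.decode("latin-1")))
```

Ordering the candidates from strictest to most permissive (for example UTF-8 before any single-byte codec) makes the first match more meaningful than an arbitrary one.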

Solution 8 - Csv

CSV files have no headers indicating the encoding.

You can only guess by looking at:

  • the platform / application the file was created on
  • the bytes in the file
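
One cheap check on the bytes is to look for a byte-order mark (BOM) at the start of the file. The sketch below uses the BOM constants from Python's codecs module; the helper name sniff_bom and the sample data are hypothetical. It only helps when a BOM is present, which many UTF-8 files do not have.

```python
import codecs

# Byte-order marks, with UTF-32 first so UTF-32 LE is not mistaken for UTF-16 LE
# (the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom("a,b\n".encode("utf-8-sig")))
print(sniff_bom("a,b\n".encode("utf-16")))
print(sniff_bom(b"a,b\n"))
```

In practice you would pass the first few bytes of the file, read in binary mode. A result of None means only that no BOM was found, not that the file is in any particular encoding.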

In 2021, emoji are widely used, but many import tools fail to import them. The chardet library is often recommended in the answers above, but it does not handle emoji well.

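import csv
import chardet

icecream = '🍦'

# write a one-row CSV; encoding and newline are set explicitly for portability
with open('test.csv', 'w', encoding='utf-8', newline='') as f:
    wf = csv.writer(f)
    wf.writerow(['ice cream', icecream])

# read the raw bytes back and let chardet guess the encoding
with open('test.csv', 'rb') as f:
    print(chardet.detect(f.read()))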

{'encoding': 'Windows-1254', 'confidence': 0.3864823918622268, 'language': 'Turkish'}

Trying to read the file with this encoding raises a UnicodeDecodeError.

On a Mac, Python's default encoding is UTF-8. It is passed explicitly below, which was not strictly necessary there, but on Windows it might be.

with open('test.csv', 'r', encoding='utf-8') as f:
    print(f.read())

ice cream,🍦

The file command also picked this up:

file test.csv
test.csv: UTF-8 Unicode text, with CRLF line terminators

My advice in 2021, if the automatic detection goes wrong: try UTF-8 before resorting to chardet.
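
That advice can be sketched as a strict UTF-8 attempt followed by a fallback. The helper name decode_utf8_first and the cp1252 fallback are assumptions made for illustration; the right fallback depends on where the file came from.

```python
def decode_utf8_first(data, fallback="cp1252"):
    """Decode bytes as strict UTF-8; on failure, use the fallback encoding.

    Returns a (text, encoding_used) pair.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" keeps the fallback from raising on undefined bytes.
        return data.decode(fallback, errors="replace"), fallback


# Valid UTF-8 input is accepted on the first attempt.
text, used = decode_utf8_first("ice cream,🍦\r\n".encode("utf-8"))

# Bytes written in a legacy code page are not valid UTF-8 here, so the fallback is used.
legacy_text, legacy_used = decode_utf8_first("café,1\r\n".encode("cp1252"))

print(used, legacy_used)
```

Strict UTF-8 decoding rejects most text written in a legacy single-byte encoding as soon as it contains a non-ASCII character, which is why trying it first is a reasonable default.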

Solution 9 - Csv

As @3724913 (Jitender Kumar) mentioned, you can use the file command (it also works in WSL on Windows). On my system, file blah.csv did not show the encoding information, but, using the information in man file, I was able to get it by running file --exclude encoding blah.csv.

Solution 10 - Csv

Just add the encoding argument that matches the file you're trying to open.

open('example.csv', encoding='UTF8')
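
One detail worth knowing when choosing that argument: for a UTF-8 file that starts with a byte-order mark, the "utf-8" codec leaves the mark attached to the first field, while "utf-8-sig" strips it. The sketch below illustrates this with a hypothetical temporary file and made-up column names.

```python
import csv
import os
import tempfile

# Hypothetical sample file written with a UTF-8 byte-order mark.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerow(["name", "city"])

# Plain "utf-8" keeps the BOM as part of the first field.
with open(path, encoding="utf-8", newline="") as f:
    with_bom = next(csv.reader(f))

# "utf-8-sig" removes the BOM while decoding.
with open(path, encoding="utf-8-sig", newline="") as f:
    without_bom = next(csv.reader(f))

os.remove(path)
print(ascii(with_bom[0]), ascii(without_bom[0]))
```

Reading with "utf-8-sig" also works for UTF-8 files that have no BOM, so it can be a convenient choice when the source of the file is unknown.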

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Vipul | View Question on Stackoverflow
Solution 1 - Csv | CamW | View Answer on Stackoverflow
Solution 2 - Csv | Jitender Kumar | View Answer on Stackoverflow
Solution 3 - Csv | Alineat | View Answer on Stackoverflow
Solution 4 - Csv | Rick | View Answer on Stackoverflow
Solution 5 - Csv | Md Kaish Ansari | View Answer on Stackoverflow
Solution 6 - Csv | NemrodDev | View Answer on Stackoverflow
Solution 7 - Csv | Md Kaish Ansari | View Answer on Stackoverflow
Solution 8 - Csv | Stefaan Ghysels | View Answer on Stackoverflow
Solution 9 - Csv | Shobeira | View Answer on Stackoverflow
Solution 10 - Csv | Alani | View Answer on Stackoverflow