How to determine the encoding of a text file

Tags: Text, Unicode, Encoding, Character Encoding

Text Problem Overview


I have .txt and .java files, and I don't know how to determine their encoding (Unicode, UTF-8, ISO-8859, …). Does there exist any program to determine the file encoding or to see the encoding?

Text Solutions


Solution 1 - Text

If you're on Linux, try file -i filename.txt.

$ file -i vol34.tex 
vol34.tex: text/x-tex; charset=us-ascii

For reference, here is my environment:

$ which file
/usr/bin/file
$ file --version
file-5.09
magic file from /etc/magic:/usr/share/misc/magic

Some versions of file (e.g., file-5.04 on OS X/macOS) use slightly different command-line switches:

$ file -I vol34.tex 
vol34.tex: text/x-tex; charset=us-ascii
$ file --mime vol34.tex
vol34.tex: text/x-tex; charset=us-ascii


Solution 2 - Text

Open the file in Notepad++; the detected encoding is shown in the bottom-right corner of the status bar. From the Encoding menu you can change the encoding and save the file.

Solution 3 - Text

You can't reliably detect the encoding of a text file. What you can do is make an educated guess: search for non-ASCII characters and check whether they form Unicode sequences that make sense in the languages you are parsing.
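A minimal sketch of the first step of that guess, assuming a POSIX shell with tr, wc and iconv available: it reports whether a file is pure ASCII, well-formed UTF-8, or contains 8-bit bytes that still need guessing. The function name classify_text is hypothetical, and the UTF-8 check assumes iconv exits with a non-zero status on a malformed byte sequence, as GNU iconv does.

```shell
# classify_text FILE: hypothetical helper, not a standard command.
# Prints "ascii", "utf-8", or "8-bit (guess needed)".
classify_text() {
    # Delete every 7-bit byte (octal 000-177); whatever is left is non-ASCII.
    high=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    if [ "$high" -eq 0 ]; then
        echo "ascii"
    # Assumption: iconv exits non-zero on a malformed sequence, so a clean
    # UTF-8 -> UTF-8 pass means every non-ASCII byte is well-formed UTF-8.
    elif iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo "utf-8"
    else
        echo "8-bit (guess needed)"
    fi
}
```

For example, run classify_text unreadablefile.txt. A result of "utf-8" only means the bytes are well-formed UTF-8, not that the author intended UTF-8. UTF-16 text made up mostly of Latin letters contains only bytes below 0x80 (plus NUL bytes), so this sketch would misreport it as "ascii".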

Solution 4 - Text

See this question and its accepted answer. There’s no sure-fire way to do it; at most, you can rule things out. You’re unlikely to get false positives with the UTF encodings, but the 8-bit encodings are tough, especially if you don’t know the source language. No tool currently handles all the common 8-bit encodings from Mac, Windows, and Unix, but the accepted answer provides an algorithmic approach that should work adequately for a subset of encodings.
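The rule-things-out idea can be sketched as a loop that tries each candidate encoding with iconv and keeps only those that convert without an error. The function name surviving_encodings is hypothetical, and the sketch assumes iconv exits non-zero when the input is invalid in the source encoding, as GNU iconv does.

```shell
# surviving_encodings FILE ENCODING...: hypothetical helper that prints every
# candidate encoding iconv can decode FILE with, without reporting an error.
# It can only rule encodings out: a survivor is possible, not confirmed.
surviving_encodings() {
    target=$1
    shift
    for enc in "$@"; do
        # Discard the converted text; only iconv's exit status matters here.
        if iconv -f "$enc" -t UTF-8 "$target" >/dev/null 2>&1; then
            echo "$enc"
        fi
    done
}
```

Given a Latin-1 file containing caf followed by the byte 0xE9 (é) and a newline, surviving_encodings file.txt UTF-8 ISO-8859-1 prints only ISO-8859-1, because 0xE9 followed by a newline is not valid UTF-8. The reverse elimination never happens: ISO-8859-1 assigns a character to all 256 byte values, so it survives every file, which is exactly why the 8-bit encodings are hard to tell apart.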

Solution 5 - Text

A text file has no header that records its encoding (apart from an optional byte order mark in some Unicode files). You can try the Linux/Unix command file, which tries to guess the encoding:

file -i unreadablefile.txt

or on some systems

file -I unreadablefile.txt

But it often reports text/plain; charset=iso-8859-1 even when the file is unreadable (it displays cryptic glyphs).

Here is how I found the correct encoding for an unreadable file and converted it to UTF-8, after installing iconv. First, I tried every encoding iconv knows, using grep to display any line containing www (part of a website address):

for ENCODING in $(iconv -l); do echo -n "$ENCODING "; iconv -f $ENCODING -t utf-8 unreadablefile.txt 2>/dev/null| grep 'www'; done | less

This command prints each tested encoding name followed by any transcoded line containing www.

Some lines showed readable, consistent results (one language at a time). I tried some of them manually, for example:

ENCODING=WINDOWS-936; iconv -f $ENCODING -t utf-8 unreadablefile.txt -o test_with_${ENCODING}.txt

In my case it was a Chinese Windows encoding, and the file is now readable (if you know Chinese).

Solution 6 - Text

>Does there exist any program to determine the file encoding or to see the encoding?

This question is 10 years old as I write this, and the answer is still "No", at least not reliably. Unfortunately, there hasn't been much improvement. In my recent experience, the file -I command is very much hit-or-miss. For example, when checking a text file on macOS 10.15.6:

% file -i somefile.asc
somefile.asc: application/octet-stream; charset=binary

somefile.asc was a text file; all characters in it were encoded in UTF-16 Little Endian. How did I know? I opened it in BBEdit, a competent text editor. Determining the encoding of a file is certainly a tough problem, but...?
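A minimal sketch of one way to spot the UTF-16 case, assuming a POSIX shell with head, od, tr and awk. It first looks for a UTF-16 byte order mark; if there is none, it guesses from where the NUL bytes fall, since UTF-16LE text written mostly in Latin characters has 0x00 as the second byte of each code unit. The function name guess_utf16 and the 90% threshold are illustrative assumptions, not part of any standard tool.

```shell
# guess_utf16 FILE: hypothetical helper. Checks for a UTF-16 byte order mark;
# without one, guesses from where the NUL (0x00) bytes fall.
# Note: FF FE 00 00 is the UTF-32LE BOM, which this sketch reports as UTF-16LE.
guess_utf16() {
    # First two bytes as hex, e.g. "fffe".
    bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe) echo "UTF-16LE (BOM)"; return ;;
        feff) echo "UTF-16BE (BOM)"; return ;;
    esac
    # Mostly-Latin UTF-16LE text has 0x00 at odd offsets; UTF-16BE has it at
    # even offsets. The 90% threshold below is an arbitrary assumption.
    od -An -v -tx1 "$1" | awk '
        BEGIN { n = 0; odd = 0; even = 0 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "00") { if (n % 2) odd++; else even++ }
                n++
            }
        }
        END {
            pairs = int(n / 2)
            if (pairs > 0 && odd >= 0.9 * pairs && even < 0.1 * pairs)
                print "UTF-16LE (guess)"
            else if (pairs > 0 && even >= 0.9 * pairs && odd < 0.1 * pairs)
                print "UTF-16BE (guess)"
            else
                print "no UTF-16 pattern found"
        }'
}
```

If it reports UTF-16LE, iconv -f UTF-16LE -t UTF-8 somefile.asc converts the file. The NUL-position heuristic says little about text written mostly in scripts such as Chinese, where few UTF-16 code units contain a 0x00 byte.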

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ballon | View Question on Stackoverflow
Solution 1 - Text | mpenkov | View Answer on Stackoverflow
Solution 2 - Text | Ballon | View Answer on Stackoverflow
Solution 3 - Text | Nikolaus Gradwohl | View Answer on Stackoverflow
Solution 4 - Text | tchrist | View Answer on Stackoverflow
Solution 5 - Text | erik | View Answer on Stackoverflow
Solution 6 - Text | Seamus | View Answer on Stackoverflow