Get encoding of a file in Windows

Tags: Windows, Encoding

Windows Problem Overview


This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I could write a little C# app, but I wanted to know if there is something already built in.

Windows Solutions


Solution 1 - Windows

Open up your file using regular old vanilla Notepad that comes with Windows.
It will show you the encoding of the file when you click "Save As...".

Whatever the default-selected encoding is, that is what your current encoding is for the file.
If it is UTF-8, you can change it to ANSI and click Save to change the encoding (or vice versa).

I realize there are many different types of encoding, but this was all I needed when I was informed our export files were in UTF-8 and they required ANSI. It was a one-time export, so Notepad fit the bill for me.
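
For a similar one-off conversion without Notepad, a minimal PowerShell sketch might look like this (it assumes Windows PowerShell 5.1, where -Encoding Default writes the system ANSI code page; the file names are placeholders):

# Re-encode a UTF-8 export as ANSI (Windows PowerShell 5.1; file names are placeholders)
Get-Content .\export-utf8.txt -Encoding UTF8 |
    Set-Content .\export-ansi.txt -Encoding Default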

FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More on Notepad's "Unicode" option here: Windows 7 - UTF-8 and Unicode

Solution 2 - Windows

If you have "git" or "Cygwin" on your Windows Machine, then go to the folder where your file is present and execute the command:

file *

This will give you the encoding details of all the files in that folder.

Solution 3 - Windows

The (Linux) command-line tool 'file' is available on Windows via GnuWin32:

http://gnuwin32.sourceforge.net/packages/file.htm

If you have git installed, it's located in C:\Program Files\git\usr\bin.

Example:

C:\Users\SH\Downloads\SquareRoot>file *
_UpgradeReport_Files;         directory
Debug;                        directory
duration.h;                   ASCII C++ program text, with CRLF line terminators
ipch;                         directory
main.cpp;                     ASCII C program text, with CRLF line terminators
Precision.txt;                ASCII text, with CRLF line terminators
Release;                      directory
Speed.txt;                    ASCII text, with CRLF line terminators
SquareRoot.sdf;               data
SquareRoot.sln;               UTF-8 Unicode (with BOM) text, with CRLF line terminators
SquareRoot.sln.docstates.suo; PCX ver. 2.5 image data
SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary info
SquareRoot.vcproj;            XML  document text
SquareRoot.vcxproj;           XML document text
SquareRoot.vcxproj.filters;   XML document text
SquareRoot.vcxproj.user;      XML document text
squarerootmethods.h;          ASCII C program text, with CRLF line terminators
UpgradeLog.XML;               XML  document text

C:\Users\SH\Downloads\SquareRoot>file --mime-encoding *
_UpgradeReport_Files;         binary
Debug;                        binary
duration.h;                   us-ascii
ipch;                         binary
main.cpp;                     us-ascii
Precision.txt;                us-ascii
Release;                      binary
Speed.txt;                    us-ascii
SquareRoot.sdf;               binary
SquareRoot.sln;               utf-8
SquareRoot.sln.docstates.suo; binary
SquareRoot.suo;               CDF V2 Document, corrupt: Cannot read summary infobinary
SquareRoot.vcproj;            us-ascii
SquareRoot.vcxproj;           utf-8
SquareRoot.vcxproj.filters;   utf-8
SquareRoot.vcxproj.user;      utf-8
squarerootmethods.h;          us-ascii
UpgradeLog.XML;               us-ascii

Solution 4 - Windows

Another tool that I found useful: https://archive.codeplex.com/?p=encodingchecker EXE can be found here

Solution 5 - Windows

Here's my take on how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as it only works on text files (specifically Unicode files) and defaults to ASCII when no BOM is present (like most text editors; the default would be UTF-8 if you want to match the HTTP/web ecosystem).

Update 2018: I no longer recommend this method. I recommend using file.exe from Git or *nix tools, as recommended by @Sybren, and I show how to do that via PowerShell in a later answer (Solution 8).

# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    # read the first four bytes of the file (enough to cover the longest BOM)
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    if(!$bytes) { return 'utf8' }

    # compare the leading bytes against known BOM signatures
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }             # UTF-8 BOM
        '^2b2f76'   { return 'utf7' }             # UTF-7 BOM ("+/v")
        '^fffe'     { return 'unicode' }          # UTF-16 LE BOM (also matches UTF-32 LE)
        '^feff'     { return 'bigendianunicode' } # UTF-16 BE BOM
        '^0000feff' { return 'utf32' }            # UTF-32 BE BOM
        default     { return 'ascii' }            # no BOM found
    }
}

dir ~\Documents\WindowsPowershell -File | 
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} | 
    ft -AutoSize

Recommendation: This can work reasonably well if dir, ls, or Get-ChildItem only checks known text files, and you're only looking for "bad encodings" from a known list of tools (e.g. SQL Server Management Studio defaults to UTF-16, which broke Git's auto-crlf handling for Windows, which was the default for many years).
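
For example, a minimal sketch that limits the scan to a known list of text extensions (the extension list here is only an assumption; adjust it to your project):

# Only scan files with known text extensions (the extension list is an assumption)
Get-ChildItem -File -Recurse -Include *.txt,*.ps1,*.sql,*.cs |
    Select-Object Name, @{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} |
    Format-Table -AutoSize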

Solution 6 - Windows

Install Git (on Windows you have to use the Git Bash console). Type:

file --mime-encoding *   

for all files in the current directory, or

file --mime-encoding */*   

for the files in the immediate subdirectories.

Solution 7 - Windows

A simple solution might be opening the file in Firefox.

  1. Drag and drop the file into Firefox.
  2. Press Ctrl+I to open the page info.

The text encoding will appear in the "Page Info" window.


Note: If the file does not have a .txt extension, just rename it to .txt and try again.

P.S. For more info see this article.

Solution 8 - Windows

I wrote the #4 answer (at the time of writing; Solution 5 on this page). But lately I have had Git installed on all my computers, so now I use @Sybren's solution (Solution 3). Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).

Add this to your profile.ps1:

$global:gitbin = 'C:\Program Files\Git\usr\bin'
Set-Alias file.exe $gitbin\file.exe

Use it like: file.exe --mime-encoding *. You must include .exe in the command for the PS alias to work.

If you don't already customize your PowerShell profile.ps1, I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0 and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without Git, but it will write warnings when Git is not found.
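
If you only want the alias and not the full profile, a minimal sketch of such a guard might look like this (the path is an assumption; adjust it to your Git install):

# Guard the alias so the profile still loads cleanly on machines without Git (path is an assumption)
$global:gitbin = 'C:\Program Files\Git\usr\bin'
if (Test-Path "$gitbin\file.exe") {
    Set-Alias file.exe "$gitbin\file.exe" -Scope Global
}
else {
    Write-Warning "file.exe not found under $gitbin; install Git for Windows or adjust `$gitbin."
}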

Adding .exe to the command is also how I use C:\WINDOWS\system32\where.exe from PowerShell, and many other OS CLI commands that are "hidden by default" by PowerShell, *shrug*.

Solution 9 - Windows

Some C code for reliable ASCII, BOM, and UTF-8 detection is available here: https://unicodebook.readthedocs.io/guess_encoding.html

>Only ASCII, UTF-8 and encodings using a BOM (UTF-7 with BOM, UTF-8 with BOM, UTF-16, and UTF-32) have reliable algorithms to get the encoding of a document. For all other encodings, you have to trust heuristics based on statistics.

EDIT:

A PowerShell version of a C# answer from https://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding. It only works with signatures (BOMs).

# get-encoding.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
begin {
  # set the .NET current directory so relative paths resolve correctly
  [Environment]::CurrentDirectory = (pwd).path
}
process {
  # with the third argument ($true) StreamReader detects the encoding from the BOM;
  # without a BOM it falls back to the default encoding passed in
  $reader = [System.IO.StreamReader]::new($filename,
    [System.Text.Encoding]::default,$true)
  $peek = $reader.Peek()   # peek so the reader actually inspects the stream
  $encoding = $reader.currentencoding
  $reader.close()
  [pscustomobject]@{Name=split-path $filename -leaf
                BodyName=$encoding.BodyName
                EncodingName=$encoding.EncodingName}
}


.\get-encoding chinese8.txt

Name         BodyName EncodingName
----         -------- ------------
chinese8.txt utf-8    Unicode (UTF-8)


get-childitem -file | .\get-encoding

Solution 10 - Windows

Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options..."

The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It has a lot more text encodings listed in there than Notepad does, so it's useful when dealing with various files from around the world and whatever else.

Just like in Notepad, you can also change the encoding from the list of options there and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).

Solution 11 - Windows

The only way that I have found to do this is with Vim or Notepad++.

Solution 12 - Windows

EncodingChecker

File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.

File Encoding Checker requires .NET 4 or above to run.

Solution 13 - Windows

Looking for a Node.js/npm solution? Try encoding-checker:

npm install -g encoding-checker

Usage

Usage: encoding-checker [-p pattern] [-i encoding] [-v]

Options:
  --help                 Show help                                     [boolean]
  --version              Show version number                           [boolean]
  --pattern, -p, -d                                               [default: "*"]
  --ignore-encoding, -i                                            [default: ""]
  --verbose, -v                                                 [default: false]

Examples

Get encoding of all files in current directory:

encoding-checker

Return encoding of all md files in current directory:

encoding-checker -p "*.md"

Get encoding of all files in current directory and its subfolders (this can take quite some time for huge folders and may appear unresponsive):

encoding-checker -p "**"
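
Assuming --ignore-encoding works as its name suggests (hiding files whose detected encoding matches the one given), listing only the files that are not UTF-8 might look like:

encoding-checker -p "**" -i utf-8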

For more examples refer to the npm documentation or the official repository.

Solution 14 - Windows

You can simply check this by opening Git Bash at the file location and then running the command file -i file_name.

Example:

$ file -i data.csv
data.csv: text/csv; charset=utf-8

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | TheWebGuy | View Question on Stackoverflow
Solution 1 - Windows | MikeTeeVee | View Answer on Stackoverflow
Solution 2 - Windows | George Ninan | View Answer on Stackoverflow
Solution 3 - Windows | Sybren | View Answer on Stackoverflow
Solution 4 - Windows | user961954 | View Answer on Stackoverflow
Solution 5 - Windows | yzorg | View Answer on Stackoverflow
Solution 6 - Windows | phd_coder | View Answer on Stackoverflow
Solution 7 - Windows | Just Shadow | View Answer on Stackoverflow
Solution 8 - Windows | yzorg | View Answer on Stackoverflow
Solution 9 - Windows | js2010 | View Answer on Stackoverflow
Solution 10 - Windows | JaykeBird | View Answer on Stackoverflow
Solution 11 - Windows | Todd Partridge | View Answer on Stackoverflow
Solution 12 - Windows | Amr Ali | View Answer on Stackoverflow
Solution 13 - Windows | ToJo | View Answer on Stackoverflow
Solution 14 - Windows | DINA TAKLIT | View Answer on Stackoverflow