Excel to CSV with UTF8 encoding

ExcelEncodingCsvUtf 8

Excel Problem Overview


I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to be coming from the original user creating the Excel file in Mac.

Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.

Excel Solutions


Solution 1 - Excel

A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well.

NOTE: Google Sheets does have limitations when importing. See here.

NOTE: Be careful of sensitive data with Google Sheets.

EDIT: Another alternative - basically they use VB macro or addins to force the save as UTF8. I have not tried any of these solutions but they sound reasonable.

Solution 2 - Excel

I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.

In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way AFAIK.

calc save dialog

Solution 3 - Excel

  1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF16 (note, not in UTF8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.

  2. (optional) Unless you can use a TAB delimited file for import, use your favorite text editor and replace the TAB characters with commas ",".

  3. Import your *.txt file in the target application. Make sure it can accept UTF16 format.

If UTF-16 has been properly implemented with support for non-BMP code points, that you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.

I use this procedure to import data from Excel to Moodle.

Solution 4 - Excel

I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.

Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.

Fortunately, I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.

Please give it a try and see if it works for you. Good luck.

Solution 5 - Excel

You can use iconv command under Unix (also available on Windows as libiconv).

After saving as CSV under Excel in the command line put:

iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv

(remember to replace cp1250 with your encoding).

Works fast and great for big files like post codes database, which cannot be imported to GoogleDocs (400.000 cells limit).

Solution 6 - Excel

You can do this on a modern Windows machine without third party software. This method is reliable and it will handle data that includes quoted commas, quoted tab characters, CJK characters, etc.

1. Save from Excel

In Excel, save the data to file.txt using the type Unicode Text (*.txt).

2. Start PowerShell

Run powershell from the Start menu.

3. Load the file in PowerShell

$data = Import-Csv C:\path\to\file.txt -Delimiter "`t" -Encoding BigEndianUnicode

4. Save the data as CSV

$data | Export-Csv file.csv -Encoding UTF8 -NoTypeInformation

Solution 7 - Excel

The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is kept hidden in the Excel .csv file.

  1. Open an Excel file where you have the info (.xls, .xlsx)
  2. In Excel, choose "CSV (Comma Delimited) (*.csv) as the file type and save as that type.
  3. In NOTEPAD (found under "Programs" and then Accessories in Start menu), open the saved .csv file in Notepad
  4. Then choose -> Save As... and at the bottom of the "save as" box, there is a select box labelled as "Encoding". Select UTF-8 (do NOT use ANSI or you lose all accents etc). After selecting UTF-8, then save the file to a slightly different file name from the original.

This file is in UTF-8 and retains all characters and accents and can be imported, for example, into MySQL and other database programs.

This answer is taken from this forum.

Solution 8 - Excel

Another one I've found useful: "Numbers" allows encoding-settings when saving as CSV.

Solution 9 - Excel

Using Notepad++

This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.

  • Export CSV from Excel
  • Load into Notepad++
  • Fix encoding
  • Save

Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++. Select

Encoding > Character Sets > Western European > Windows-1252

Then

Encoding > Convert to UTF-8
File > Save

First tell Notepad++ the encoding, then convert. Some of these other answers are converting without setting the proper encoding first, mangling the file even more. They would turn what should be into . If your character does not fit into CP-1252 then it was already lost when it was saved as CSV. Use another answer for that.

Solution 10 - Excel

"nevets1219" is right about Google docs, however if you simply "import" the file it often does not convert it to UTF-8.

But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.

Here's a recipe:

  • On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
  • From the "File" menu choose "Import"
  • Click "Choose File"
  • Choose "Replace spreadsheet"
  • Choose whichever character you are using as a Separator
  • Click "Import"
  • From the "File" menu choose "Download as" -> CSV (current sheet)

The resulting file will be in UTF-8

Solution 11 - Excel

Under Excel 2016 and up (including Office 365), there is a CSV option dedicated to the UTF-8 format.

In Office 365, do Save As; where previously one might have chosen CSV (Comma Delimited), now one of the file types you can save as is CSV UTF-8 (Comma delimited) (*.csv)

Solution 12 - Excel

For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.

Install catdoc:

apt-get install catdoc

Do the conversion:

xls2csv -d utf-8 file.xls > file-utf-8.csv 

This is blazing fast.

Note that it's important that you include the -d utf-8 flag, otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.

Note that xls2csv also only works with .xls files, it does not work with .xlsx files.

Solution 13 - Excel

What about using Powershell.

Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8

Solution 14 - Excel

Easiest way: No need Open office and google docs

  1. Save your file as "Unicode text file";
  2. now you have an unicode text file
  3. open it with "notepad" and "Save as" it with selecting "utf-8" or other code page that you want
  4. rename file extension from "txt" to "csv". This will result in a tab-delimited UTF-8 csv file.
  5. If you want a comma-delimited file, open the csv file you just renamed and replace all tabs with commas. To do this in Notepad on Win 10, simply select one tab field then click Ctrl+H. In the window that opens, type a comma , in the "Replace with" field then click "Replace All". Save your file. The result will be a comma-delimited UTF-8 csv file.

Don't open it with MS-Office anyway!!! Now you have a tab delimited CSV file. Or, a comma-delimited one if you applied step number 5.

Solution 15 - Excel

As funny as it may seem, the easiest way I found to save my 180MB spreadsheet into a UTF8 CSV file was to select the cells into Excel, copy them and to paste the content of the clipboard into SublimeText.

Solution 16 - Excel

I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.

So I finally had to give up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.

Solution is described here: http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html

Solution 17 - Excel

Assuming an Windows environment, save and work with the file as usual in Excel but then open up the saved Excel file in Gnome Gnumeric (free). Save Gnome Gnumeric's spreadsheet as CSV which - for me anyway - saves it as UTF-8 CSV.

Solution 18 - Excel

Easy way to do it: download open office ([here][1]), load the spreadsheet and open the excel file (.xls or .xlsx). Then just save it as a text CSV file and a window opens asking to keep the current format or to save as a .ODF format. select "keep the current format" and in the new window select the option that works better for you, according with the language that your file is been written on. For Spanish language select Western Europe (Windows-1252/ WinLatin 1) and the file works just fine. If you select Unicode (UTF-8), it is not going to work with the spanish characters.

[1]: http://www.openoffice.org/ "here"

Solution 19 - Excel

  1. Save xls file (Excel file) as Unicode text=>file will be saved in text format (.txt)

  2. Change format from .txt to .csv (rename the file from XYX.txt to XYX.csv

Solution 20 - Excel

I have also came across the same problem but there is an easy solution for this.

  1. Open your xlsx file in Excel 2016 or higher.
  2. In "Save As" choose this option: "(CSV UTF-8(Comma Delimited)*.csv)"

It works perfectly and a csv file is generated which can be imported in any software. I imported this csv file in my SQLITE database and it works perfectly with all unicode characters intact.

Solution 21 - Excel

Came across the same problem and googled out this post. None of the above worked for me. At last I converted my Unicode .xls to .xml (choose Save as ... XML Spreadsheet 2003) and it produced the correct character. Then I wrote code to parse the xml and extracted content for my use.

Solution 22 - Excel

I have written a small Python script that can export worksheets in UTF-8.

You just have to provide the Excel file as first parameter followed by the sheets that you would like to export. If you do not provide the sheets, the script will export all worksheets that are present in the Excel file.

#!/usr/bin/env python

# export data sheets from xlsx to csv

from openpyxl import load_workbook
import csv
from os import sys

reload(sys)
sys.setdefaultencoding('utf-8')

def get_all_sheets(excel_file):
    sheets = []
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    all_worksheets = workbook.get_sheet_names()
    for worksheet_name in all_worksheets:
        sheets.append(worksheet_name)
    return sheets

def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")

        try:
            worksheet = workbook.get_sheet_by_name(worksheet_name)
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)

        your_csv_file = open(''.join([worksheet_name,'.csv']), 'wb')
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    	for row in worksheet.iter_rows():
    		lrow = []
    		for cell in row:
    			lrow.append(cell.value)
    		wr.writerow(lrow)
        print(" ... done")
	your_csv_file.close()

if not 2 <= len(sys.argv) <= 3:
	print("Call with " + sys.argv[0] + " <xlxs file> [comma separated list of sheets to export]")
	sys.exit(1)
else:
    sheets = []
    if len(sys.argv) == 3:
        sheets = list(sys.argv[2].split(','))
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert(sheets != None and len(sheets) > 0)
    csv_from_excel(sys.argv[1], sheets)

Solution 23 - Excel

Excel typically saves a csv file as ANSI encoding instead of utf8.

One option to correct the file is to use Notepad or Notepad++:

  1. Open the .csv with Notepad or Notepad++.
  2. Copy the contents to your computer clipboard.
  3. Delete the contents from the file.
  4. Change the encoding of the file to utf8.
  5. Paste the contents back from the clipboard.
  6. Save the file.

Solution 24 - Excel

A second option to "nevets1219" is to open your CSV file in Notepad++ and do a convertion to ANSI.

Choose in the top menu : Encoding -> Convert to Ansi

Solution 25 - Excel

Encoding -> Convert to Ansi will encode it in ANSI/UNICODE. Utf8 is a subset of Unicode. Perhaps in ANSI will be encoded correctly, but here we are talking about UTF8, @SequenceDigitale.

There are faster ways, like exporting as csv ( comma delimited ) and then, opening that csv with Notepad++ ( free ), then Encoding > Convert to UTF8. But only if you have to do this once per file. If you need to change and export fequently, then the best is LibreOffice or GDocs solution.

Solution 26 - Excel

Microsoft Excel has an option to export spreadsheet using Unicode encoding. See following screenshot.

enter image description here

Solution 27 - Excel

open .csv fine with notepad++. if you see your encoding is good (you see all characters as they should be) press encoding , then convert to ANSI else - find out what is your current encoding

Solution 28 - Excel

another solution is to open the file by winword and save it as txt and then reopen it by excel and it will work ISA

Solution 29 - Excel

Save Dialog > Tools Button > Web Options > Encoding Tab

Solution 30 - Excel

I have the same problem and come across this add in , and it works perfectly fine in excel 2013 beside excel 2007 and 2010 which it is mention for.

Solution 31 - Excel

I needed to automate this process on my Mac. I originally tried using catdoc/xls2csv as suggested by mpowered, but xls2csv had trouble detecting the original encoding of the document and not all documents were the same. What I ended up doing was setting the default webpage output encoding to be UTF-8 and then providing the files to Apple's Automator, applying the Convert Format of Excel Files action to convert to Web Page (HTML). Then using PHP, DOMDocument and XPath, I queried the documents and formatted them to CSV.

This is the PHP script (process.php):

<?php
$pi = pathinfo($argv[1]);
$file = $pi['dirname'] . '/' . $pi['filename'] . '.csv';
$fp = fopen($file,'w+');
$doc = new DOMDocument;
$doc->loadHTMLFile($argv[1]);
$xpath = new DOMXPath($doc);
$table = [];
foreach($xpath->query('//tr') as $row){
	$_r = [];
	foreach($xpath->query('td',$row) as $col){
		$_r[] = trim($col->textContent);
	}
	fputcsv($fp,$_r);
}
fclose($fp);
?>

And this is the shell command I used to convert the HTML documents to csv:

find . -name '*.htm' | xargs -I{} php ./process.php {}

This is a really, really roundabout way of doing this, but it was the most reliable method that I found.

Solution 32 - Excel

Another way is to open the UTF-8 CSV file in Notepad where it will be displayed correctly. Then replace all the "," with tabs. Paste all of this into a new excel file.

Solution 33 - Excel

And for those who have sublime text: save with encoding utf-16 LE with BOM should do it ;-)

Solution 34 - Excel

(On a Mac:) From Excel save as CSV file. Open the CSV file in TextWrangler (it’s free) and use "save as". In the save dialog choose Unicode (UTF-8). Done

(I guess you can do this with TextEdit also - if you play with the open and save settings. Try Open files: auto, Save files: UTF-8)

Solution 35 - Excel

I used the following solution: Mac Exel 2008 > file > Save-as and then under format use MS_DOS Comma Separated (.csv). Worked perfect.

Solution 36 - Excel

You can save excel as unicode text, it is tab-delimited.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJeff TreutingView Question on Stackoverflow
Solution 1 - Excelnevets1219View Answer on Stackoverflow
Solution 2 - ExcelaendraView Answer on Stackoverflow
Solution 3 - ExcelelomageView Answer on Stackoverflow
Solution 4 - ExcelEricView Answer on Stackoverflow
Solution 5 - ExcelpmilewskiView Answer on Stackoverflow
Solution 6 - ExcelDon CruickshankView Answer on Stackoverflow
Solution 7 - ExcelNickView Answer on Stackoverflow
Solution 8 - ExcelleanderView Answer on Stackoverflow
Solution 9 - ExcelChloeView Answer on Stackoverflow
Solution 10 - ExcelRedYetiView Answer on Stackoverflow
Solution 11 - ExcelNolmë InformatiqueView Answer on Stackoverflow
Solution 12 - ExcelmpoweredView Answer on Stackoverflow
Solution 13 - ExcelMichael SchauView Answer on Stackoverflow
Solution 14 - ExcelSolivanView Answer on Stackoverflow
Solution 15 - ExceloscaroscarView Answer on Stackoverflow
Solution 16 - ExcelanroyView Answer on Stackoverflow
Solution 17 - Excelspring_chickenView Answer on Stackoverflow
Solution 18 - ExcelYessusView Answer on Stackoverflow
Solution 19 - ExcelMenaView Answer on Stackoverflow
Solution 20 - ExcelKrishView Answer on Stackoverflow
Solution 21 - ExcelSilent SojournerView Answer on Stackoverflow
Solution 22 - ExcelJulianView Answer on Stackoverflow
Solution 23 - ExcelJason WilliamsView Answer on Stackoverflow
Solution 24 - ExcelSequenceDigitale.comView Answer on Stackoverflow
Solution 25 - ExcelLucasView Answer on Stackoverflow
Solution 26 - ExcelvladamanView Answer on Stackoverflow
Solution 27 - ExcelMarius GriView Answer on Stackoverflow
Solution 28 - ExcelEssam AltantawiView Answer on Stackoverflow
Solution 29 - ExcelElia WeissView Answer on Stackoverflow
Solution 30 - Excelacademic.userView Answer on Stackoverflow
Solution 31 - ExcelKyleView Answer on Stackoverflow
Solution 32 - ExcelMartinView Answer on Stackoverflow
Solution 33 - ExcelmanuchapView Answer on Stackoverflow
Solution 34 - Exceleye-wonderView Answer on Stackoverflow
Solution 35 - ExcelkikaDESIGNView Answer on Stackoverflow
Solution 36 - Exceluser1668233View Answer on Stackoverflow