invalid byte sequence for encoding "UTF8"

Postgresql Import

Postgresql Problem Overview


I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?

Postgresql Solutions


Solution 1 - Postgresql

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".
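
You can also check it from psql (or any SQL client) with:

SHOW server_encoding;   -- encoding of the database itself
SHOW client_encoding;   -- encoding COPY will assume for the input file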

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from a Windows system (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".
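
For example, a sketch using the temp table and path from the question (the encoding name here is a guess; use whatever actually matches your file):

SET client_encoding TO 'LATIN1';  -- or 'WIN1252', etc., depending on the source file
COPY tmp FROM '/home/mark/Desktop/Canada.csv' WITH (FORMAT csv, DELIMITER ',');

When no ENCODING option is given, COPY FROM assumes the file is in the current client encoding, so the server converts it to the database encoding on the way in.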

Solution 2 - Postgresql

psql=# copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251';

Adding the encoding option worked in my case.

Solution 3 - Postgresql

If you are OK with discarding nonconvertible characters, you can use the -c flag:

iconv -c -t utf8 filename.csv > filename.utf8.csv

and then copy the converted file into your table, for example:
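
A sketch, borrowing the table and directory from the question and the output file name from the iconv command above; adjust both to your setup:

COPY tmp FROM '/home/mark/Desktop/filename.utf8.csv' WITH (FORMAT csv);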

Solution 4 - Postgresql

Apparently I can just set the encoding on the fly,

 set client_encoding to 'latin1'

And then re-run the query. Not sure what encoding I should be using though.


latin1 made the characters legible, but most of the accented characters were in upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think it's actually the data itself that was bad. I ended up keeping the latin1 encoding, but pre-processing the data to fix the casing issues.

Solution 5 - Postgresql

This error means that the encoding of the records in the file differs from the encoding of the connection. In this case iconv may return an error, sometimes even despite the //IGNORE flag:

iconv -f ASCII -t utf-8//IGNORE < b.txt > a.txt

iconv: illegal input sequence at position (some number)

The trick is to find the incorrect characters and replace them. To do it on Linux, use the vim editor:

Open the file in vim, press the ESC key, and type ":goto (number returned by iconv)".

To find non-ASCII characters you may use the following command:

grep --color='auto' -P "[\x80-\xFF]" b.txt

Once you have removed the incorrect characters, check whether you still need to convert the file: the problem may already be solved.

Solution 6 - Postgresql

Follow the steps below to solve this issue in pgAdmin:

  1. SET client_encoding = 'ISO_8859_5';

  2. COPY tablename(column names) FROM 'D:/DB_BAK/csvfilename.csv' WITH DELIMITER ',' CSV ;

Solution 7 - Postgresql

It depends on what type of machine/encoding generated your import file.

If you're getting it from an English or Western European version of Windows, your best bet is probably setting the encoding to 'WIN1252'. If you are getting it from a different source, consult the list of character encodings here:

http://www.postgresql.org/docs/8.3/static/multibyte.html
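
For a Western-European Windows file, that usually boils down to one statement before the COPY (a sketch):

SET client_encoding TO 'WIN1252';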

If you're getting it from a Mac, you may have to run it through the "iconv" utility first to convert it from MacRoman to UTF-8.

Solution 8 - Postgresql

I had the same problem: my file was not encoded as UTF-8. I solved it by opening the file with Notepad++ and changing the encoding of the file.

Go to "Encoding" and select "Convert to UTF-8". Save changes and that's all!

Solution 9 - Postgresql

I was facing the same problem, and this is what solved it:

  1. In Excel, click Save As.

  2. From "Save as type", choose .csv.

  3. Click Tools, then choose "Web Options" from the drop-down list.

  4. Under the Encoding tab, save the document as Unicode (UTF-8).

  5. Click OK and save the file. Done!

Solution 10 - Postgresql

I had the same problem, and found a nice solution here: http://blog.e-shell.org/134

> This is caused by a mismatch in your database encodings, surely because the database from where you got the SQL dump was encoded as SQL_ASCII while the new one is encoded as UTF8. .. Recode is a small tool from the GNU project that let you change on-the-fly the encoding of a given file.

So I just recoded the dumpfile before playing it back:

postgres> gunzip -c /var/backups/pgall_b1.zip | recode iso-8859-1..u8 | psql test

On Debian or Ubuntu systems, recode can be installed via the recode package (apt-get install recode).

Solution 11 - Postgresql

copy tablename from 'filepath\filename' DELIMITERS '=' ENCODING 'WIN1252';

Adding the ENCODING option tells COPY how the file is encoded, so the data can be converted to UTF8 on import.

Solution 12 - Postgresql

A short example of solving this problem in PHP:

$val = "E'\377'";
// Detect the value's current encoding and convert it to UTF-8 before inserting
$val = iconv(mb_detect_encoding($val, mb_detect_order(), true), "UTF-8", $val);

Error detail: because a UTF8-encoded Postgres database will not accept byte sequences that are not valid UTF-8, passing input like the above to a column gives the error "invalid byte sequence for encoding "UTF8": 0xab".

So just convert the value to UTF-8 before inserting it into the Postgres database.

Solution 13 - Postgresql

I ran into this problem under Windows while working exclusively with psql (no graphical tools). To fix this problem, permanently change the default encoding of psql (client) to match the default encoding of the PostgreSQL server. Run the following command in CMD or Powershell:

setx PGCLIENTENCODING UTF8

Close and reopen your command prompt/PowerShell for the change to take effect.

Change the encoding of the backup file from Unicode (Notepad's name for UTF-16) to UTF-8 by opening it with Notepad and going to File -> Save As. Change the Encoding dropdown from Unicode to UTF-8. (Also change the "Save as type" from "Text Documents (*.txt)" to "All Files" to avoid adding a .txt extension to your backup file's name.) You should now be able to restore your backup.

Solution 14 - Postgresql

Open your CSV file in Excel and save it in the "CSV UTF-8" format.

Solution 15 - Postgresql

This error may also occur if the input data contains the escape character itself. In CSV format the default ESCAPE character is the same as the QUOTE character ("), and in text format it is the backslash (\), so if your input text contains that character, try changing the default with the ESCAPE option (CSV format only).
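
A sketch of what that might look like, reusing the table and file from the question (pick an escape character that does not occur in your data):

COPY tmp FROM '/home/mark/Desktop/Canada.csv'
    WITH (FORMAT csv, QUOTE '"', ESCAPE '\');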

Solution 16 - Postgresql

You can replace the backslash character with, for example, a pipe character, using sed:

sed -i -- 's/\\/|/g' filename.txt

Solution 17 - Postgresql

For Python, you need to use

pg8000.types.Bytea(str)
Bytea is a str-derived class that is mapped to a PostgreSQL byte array.

or

pg8000.Binary(value)
Construct an object holding binary data.

Solution 18 - Postgresql

Open the CSV file with Notepad++. Choose Encoding > Encode in UTF-8, then fix a few cells manually.

Then try the import again.

Solution 19 - Postgresql

Alternative cause on Windows with pgAdmin v4.4:

Column names with non-ASCII characters will somehow mess up the psql import command, and give you this unintuitive error message. Your UTF8 csv data is probably fine.

Solution:

Rename your fields.

Example:

"Résultat" -> resultat

Solution 20 - Postgresql

It is also quite possible with this error that the field is encrypted in place. Be sure you are looking at the right table; in some cases administrators will create an unencrypted view that you can use instead. I recently encountered a very similar issue.

Solution 21 - Postgresql

I got the same error when I was trying to copy a csv generated by Excel to a Postgres table (all on a Mac). This is how I resolved it:

  1. Open the File in Atom (the IDE that I use)

  2. Make an insignificant change in the file. Save the file. Undo the change. Save again.

Presto! The copy command worked.

(I think Atom saved it in a format which worked)

Solution 22 - Postgresql

If your CSV is exported from SQL Server, is huge, and has Unicode characters, you can export it with the encoding set to UTF-8:

Right-Click DB > Tasks > Export > 'SQL Server Native Client 11.0' >> 'Flat File Destination > File name: ... > Code page: UTF-8 >> ...

The next page asks whether you want to copy data from a table or write a query. If you have char or varchar data types in your table, select the query option and cast those columns to nvarchar(max). E.g. if myTable has two columns, where the first one is varchar and the second one int, cast the first one to nvarchar:

select cast (col1 as nvarchar(max)) col1
       , col2
from myTable

Solution 23 - Postgresql

Some of the solutions may be very simple.

If there is any space in a column name, it will cause this problem.

Review every column name, for example "column_name " (wrong) vs "column_name" (right).
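
If such a name has already made it into a table, one way to strip the stray space is to rename the column (a sketch with hypothetical table and column names):

ALTER TABLE tmp RENAME COLUMN "column_name " TO column_name;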

Solution 24 - Postgresql

It is better to identify the problem lines first, with this command:

grep -naxv '.*' source_data.txt

(-a treats the file as text, -x '.*' matches only lines that are entirely valid in the current locale, and -v with -n prints the line numbers of the lines that are not, i.e. the ones containing invalid byte sequences.)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | mpen | View Question on Stackoverflow
Solution 1 - Postgresql | Mike Sherrill 'Cat Recall' | View Answer on Stackoverflow
Solution 2 - Postgresql | Nobu | View Answer on Stackoverflow
Solution 3 - Postgresql | Abdellah Alaoui | View Answer on Stackoverflow
Solution 4 - Postgresql | mpen | View Answer on Stackoverflow
Solution 5 - Postgresql | Yuri Levinsky | View Answer on Stackoverflow
Solution 6 - Postgresql | Ramesh R | View Answer on Stackoverflow
Solution 7 - Postgresql | BobG | View Answer on Stackoverflow
Solution 8 - Postgresql | Francisco Javier Sánchez Sabido | View Answer on Stackoverflow
Solution 9 - Postgresql | Vishal Chhatwani | View Answer on Stackoverflow
Solution 10 - Postgresql | Ed Doerr | View Answer on Stackoverflow
Solution 11 - Postgresql | Rishi jha | View Answer on Stackoverflow
Solution 12 - Postgresql | Nneha Sachan | View Answer on Stackoverflow
Solution 13 - Postgresql | Hehe | View Answer on Stackoverflow
Solution 14 - Postgresql | Mohit goyal | View Answer on Stackoverflow
Solution 15 - Postgresql | jaasco | View Answer on Stackoverflow
Solution 16 - Postgresql | Richard Greenwood | View Answer on Stackoverflow
Solution 17 - Postgresql | vrn | View Answer on Stackoverflow
Solution 18 - Postgresql | James Graham | View Answer on Stackoverflow
Solution 19 - Postgresql | BenoitParis | View Answer on Stackoverflow
Solution 20 - Postgresql | Josh Barton | View Answer on Stackoverflow
Solution 21 - Postgresql | Anupam | View Answer on Stackoverflow
Solution 22 - Postgresql | LoMaPh | View Answer on Stackoverflow
Solution 23 - Postgresql | Youssri Abo Elseod | View Answer on Stackoverflow
Solution 24 - Postgresql | martinnovoty | View Answer on Stackoverflow