Python reading from a file and saving to utf-8

Python, Python 2.7, UTF-8

Python Problem Overview


I'm having problems reading from a file, processing its contents as a string, and saving the result to a UTF-8 file.

Here is the code:

try:
    filehandle = open(filename, "r")
except IOError:
    print("Could not open file " + filename)
    quit()

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.

And then

try:
    writer = open(output, "w")
except IOError:
    print("Could not open file " + output)
    quit()
    
#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This writes the file perfectly, but according to my editor it does so in ISO 8859-15. Since the same editor recognizes the input file (the one named in the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines, the resulting file has gibberish in the special characters, mainly in words with accents (tildes), as the text is in Spanish. I would really appreciate any help, as I am stumped.

Python Solutions


Solution 1 - Python

Process text to and from Unicode at the I/O boundaries of your program using open with the encoding parameter. Make sure to use the (hopefully documented) encoding of the file being read. The default encoding varies by OS (specifically, locale.getpreferredencoding(False) is the encoding used), so I recommend always explicitly using the encoding parameter for portability and clarity (Python 3 syntax below):

with open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with open(filename, 'w', encoding='utf8') as f:
    f.write(text)
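
To see which default your own platform would fall back on when the encoding parameter is omitted, a quick check along these lines may help. It only uses the locale call mentioned above; the variable name default_encoding is illustrative.

```python
import locale

# Encoding used for text files when open() is called without an explicit
# encoding. The value depends on the OS and locale settings.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If the printed value is not the encoding your files actually use, that is a strong hint to pass encoding explicitly.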

If still using Python 2 or for Python 2/3 compatibility, the io module implements open with the same semantics as Python 3's open and exists in both versions:

import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
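
As a rough illustration of why the commented-out lines in the question produce gibberish, assume the input really is UTF-8, as the editor reports. Decoding those bytes as ISO 8859-15 turns each accented character into two wrong characters, and re-encoding the result as UTF-8 bakes the damage in. The sample word below is made up for the demonstration.

```python
# Sample text, written with an escape so the source file encoding does not matter.
original = u"canci\u00f3n"           # the word "cancion" with an accented o
raw = original.encode("utf-8")       # bytes as they would sit in a UTF-8 file

# Wrong: treat the UTF-8 bytes as ISO 8859-15, like the commented-out lines.
misread = raw.decode("iso-8859-15")
garbled = misread.encode("utf-8")

# Right: decode with the encoding the bytes actually use.
correct = raw.decode("utf-8")

print(repr(misread))        # the accented o has become two characters
print(garbled != raw)       # the re-encoded bytes no longer match the input
print(correct == original)  # decoding as UTF-8 round-trips cleanly
```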

Solution 2 - Python

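You can also open the file with errors="ignore", which skips any bytes that cannot be decoded as UTF-8 instead of raising an error. Note that the skipped characters are silently lost:

with open(completefilepath, 'r', encoding='utf8', errors='ignore') as file:
    text = file.read()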
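
A small sketch of what errors="ignore" does to data that is not valid UTF-8 may be useful before relying on it. The byte string below is a made-up sample: the Spanish word for "tomorrow" encoded as ISO 8859-15, where the n with tilde is the single byte 0xF1.

```python
raw = b"ma\xf1ana"  # ISO 8859-15 bytes, not valid UTF-8

# "ignore" drops the undecodable byte without any warning.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" keeps a visible marker (U+FFFD) where data was lost.
replaced = raw.decode("utf-8", errors="replace")

# The default ("strict") raises, which at least reveals the mismatch.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ignored)
print(repr(replaced))
print(strict_failed)
```

With "ignore" the letter simply disappears, so this option hides an encoding mismatch rather than fixing it.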

Solution 3 - Python

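In Python 2 you can't do that with the built-in open; use codecs.

In Python 2, the built-in open function has no encoding parameter: it reads and writes raw byte strings, and writing a unicode object to such a file implicitly encodes it as ASCII, which fails for non-ASCII characters. (In Python 3, open accepts an encoding parameter directly.) To write the file in UTF-8, try this: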

import codecs
file = codecs.open('data.txt','w','utf-8')
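
A slightly fuller sketch of the codecs approach, closing the file automatically and reading it back, might look like this. The temporary directory and the sample text are assumptions made only to keep the example self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# codecs.open encodes unicode text to UTF-8 on write...
with codecs.open(path, "w", "utf-8") as writer:
    writer.write(u"ni\u00f1o\n")

# ...and decodes it back to unicode text on read.
with codecs.open(path, "r", "utf-8") as reader:
    read_back = reader.read()

print(read_back == u"ni\u00f1o\n")
```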

Solution 4 - Python

The encoding parameter is what does the trick.

my_list = ['1', '2', '3', '4']
with open('test.txt', 'w', encoding='utf8') as file:
    for i in my_list:
        file.write(i + '\n')
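
Rather than trusting an editor's guess about the encoding, one option is to read the written file back in binary mode and inspect the bytes. The sketch below assumes a scratch file in a temporary directory and two made-up Spanish words; newline="\n" is passed only so the bytes are the same on every platform.

```python
import io
import os
import tempfile

# Hypothetical scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

words = [u"a\u00f1o", u"caf\u00e9"]
with io.open(path, "w", encoding="utf8", newline="\n") as f:
    for word in words:
        f.write(word + u"\n")

# Binary mode returns the bytes exactly as stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()

# In UTF-8 each accented letter here occupies two bytes.
print(raw == b"a\xc3\xb1o\ncaf\xc3\xa9\n")

# Decoding succeeds only if the bytes are valid UTF-8.
decoded = raw.decode("utf-8")
```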

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | aarelovich
Solution 1 - Python | Mark Tolonen
Solution 2 - Python | Siva Kumar
Solution 3 - Python | Fernando Freitas Alves
Solution 4 - Python | Juan Carlos Orantes