How to convert a string to UTF8 in Ruby

Tags: Ruby, File, Encoding, UTF-8, Dump

Ruby Problem Overview


I'm writing a crawler that uses Hpricot. It downloads a list of strings from a webpage, and then I try to write them to a file. Something is wrong with the encoding, and I get this error:

"\xC3" from ASCII-8BIT to UTF-8

I have items which are rendered on a webpage and printed this way:

DÃ©veloppement

str.encoding returns UTF-8, so force_encoding('UTF-8') doesn't help. How can I convert this to readable UTF-8?

Ruby Solutions


Solution 1 - Ruby

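Your string seems to have been encoded the wrong way round: its UTF-8 bytes were read as ISO-8859-1 and then encoded to UTF-8 a second time. Reverse that step:

"DÃ©veloppement".encode("iso-8859-1").force_encoding("utf-8")
#=> "Développement"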
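To see why this works, it can help to look at the bytes at each step. The following is a minimal sketch, assuming the garbled text is UTF-8 that was decoded as ISO-8859-1 and then re-encoded as UTF-8; the variable names are illustrative only.

```ruby
# The garbled string as it arrives ("DÃ©veloppement"): valid UTF-8, but the
# wrong characters. Written with \u escapes so the source file stays ASCII.
garbled = "D\u00C3\u00A9veloppement"

# Step 1: transcode to ISO-8859-1. "Ã" becomes byte 0xC3 and "©" becomes 0xA9,
# which together are the UTF-8 encoding of "é".
latin1 = garbled.encode("iso-8859-1")

# Step 2: relabel those bytes as UTF-8 without changing them.
# dup is used so the transcoded copy keeps its own encoding tag.
fixed = latin1.dup.force_encoding("utf-8")

puts fixed                  # prints Développement
puts fixed.valid_encoding?  # prints true
```

If step 1 raises Encoding::UndefinedConversionError, the string contains characters that do not exist in ISO-8859-1, which suggests the text was garbled through a different encoding.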

Solution 2 - Ruby

Your string seems to be tagged as UTF-8, but its bytes are really in another encoding, probably ISO-8859-1.

Set (force) the correct encoding first, then convert the string to UTF-8.

For example, if the string holds ISO-8859-1 bytes:

puts "D\xE9veloppement".force_encoding('iso-8859-1').encode('utf-8') #-> Développement

An alternative is:

puts "\xC3".force_encoding('iso-8859-1').encode('utf-8') #-> Ã

If the Ã makes no sense, try another encoding.
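Trying other encodings by hand gets tedious, so here is a rough sketch of one way to compare several candidates at once. The helper name try_encodings and the candidate list are hypothetical and chosen only for illustration; the sample bytes assume the text arrived as ISO-8859-1.

```ruby
# Hypothetical helper: reinterpret the same bytes under several candidate
# encodings and collect what each would look like as UTF-8.
def try_encodings(bytes, candidates)
  candidates.each_with_object({}) do |name, results|
    begin
      # dup so the caller's string keeps its original encoding tag
      results[name] = bytes.dup.force_encoding(name).encode("utf-8")
    rescue EncodingError => e
      # Some byte has no mapping in this candidate encoding
      results[name] = "(failed: #{e.class})"
    end
  end
end

# Sample bytes as they might arrive, tagged as binary (assumed input).
raw = "D\xE9veloppement".b

results = try_encodings(raw, ["iso-8859-1", "iso-8859-5", "iso-8859-7"])
results.each { |name, text| puts "#{name}: #{text}" }
```

Whichever line reads as sensible text points to the likely source encoding. An unknown encoding name raises ArgumentError, which this sketch deliberately does not rescue.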

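Solution 3 - Ruby

Another good approach with less code is described in "https://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8?rq=1":

file_contents.encode!('UTF-16', 'UTF-8')

Note that this call on its own leaves the string in UTF-16. To end up with UTF-8, transcode it back with a second encode! call.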
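The round trip through UTF-16 can be sketched as follows. This is an illustration under assumptions, not necessarily the exact code from the linked answer: the sample string is made up, and the invalid: and replace: options are standard String#encode options used here to drop bytes that are not valid UTF-8.

```ruby
# A string tagged UTF-8 that contains one stray invalid byte (0xC3 on its own).
file_contents = "abc\xC3 xyz"

# Round trip through UTF-16: invalid bytes are dropped on the way out,
# then the text is transcoded back to UTF-8.
cleaned = file_contents
  .encode("UTF-16", "UTF-8", invalid: :replace, replace: "")
  .encode("UTF-8", "UTF-16")

# On Ruby 2.1 and later, String#scrub removes invalid bytes in one step.
scrubbed = file_contents.scrub("")

puts cleaned                  # prints abc xyz
puts cleaned.valid_encoding?  # prints true
```

This approach discards the offending bytes rather than repairing them, so it fits best when a few stray bytes are acceptable losses.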

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ciembor | View Question on Stackoverflow
Solution 1 - Ruby | Stefan | View Answer on Stackoverflow
Solution 2 - Ruby | knut | View Answer on Stackoverflow
Solution 3 - Ruby | kaleb4eg | View Answer on Stackoverflow