How can I transform string to UTF-8 in C#?

C#StringEncodingUtf 8Character Encoding

C# Problem Overview


I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.

Due to incorrect encoding, a piece of my string looks like this in Spanish: >Acción

whereas it should look like this: >Acción

According to the answer on this question: https://stackoverflow.com/questions/13993135/how-to-know-string-encoding-in-c-sharp/13994368#13994368, the encoding I am receiving should be coming on UTF-8 already, but it is read on Encoding.Default (probably ANSI?).

I am trying to transform this string into real UTF-8, but one of the problems is that I can only see a subset of the Encoding class (UTF8 and Unicode properties only), probably because I'm limited to the windows surface API.

I have tried some snippets I've found on the internet, but none of them have proved successful so far for eastern languages (i.e. korean). One example is as follows:

var utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(myString);
myString= utf8.GetString(utfBytes, 0, utfBytes.Length);     

I also tried extracting the string into a byte array and then using UTF8.GetString:

byte[] myByteArray = new byte[myString.Length];
for (int ix = 0; ix < myString.Length; ++ix)
{
    char ch = myString[ix];
    myByteArray[ix] = (byte) ch;
}

myString = Encoding.UTF8.GetString(myByteArray, 0, myString.Length);

Do you guys have any other ideas that I could try?

C# Solutions


Solution 1 - C#

As you know the string is coming in as Encoding.Default you could simply use:

byte[] bytes = Encoding.Default.GetBytes(myString);
myString = Encoding.UTF8.GetString(bytes);

Another thing you may have to remember: If you are using Console.WriteLine to output some strings, then you should also write Console.OutputEncoding = System.Text.Encoding.UTF8;!!! Or all utf8 strings will be outputed as gbk...

Solution 2 - C#

string utf8String = "Acción";
string propEncodeString = string.Empty;

byte[] utf8_Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i)
{
   utf8_Bytes[i] = (byte)utf8String[i];
}

propEncodeString = Encoding.UTF8.GetString(utf8_Bytes, 0, utf8_Bytes.Length);
    

Output should look like
>Acción

>day’s displays >day's

call DecodeFromUtf8();

private static void DecodeFromUtf8()
{
    string utf8_String = "day’s";
    byte[] bytes = Encoding.Default.GetBytes(utf8_String);
    utf8_String = Encoding.UTF8.GetString(bytes);
}

Solution 3 - C#

Your code is reading a sequence of UTF8-encoded bytes, and decoding them using an 8-bit encoding.

You need to fix that code to decode the bytes as UTF8.

Alternatively (not ideal), you could convert the bad string back to the original byte array—by encoding it using the incorrect encoding—then re-decode the bytes as UTF8.

Solution 4 - C#

 Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(mystring));

Solution 5 - C#

@anothershrubery answer worked for me. I've made an enhancement using StringEntensions Class so I can easily convert any string at all in my program.

Method:

public static class StringExtensions
{
    public static string ToUTF8(this string text)
    {
        return Encoding.UTF8.GetString(Encoding.Default.GetBytes(text));
    }
}

Usage:

string myString = "Acción";
string strConverted = myString.ToUTF8();

Or simply:

string strConverted = "Acción".ToUTF8();

Solution 6 - C#

If you want to save any string to mysql database do this:->

Your database field structure i phpmyadmin [ or any other control panel] should set to utf8-gerneral-ci

  1. you should change your string [Ex. textbox1.text] to byte, therefor

2-1) define byte[] st2;

2-2) convert your string [textbox1.text] to unicode [ mmultibyte string] by :

byte[] st2 = System.Text.Encoding.UTF8.GetBytes(textBox1.Text);

3) execute this sql command before any query:

string mysql_query2 = "SET NAMES 'utf8'";
cmd.CommandText = mysql_query2;
cmd.ExecuteNonQuery();

3-2) now you should insert this value in to for example name field by :

cmd.CommandText = "INSERT INTO customer (`name`) values (@name)";

4) the main job that many solution didn't attention to it is the below line: you should use addwithvalue instead of add in command parameter like below:

cmd.Parameters.AddWithValue("@name",ut);

++++++++++++++++++++++++++++++++++ enjoy real data in your database server instead of ????

Solution 7 - C#

Use the below code snippet to get bytes from csv file

protected byte[] GetCSVFileContent(string fileName)
    {
        StringBuilder sb = new StringBuilder();
        using (StreamReader sr = new StreamReader(fileName, Encoding.Default, true))
        {
            String line;
            // Read and display lines from the file until the end of 
            // the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                sb.AppendLine(line);
            }
        }
        string allines = sb.ToString();


        UTF8Encoding utf8 = new UTF8Encoding();


        var preamble = utf8.GetPreamble();

        var data = utf8.GetBytes(allines);


        return data;
    }

Call the below and save it as an attachment

           Encoding csvEncoding = Encoding.UTF8;
                   //byte[] csvFile = GetCSVFileContent(FileUpload1.PostedFile.FileName);
          byte[] csvFile = GetCSVFileContent("Your_CSV_File_NAme");


        string attachment = String.Format("attachment; filename={0}.csv", "uomEncoded");

        Response.Clear();
        Response.ClearHeaders();
        Response.ClearContent();
        Response.ContentType = "text/csv";
        Response.ContentEncoding = csvEncoding;
        Response.AppendHeader("Content-Disposition", attachment);
        //Response.BinaryWrite(csvEncoding.GetPreamble());
        Response.BinaryWrite(csvFile);
        Response.Flush();
        Response.End();

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionGaaraView Question on Stackoverflow
Solution 1 - C#anothershruberyView Answer on Stackoverflow
Solution 2 - C#MethodManView Answer on Stackoverflow
Solution 3 - C#SLaksView Answer on Stackoverflow
Solution 4 - C#Riadh HammoudaView Answer on Stackoverflow
Solution 5 - C#Gustavo Bosch PetiniView Answer on Stackoverflow
Solution 6 - C#Hassan Fadaie GhotbieView Answer on Stackoverflow
Solution 7 - C#jAntoniView Answer on Stackoverflow