How to convert UTF-8 byte[] to string

Tags: C#, .NET, Arrays, String, Type Conversion

C# Problem Overview


I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.

In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.

C# Solutions


Solution 1 - C#

string result = System.Text.Encoding.UTF8.GetString(byteArray);
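As a quick sanity check, the sketch below round-trips a string containing non-ASCII characters through UTF-8 and back. The sample text and variable names are illustrative and not part of the original answer; in the question, the bytes would come from a file instead.

```csharp
using System;
using System.Text;

// Illustrative input: "naïve café", written with escapes so the source file's encoding does not matter.
string original = "na\u00EFve caf\u00E9";
byte[] byteArray = Encoding.UTF8.GetBytes(original);

// 'ï' and 'é' each take two bytes in UTF-8, so 10 characters become 12 bytes.
Console.WriteLine(byteArray.Length);

// The one-liner from the answer above.
string result = Encoding.UTF8.GetString(byteArray);
Console.WriteLine(result == original);
```

The character count and the byte count differ here, which is why the conversion is a decode step rather than a plain memory copy.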

Solution 2 - C#

There are at least four different ways of doing this conversion.

  1. Encoding's GetString
    This decodes the bytes as text, but you won't be able to get the original bytes back if the array contains sequences that are not valid in that encoding (for example, arbitrary binary data).

  2. BitConverter.ToString
    The output is a "-" delimited hex string, but there is no built-in .NET method that converts this delimited string back to a byte array.

  3. Convert.ToBase64String
    You can easily convert the output string back to byte array by using Convert.FromBase64String.
    Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.

  4. HttpServerUtility.UrlTokenEncode
    You can easily convert the output string back to byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is it needs System.Web assembly if your project is not a web project.

A full example:

byte[] bytes = { 130, 200, 234, 23 }; // A byte array containing non-ASCII (or non-readable) bytes

string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
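The note under option 3 says that Base64 output has to be encoded explicitly before it goes into a URL. A minimal sketch of one way to do that, assuming Uri.EscapeDataString is acceptable for your URL component; the sample bytes are chosen for illustration so that the Base64 text contains all three problem characters.

```csharp
using System;

// Illustrative bytes chosen so the Base64 text contains '+', '/' and '='.
byte[] bytes = { 0xFB, 0xFF };

string base64 = Convert.ToBase64String(bytes);   // +/8=
string urlSafe = Uri.EscapeDataString(base64);   // %2B%2F8%3D

// Reverse the two steps to recover the original bytes.
byte[] decoded = Convert.FromBase64String(Uri.UnescapeDataString(urlSafe));

Console.WriteLine(urlSafe);
Console.WriteLine(decoded.Length == bytes.Length);
```

Unlike HttpServerUtility.UrlTokenEncode, this approach needs no reference to System.Web, at the cost of a longer string.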

Solution 3 - C#

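A general solution for converting a byte array to a string when you don't know the encoding. StreamReader detects the encoding from a byte order mark if one is present and otherwise falls back to UTF-8: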

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
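To see what the StreamReader approach does with different inputs, the sketch below feeds it UTF-16 bytes prefixed with a byte order mark and UTF-8 bytes with no mark at all. The helper is repeated from the answer (with using declarations); the sample inputs are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Same helper as in the answer, repeated so this sketch is self-contained.
static string BytesToStringConverted(byte[] bytes)
{
    using var stream = new MemoryStream(bytes);
    using var streamReader = new StreamReader(stream);
    return streamReader.ReadToEnd();
}

string text = "h\u00E9llo"; // "héllo"

// UTF-16 little-endian text prefixed with its byte order mark (FF FE).
byte[] utf16WithBom = Encoding.Unicode.GetPreamble()
    .Concat(Encoding.Unicode.GetBytes(text))
    .ToArray();

// UTF-8 text without any byte order mark.
byte[] utf8NoBom = Encoding.UTF8.GetBytes(text);

string fromUtf16 = BytesToStringConverted(utf16WithBom);
string fromUtf8 = BytesToStringConverted(utf8NoBom);

Console.WriteLine(fromUtf16 == text);
Console.WriteLine(fromUtf8 == text);
```

Input in another encoding that carries no byte order mark would be read as UTF-8, so the detection only helps when a mark is present.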

Solution 4 - C#

Definition:

public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}

Using:

string result = input.ConvertByteToString();

Solution 5 - C#

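Converting a byte[] to a string seems simple, but decoding with the wrong encoding can mangle the output string. This little function casts each byte directly to a Char, so every byte maps to exactly one character (U+0000 to U+00FF, the same result as ISO-8859-1). No byte is lost or replaced, but multi-byte UTF-8 sequences are not decoded into the characters they represent: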

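private string ToString(byte[] bytes)
{
    // StringBuilder (System.Text) avoids allocating a new string on every append
    var response = new StringBuilder(bytes.Length);

    foreach (byte b in bytes)
        response.Append((char)b);

    return response.ToString();
}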
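One thing worth checking is how the per-byte cast treats multi-byte UTF-8 input. The sketch below compares the cast with an ISO-8859-1 decode and a UTF-8 decode of the same two bytes; the sample character and variable names are illustrative.

```csharp
using System;
using System.Text;

// "é" (U+00E9) is encoded in UTF-8 as the two bytes C3 A9.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00E9");

// Per-byte cast, as in the function above.
var sb = new StringBuilder();
foreach (byte b in utf8Bytes)
    sb.Append((char)b);
string perByte = sb.ToString();

string latin1 = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
string utf8 = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(perByte == latin1); // the cast behaves like an ISO-8859-1 decode
Console.WriteLine(perByte.Length);    // two characters (U+00C3 U+00A9), not one
Console.WriteLine(utf8.Length);       // one character (U+00E9)
```

For pure ASCII input all three approaches agree, which is probably why the cast appears to work in simple tests.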

Solution 6 - C#

Using b.ToString("x2") on each byte produces a lowercase hex string such as b4b5dfe475e58b67:

public static class Ext {

	public static string ToHexString(this byte[] hex)
	{
		if (hex == null) return null;
		if (hex.Length == 0) return string.Empty;

		var s = new StringBuilder();
		foreach (byte b in hex) {
			s.Append(b.ToString("x2"));
		}
		return s.ToString();
	}

	public static byte[] ToHexBytes(this string hex)
	{
		if (hex == null) return null;
		if (hex.Length == 0) return new byte[0];

		int l = hex.Length / 2;
		var b = new byte[l];
		for (int i = 0; i < l; ++i) {
			b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
		}
		return b;
	}

	public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
	{
		if (bytes == null && bytesToCompare == null) return true; // ?
		if (bytes == null || bytesToCompare == null) return false;
		if (object.ReferenceEquals(bytes, bytesToCompare)) return true;

		if (bytes.Length != bytesToCompare.Length) return false;

		for (int i = 0; i < bytes.Length; ++i) {
			if (bytes[i] != bytesToCompare[i]) return false;
		}
		return true;
	}

}
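On newer runtimes the hex round trip can also be done without a custom extension class. This is a minimal sketch assuming .NET 5 or later, where Convert.ToHexString and Convert.FromHexString are available; note that ToHexString produces uppercase digits, so a lowercase form needs an extra step. The sample bytes are illustrative.

```csharp
using System;

byte[] bytes = { 0xB4, 0xB5, 0xDF, 0xE4 };

// Built-in hex encoding (.NET 5+); the result uses uppercase digits.
string upper = Convert.ToHexString(bytes);    // B4B5DFE4
string lower = upper.ToLowerInvariant();      // b4b5dfe4, matching the "x2" style above

// Built-in hex decoding (.NET 5+).
byte[] back = Convert.FromHexString(upper);

Console.WriteLine(lower);
Console.WriteLine(back.Length == bytes.Length);
```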

Solution 7 - C#

The other answers to this question cover the basics, and there are several ways in C# to solve the same problem. The one thing still worth considering is the difference between pure UTF-8 and UTF-8 with a BOM.

Last week at work, I had to build a feature that outputs some CSV files as UTF-8 with a BOM and others as pure UTF-8 (without a BOM). Each kind of file is consumed by a different non-standardized API: one reads UTF-8 with a BOM and the other reads UTF-8 without one. To build my approach, I read the Stack Overflow question "What's the difference between UTF-8 and UTF-8 without BOM?" and the Wikipedia article "Byte order mark".

In the end, my C# code for both UTF-8 encoding types (with a BOM and pure) needed to be similar to the example below:

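// Encoding.UTF8 is a UTF8Encoding that emits a BOM when used for writing;
// same call as shared by Zanoni (at top)
string resultWithBom = System.Text.Encoding.UTF8.GetString(byteArray);

// new UTF8Encoding(false) emits no BOM when used for writing (pure UTF-8)
string resultPure = (new UTF8Encoding(false)).GetString(byteArray);

// Note: GetString itself does not strip a BOM in either case; a leading
// EF BB BF in byteArray decodes to the character U+FEFF.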
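To make the role of the BOM flag concrete, the sketch below inspects both encodings. It assumes an illustrative input consisting of a UTF-8 BOM followed by "hi". The flag changes what GetPreamble returns (which writers use), while GetString decodes identically for both objects and leaves the BOM in the result as U+FEFF.

```csharp
using System;
using System.Text;

var withBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
var withoutBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

// The flag controls the preamble that writers such as StreamWriter emit.
Console.WriteLine(withBom.GetPreamble().Length);     // EF BB BF
Console.WriteLine(withoutBom.GetPreamble().Length);  // empty

// Illustrative input: a UTF-8 BOM followed by "hi".
byte[] byteArray = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };

string decodedA = withBom.GetString(byteArray);
string decodedB = withoutBom.GetString(byteArray);

Console.WriteLine(decodedA == decodedB);        // decoding ignores the flag
Console.WriteLine(decodedA[0] == '\uFEFF');     // the BOM survives as a character

// One way to drop it after decoding.
string trimmed = decodedA.TrimStart('\uFEFF');
Console.WriteLine(trimmed);
```

If the bytes come from a file, reading through a StreamReader is another option, since it consumes a leading BOM while detecting the encoding.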

Solution 8 - C#

There is also the UnicodeEncoding class, which is simple to use. Note that it encodes and decodes UTF-16, not UTF-8:

var ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);

Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));

Solution 9 - C#

In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode, and the number of bytes to decode:

string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
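The three-argument overload is also useful on any framework version when only part of the buffer holds data, for example after a stream read. A minimal sketch, using a MemoryStream as a stand-in for a file or socket; the buffer size and sample text are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative source stream; in practice this could be a file or a socket.
var source = new MemoryStream(Encoding.UTF8.GetBytes("hello"));

byte[] byteArray = new byte[16];
int bytesRead = source.Read(byteArray, 0, byteArray.Length);

// Decoding the whole buffer also turns the unused zero bytes into '\0' characters.
string whole = Encoding.UTF8.GetString(byteArray);

// Decoding only the bytes that were actually read gives the intended text.
string result = Encoding.UTF8.GetString(byteArray, 0, bytesRead);

Console.WriteLine(whole.Length);
Console.WriteLine(result);
```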

Solution 10 - C#

Alternatively, if you only need a printable representation of the bytes rather than the decoded text:

var byteStr = Convert.ToBase64String(bytes);

Solution 11 - C#

The BitConverter class can be used to convert a byte[] to a string of hyphen-separated hexadecimal pairs (it does not decode the bytes as text).

var convertedString = BitConverter.ToString(byteArray);

Documentation for the BitConverter class can be found on MSDN.

Solution 12 - C#

A LINQ one-liner for converting a byte array byteArrFilename, read from a file, to a pure ASCII C-style zero-terminated string is shown below. It is handy for reading things like file index tables in old archive formats.

String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
                              .Select(x => x < 128 ? (Char)x : '?').ToArray());

I use '?' as the default character for anything not pure ASCII here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.

Solution 13 - C#

To my knowledge, none of the given answers guarantees correct behavior with null termination. Until someone shows me otherwise, I have written my own static class for handling this, with the following methods:

// Mimics the functionality of strlen() in C/C++
// Needed because neither StringBuilder nor Encoding.*.GetString() handles \0 well
static int StringLength(byte[] buffer, int startIndex = 0)
{
    int strlen = 0;
    while
    (
        (startIndex + strlen) < buffer.Length     // Stay inside the buffer, including its last byte
        && buffer[startIndex + strlen] != 0       // The typical null termination check
    )
    {
        ++strlen;
    }
    return strlen;
}

// This is messy, but I haven't found a built-in way in C# that guarantees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
    strlen = StringLength(buffer, startIndex);
    byte[] c_str = new byte[strlen];
    Array.Copy(buffer, startIndex, c_str, 0, strlen);
    return Encoding.UTF8.GetString(c_str);
}

The startIndex parameter exists because, in the example I was working on, I needed to parse a byte[] as an array of null-terminated strings. It can be safely ignored in the simple case.
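For the array-of-null-terminated-strings case, a shorter variant is possible with Array.IndexOf and the GetString overload that takes an offset and a count. This is a sketch under stated assumptions rather than a replacement for the class above: ReadCString is a hypothetical helper name, and a missing terminator is treated as "read to the end of the buffer".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: decodes one null-terminated UTF-8 string starting at startIndex.
static string ReadCString(byte[] buffer, int startIndex, out int length)
{
    int end = Array.IndexOf(buffer, (byte)0, startIndex);
    if (end < 0) end = buffer.Length;   // no terminator: read to the end of the buffer
    length = end - startIndex;
    return Encoding.UTF8.GetString(buffer, startIndex, length);
}

// Illustrative buffer holding three strings; the last one has no terminator.
byte[] buffer = Encoding.UTF8.GetBytes("one\0two\0three");

var parts = new List<string>();
int offset = 0;
while (offset < buffer.Length)
{
    parts.Add(ReadCString(buffer, offset, out int length));
    offset += length + 1;               // skip past the terminator
}

Console.WriteLine(string.Join(",", parts));
```

Decoding straight from the source buffer avoids the intermediate copy made with Array.Copy.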

Solution 14 - C#

Try this console application:

static void Main(string[] args)
{
    string mainString = "Hello, World!";
    Console.WriteLine("Main String: " + mainString);

    // Convert a string to UTF-8 bytes.
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(mainString);

    // Convert UTF-8 bytes back to a string.
    string decodedString = Encoding.UTF8.GetString(utf8Bytes);
    Console.WriteLine("Decoded String: " + decodedString);
}

Solution 15 - C#

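Here is a solution where you don't have to bother with an Encoding class. It copies the two bytes of each char's UTF-16 code unit as-is, so it round-trips strings produced by String2ByteArray but does not decode UTF-8 input. I used it in my network class to send binary objects as strings.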

public static byte[] String2ByteArray(string str)
{
    char[] chars = str.ToCharArray(); // built into string; no System.Linq needed
    byte[] bytes = new byte[chars.Length * 2];

    for (int i = 0; i < chars.Length; i++)
        Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);

    return bytes;
}

public static string ByteArray2String(byte[] bytes)
{
    char[] chars = new char[bytes.Length / 2];

    for (int i = 0; i < chars.Length; i++)
        chars[i] = BitConverter.ToChar(bytes, i * 2);

    return new string(chars);
}
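The manual copy above ends up with the string's UTF-16 code units in the machine's byte order. The sketch below compares it with Encoding.Unicode (UTF-16 little-endian), assuming little-endian hardware for the equality check; the sample text is illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "h\u00E9llo"; // "héllo"

// Manual copy of UTF-16 code units, as in String2ByteArray above.
byte[] manual = new byte[text.Length * 2];
for (int i = 0; i < text.Length; i++)
    Array.Copy(BitConverter.GetBytes(text[i]), 0, manual, i * 2, 2);

// Encoding.Unicode is UTF-16 little-endian, so the two match on little-endian hardware.
byte[] viaEncoding = Encoding.Unicode.GetBytes(text);
bool sameBytes = manual.SequenceEqual(viaEncoding);
Console.WriteLine(!BitConverter.IsLittleEndian || sameBytes);

// The built-in encoding round-trips the text as well.
string roundTrip = Encoding.Unicode.GetString(viaEncoding);
Console.WriteLine(roundTrip == text);
```

Using Encoding.Unicode on both ends of a network connection keeps the byte order fixed, whereas BitConverter follows whatever the local machine uses.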

Solution 16 - C#

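// ASCIIEncoding.UTF8 resolves to the static Encoding.UTF8 property inherited from Encoding,
// so this is the same call as in Solution 1 and decodes UTF-8, not ASCII.
string result = ASCIIEncoding.UTF8.GetString(byteArray);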

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | BCS
Solution 1 - C# | Zanoni
Solution 2 - C# | detale
Solution 3 - C# | Nir
Solution 4 - C# | Erçin Dedeoğlu
Solution 5 - C# | AndrewJE
Solution 6 - C# | metadings
Solution 7 - C# | Antonio Leonardo
Solution 8 - C# | P.K.
Solution 9 - C# | The One
Solution 10 - C# | Fehr
Solution 11 - C# | Sagar
Solution 12 - C# | Nyerguds
Solution 13 - C# | Assimilater
Solution 14 - C# | R M Shahidul Islam Shahed
Solution 15 - C# | Marco Pardo
Solution 16 - C# | S.ATTA.M