Byte and char conversion in Java

Tags: Java, Encoding, Unicode, UTF-16

Java Problem Overview


If I convert a character to a byte and then back to a char, the character mysteriously disappears and becomes something else. How is this possible?

This is the code:

public class Main {
    public static void main(String[] args) {
        char a = 'È';       // line 1
        byte b = (byte) a;  // line 2
        char c = (char) b;  // line 3
        System.out.println(c + " " + (int) c);
    }
}

Until line 2 everything is fine:

  • In line 1 I could print "a" in the console and it would show "È".

  • In line 2 I could print "b" in the console and it would show -56, which is 200 read as a signed byte (200 - 256 = -56). And 200 is the code point of "È". So it's still fine.

But what's wrong in line 3? "c" becomes something else and the program prints ? 65480. That's something completely different.

What should I write in line 3 to get the correct result?

Java Solutions


Solution 1 - Java

A char in Java is a UTF-16 code unit, which is treated as an unsigned 16-bit number. So if you perform c = (char) b, the value you get is 2^16 - 56, i.e. 65536 - 56 = 65480.

Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.
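The two steps (sign-extending widening, then narrowing to 16 bits) can be traced in code. This is a minimal sketch; the class ByteToCharTrace and its helpers widen and narrow are hypothetical names for illustration, and the character is written as the escape '\u00C8' so the result does not depend on the source file's encoding.

```java
// Hypothetical helper class that traces byte -> int -> char step by step.
class ByteToCharTrace {

    // Widening primitive conversion: the byte is sign-extended to 32 bits.
    static int widen(byte b) {
        return b; // e.g. 0xC8 (-56) becomes 0xFFFFFFC8
    }

    // Narrowing primitive conversion: only the low 16 bits are kept.
    static char narrow(int i) {
        return (char) i; // e.g. 0xFFFFFFC8 becomes 0xFFC8
    }

    public static void main(String[] args) {
        byte b = (byte) '\u00C8';      // code point 200, stored as -56
        int widened = widen(b);
        char c = narrow(widened);

        System.out.println("byte:    " + b);                              // -56
        System.out.println("widened: 0x" + Integer.toHexString(widened)); // 0xffffffc8
        System.out.println("char:    0x" + Integer.toHexString(c));       // 0xffc8
        System.out.println("value:   " + (int) c);                        // 65480
        System.out.println("2^16 - 56 = " + ((1 << 16) - 56));            // 65480
    }
}
```

Writing `(char) b` directly performs exactly these two steps in one cast, as the specification excerpt below describes.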

From the Java Language Specification:

5.1.4. Widening and Narrowing Primitive Conversion

> First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).


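To get the right code point, use char c = (char) (b & 0xFF). The mask converts the byte value of b to the positive integer 200: after widening, the mask zeroes the top 24 bits, so 0xFFFFFFC8 becomes 0x000000C8, which is 200 in decimal. This only works because "È" (U+00C8) fits in 8 bits; for characters above U+00FF, the (byte) cast in line 2 has already discarded the high bits and they cannot be recovered.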
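A minimal sketch of the mask fix. The class and method names are hypothetical; Byte.toUnsignedInt (available since Java 8) is shown as an equivalent to the explicit mask, and the last lines illustrate the caveat for a character above U+00FF (the euro sign, U+20AC).

```java
// Hypothetical helper class showing how to undo sign extension with a mask.
class MaskedByteToChar {

    // b is widened to int (sign-extended), then & 0xFF clears the top
    // 24 bits, leaving a value in the range 0..255.
    static char toCharMasked(byte b) {
        return (char) (b & 0xFF);
    }

    // Same result using the JDK helper added in Java 8.
    static char toCharUnsigned(byte b) {
        return (char) Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        char a = '\u00C8';             // code point 200
        byte b = (byte) a;             // -56
        char c = toCharMasked(b);
        // Prints U+00C8 followed by 200 (how the letter renders depends on the console).
        System.out.println(c + " " + (int) c);

        // Caveat: U+20AC does not fit in a byte; (byte) keeps only 0xAC,
        // so the mask recovers 0xAC, not 0x20AC.
        char euro = '\u20AC';
        char back = toCharMasked((byte) euro);
        System.out.println(Integer.toHexString(back)); // ac
    }
}
```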


The above is a direct explanation of what happens during conversion between the byte, int and char primitive types.

If you want to encode characters to bytes or decode them from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#getBytes(Charset charset). You can get common character sets such as UTF-8 or ISO-8859-1 from StandardCharsets; others, such as Windows-1252, can be looked up by name with Charset.forName.
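A sketch of the convenience methods with explicit charsets; the class CharsetRoundTrip and its encode/decode wrappers are hypothetical names. It also shows why a single cast is not an encoding: ISO-8859-1 stores U+00C8 as one byte, while UTF-8 needs two.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper around String#getBytes and new String(byte[], Charset).
class CharsetRoundTrip {

    // Encode text to bytes using the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decode bytes back to text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00C8"; // the letter E with grave accent

        // ISO-8859-1 maps U+0000..U+00FF to one byte each: 0xC8.
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        // UTF-8 uses two bytes for U+00C8: 0xC3 0x88.
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(latin1)); // [-56]
        System.out.println(Arrays.toString(utf8));   // [-61, -120]

        // Decoding with the same charset restores the original text.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1).equals(text)); // true
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));        // true
    }
}
```

Decoding with a different charset than the one used for encoding is what usually produces garbled text, so the same Charset should be passed on both sides.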

Solution 2 - Java

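This worked for me. Add the import statement:

import java.nio.charset.Charset;

Then replace the call to the internal, non-public class

sun.io.ByteToCharConverter.getDefault().getCharacterEncoding()

with

Charset.defaultCharset()

(append .name() if the surrounding code expects the encoding name as a String).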
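A minimal standalone sketch of the public replacement. The class and method names are hypothetical, and it assumes the calling code wants the encoding name as a String, which is why name() is called on the Charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper that queries the default charset via the public API.
class DefaultCharsetName {

    // Assumption: callers need a String name, so Charset#name() is used.
    static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed value depends on the platform and JDK version.
        System.out.println(defaultEncodingName());
    }
}
```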

Solution 3 - Java

new String(byteArray, Charset.defaultCharset())

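This decodes the byte array into a String using the JVM's default charset, which is platform-dependent (UTF-8 by default since JDK 18). It does not throw on malformed or unmappable input; such bytes are replaced with the charset's default replacement string. Use a CharsetDecoder if you need such errors reported. It does throw a NullPointerException if byteArray is null.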
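A sketch contrasting the lenient String constructor with a strict CharsetDecoder. To keep the output platform-independent it assumes UTF-8 explicitly instead of Charset.defaultCharset(); the class and method names are hypothetical. A lone 0xC8 byte is used as input because it starts a two-byte UTF-8 sequence but has no continuation byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class comparing replacement vs. error reporting on bad input.
class LenientVsStrictDecoding {

    // new String(bytes, charset) never throws on bad input: malformed
    // sequences become the replacement character U+FFFD for UTF-8.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A CharsetDecoder set to REPORT throws instead of replacing.
    // (REPORT is already the default for a new decoder; set here for clarity.)
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bad = {(byte) 0xC8}; // truncated two-byte UTF-8 sequence

        String lenient = decodeLenient(bad);
        System.out.println(Integer.toHexString(lenient.charAt(0))); // fffd

        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoding failed: " + e);
        }
    }
}
```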

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

  • Question: user1883212 (original question on Stack Overflow)
  • Solution 1 - Java: Maarten Bodewes (original answer on Stack Overflow)
  • Solution 2 - Java: Vivek Kumar (original answer on Stack Overflow)
  • Solution 3 - Java: Joe (original answer on Stack Overflow)