Problems converting byte array to string and back to byte array

Java, String, Encryption, Bytearray

Java Problem Overview


There are a lot of questions on this topic with the same solution, but it doesn't work for me. I have a simple encryption test. The encryption/decryption itself works, as long as I handle the data as a byte array and not as a String. The problem is that I don't want to handle it as a byte array but as a String. When I encode the byte array to a String and back, the resulting byte array differs from the original one, so decryption no longer works. I tried the following charset names in the corresponding String methods: UTF-8, UTF8, UTF-16. None of them work; the resulting byte array always differs from the original. Any ideas why?

Encrypter:

import java.security.Key;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.NoSuchPaddingException;

public class NewEncrypter
{
    private String algorithm = "DESede";
    private Key key = null;
    private Cipher cipher = null;

    public NewEncrypter() throws NoSuchAlgorithmException, NoSuchPaddingException
    {
        // Random Triple DES key plus a cipher for the same algorithm
        key = KeyGenerator.getInstance(algorithm).generateKey();
        cipher = Cipher.getInstance(algorithm);
    }

    public byte[] encrypt(String input) throws Exception
    {
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] inputBytes = input.getBytes("UTF-16");

        return cipher.doFinal(inputBytes);
    }

    public String decrypt(byte[] encryptionBytes) throws Exception
    {
        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] recoveredBytes = cipher.doFinal(encryptionBytes);
        String recovered = new String(recoveredBytes, "UTF-16");

        return recovered;
    }
}

This is the test where I try it:

import static org.junit.Assert.assertEquals; // assuming JUnit 4

import org.junit.Test;

public class NewEncrypterTest
{
    @Test
    public void canEncryptAndDecrypt() throws Exception
    {
        String toEncrypt = "FOOBAR";

        NewEncrypter encrypter = new NewEncrypter();

        byte[] encryptedByteArray = encrypter.encrypt(toEncrypt);
        System.out.println("encryptedByteArray:" + encryptedByteArray);

        String decoded = new String(encryptedByteArray, "UTF-16");
        System.out.println("decoded:" + decoded);

        byte[] encoded = decoded.getBytes("UTF-16");
        System.out.println("encoded:" + encoded);

        String decryptedText = encrypter.decrypt(encoded); //Exception here
        System.out.println("decryptedText:" + decryptedText);

        assertEquals(toEncrypt, decryptedText);
    }
}

Java Solutions


Solution 1 - Java

It is not a good idea to store encrypted data in Strings because they are for human-readable text, not for arbitrary binary data. For binary data it's best to use byte[].

However, if you must do it, you should use an encoding with a 1-to-1 mapping between bytes and characters, that is, one where every byte sequence maps to a unique sequence of characters and back. One such encoding is ISO-8859-1:

    String decoded = new String(encryptedByteArray, "ISO-8859-1");
    System.out.println("decoded:" + decoded);

    byte[] encoded = decoded.getBytes("ISO-8859-1"); 
    System.out.println("encoded:" + java.util.Arrays.toString(encoded));

    String decryptedText = encrypter.decrypt(encoded);
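To see why ISO-8859-1 works where UTF-8 does not, the following sketch pushes every possible byte value (0 to 255, standing in for arbitrary cipher output) through a byte[] -> String -> byte[] round trip with each charset. The class and method names are hypothetical, and it uses java.nio.charset.StandardCharsets instead of charset-name strings, which also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {

    // Hypothetical helper: decode the bytes into a String, encode them back, compare.
    static boolean survivesRoundTrip(byte[] original, Charset charset) {
        String decoded = new String(original, charset);
        byte[] encoded = decoded.getBytes(charset);
        return Arrays.equals(original, encoded);
    }

    // Every byte value from 0x00 to 0xFF, like the arbitrary output of a cipher.
    static byte[] allByteValues() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = allByteValues();
        // ISO-8859-1 maps each byte to exactly one char (U+0000..U+00FF) and back.
        System.out.println("ISO-8859-1: " + survivesRoundTrip(bytes, StandardCharsets.ISO_8859_1)); // true
        // In this sequence none of the bytes 0x80..0xFF forms valid UTF-8,
        // so the decoder replaces them with U+FFFD and the original bytes are lost.
        System.out.println("UTF-8:      " + survivesRoundTrip(bytes, StandardCharsets.UTF_8)); // false
    }
}
```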

Other common encodings that don't lose data are hexadecimal and Base64. Older Java versions needed a helper library for them, but the standard API now covers both: java.util.Base64 since Java 8 and java.util.HexFormat since Java 17.
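With Java 8 or later, a lossless text form of the ciphertext needs no extra library. A minimal sketch using java.util.Base64; the 16 random bytes stand in for the output of encrypter.encrypt(...), and the class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {

    // Encode arbitrary binary data as printable ASCII text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back into exactly the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for ciphertext: 16 random bytes.
        byte[] ciphertext = new byte[16];
        new SecureRandom().nextBytes(ciphertext);

        String text = toText(ciphertext);
        byte[] restored = fromText(text);

        System.out.println("text:     " + text); // 24 characters for 16 bytes
        System.out.println("restored: " + Arrays.equals(ciphertext, restored)); // true
    }
}
```

In the question's test this would amount to `String encoded = Base64.getEncoder().encodeToString(encryptedByteArray);` followed by `encrypter.decrypt(Base64.getDecoder().decode(encoded))`. Base64 output is about a third larger than the raw bytes, whereas hex doubles the size.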

With UTF-16 the program would fail for two reasons:

  1. String.getBytes("UTF-16") adds a byte order mark (BOM) at the start of the output to identify the byte order. Use UTF-16LE or UTF-16BE to avoid this.
  2. Not all byte sequences can be mapped to characters in UTF-16. First, UTF-16 text must have an even number of bytes. Second, UTF-16 encodes Unicode characters beyond U+FFFF as surrogate pairs: 4 bytes that together map to a single character. A high surrogate (U+D800 to U+DBFF) must be followed by a low surrogate (U+DC00 to U+DFFF); on its own, neither 2-byte unit encodes any character, so arbitrary bytes containing an unpaired surrogate cannot be decoded without loss.
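Both points can be checked directly. This sketch (class and method names are hypothetical) prints the BOM that "UTF-16" adds, then uses a CharsetDecoder to test whether a byte sequence is valid UTF-16. A fresh decoder reports malformed input by throwing, whereas new String(bytes, charset) silently substitutes U+FFFD, which is exactly the silent data loss seen in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Pitfalls {

    // newDecoder() defaults to CodingErrorAction.REPORT, so malformed input throws
    // instead of being replaced with U+FFFD.
    static boolean isValidUtf16be(byte[] bytes) {
        try {
            StandardCharsets.UTF_16BE.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 1. "UTF-16" writes a big-endian byte order mark (0xFE 0xFF) first; "UTF-16BE" does not.
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16)));   // [-2, -1, 0, 65]
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16BE))); // [0, 65]

        // 2. Not every byte sequence is valid UTF-16.
        System.out.println(isValidUtf16be(new byte[] {0, 65}));                    // true: "A"
        System.out.println(isValidUtf16be(new byte[] {0, 65, 0}));                 // false: odd length
        System.out.println(isValidUtf16be(new byte[] {(byte) 0xD8, 0x00, 0, 65})); // false: unpaired high surrogate
    }
}
```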

Solution 2 - Java

The accepted solution will not work if your String contains characters outside ISO-8859-1, such as š, ž, ć, Ō, ō, Ū, etc.

The following code worked nicely for me:

import android.util.Base64; // Android API, not available on the desktop JVM

byte[] myBytes = Something.getMyBytes();
String encodedString = Base64.encodeToString(myBytes, Base64.NO_WRAP);
byte[] decodedBytes = Base64.decode(encodedString, Base64.NO_WRAP);
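android.util.Base64 exists only on Android. On a regular JVM (Java 8+), java.util.Base64 is a comparable choice, and the main thing to match is NO_WRAP: the basic encoder from getEncoder() never inserts line breaks, while getMimeEncoder() wraps its output at 76 characters with "\r\n". A sketch under those assumptions; the 100-byte payload and the samplePayload helper are hypothetical, chosen only so the output is longer than one MIME line.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Wrapping {

    // Hypothetical payload: 100 bytes, long enough to exceed one 76-character MIME line.
    static byte[] samplePayload() {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 7);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] myBytes = samplePayload();

        // Basic encoder: one line, no line separators (comparable to Android's NO_WRAP).
        String basic = Base64.getEncoder().encodeToString(myBytes);
        // MIME encoder: inserts "\r\n" after every 76 output characters.
        String mime = Base64.getMimeEncoder().encodeToString(myBytes);

        System.out.println(basic.length());        // 136
        System.out.println(mime.contains("\r\n")); // true

        // The basic decoder rejects "\r\n" with IllegalArgumentException; the MIME decoder skips it.
        byte[] fromBasic = Base64.getDecoder().decode(basic);
        byte[] fromMime = Base64.getMimeDecoder().decode(mime);
        System.out.println(Arrays.equals(myBytes, fromBasic) && Arrays.equals(myBytes, fromMime)); // true
    }
}
```

A single-line encoding is usually the safer choice when the string ends up in a database column, a JSON field, or a URL parameter, because embedded line breaks would have to be handled on the way back.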

Solution 3 - Java

I have now found another solution too, using Hex from Apache Commons Codec:

    import static org.junit.Assert.assertEquals; // assuming JUnit 4

    import org.apache.commons.codec.binary.Hex; // Apache Commons Codec
    import org.junit.Test;

    public class NewEncrypterTest
    {
        @Test
        public void canEncryptAndDecrypt() throws Exception
        {
            String toEncrypt = "FOOBAR";

            NewEncrypter encrypter = new NewEncrypter();

            byte[] encryptedByteArray = encrypter.encrypt(toEncrypt);
            // Hex uses two characters per byte, so every byte value survives as text
            String encoded = String.valueOf(Hex.encodeHex(encryptedByteArray));

            byte[] byteArrayToDecrypt = Hex.decodeHex(encoded.toCharArray());
            String decryptedText = encrypter.decrypt(byteArrayToDecrypt);

            System.out.println("decryptedText:" + decryptedText);

            assertEquals(toEncrypt, decryptedText);
        }
    }
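Apache Commons Codec is an external dependency. On Java 17 or later, java.util.HexFormat should give the same lowercase hex round trip from the standard library. A minimal sketch; the class and method names are hypothetical, and the sample bytes are simply the first six bytes of the ciphertext printed in Solution 4.

```java
import java.util.Arrays;
import java.util.HexFormat;

public class HexRoundTrip {

    // Two lowercase hex digits per byte, e.g. {90, -40} -> "5ad8".
    static String toHex(byte[] data) {
        return HexFormat.of().formatHex(data);
    }

    // Parses the hex string back into the original bytes.
    static byte[] fromHex(String hex) {
        return HexFormat.of().parseHex(hex);
    }

    public static void main(String[] args) {
        byte[] sample = {90, -40, -39, -56, -90, 51};
        String hex = toHex(sample);
        System.out.println(hex);                                 // 5ad8d9c8a633
        System.out.println(Arrays.equals(sample, fromHex(hex))); // true
    }
}
```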

Solution 4 - Java

Your problem is that, in general, you cannot build a UTF-16 String (or one in most other encodings, such as UTF-8) from an arbitrary byte array without losing data (see UTF-16 on Wikipedia). It is up to you to serialize and deserialize the encrypted byte array losslessly if you want to, say, persist it and use it later. Here's modified client code that should give you some insight into what's actually happening with the byte arrays:

import java.util.Arrays;

// Hypothetical wrapper class so the snippet compiles; NewEncrypter is the class from the question
public class NewEncrypterDemo {
  public static void main(String[] args) throws Exception {
    String toEncrypt = "FOOBAR";

    NewEncrypter encrypter = new NewEncrypter();

    byte[] encryptedByteArray = encrypter.encrypt(toEncrypt);
    System.out.println("encryptedByteArray:" + Arrays.toString(encryptedByteArray));

    String decoded = new String(encryptedByteArray, "UTF-16");
    System.out.println("decoded:" + decoded);

    byte[] encoded = decoded.getBytes("UTF-16");
    System.out.println("encoded:" + Arrays.toString(encoded));

    String decryptedText = encrypter.decrypt(encryptedByteArray); // NOT the "encoded" value!
    System.out.println("decryptedText:" + decryptedText);
  }
}

This is the output:

encryptedByteArray:[90, -40, -39, -56, -90, 51, 96, 95, -65, -54, -61, 51, 6, 15, -114, 88]
decoded:<some garbage>
encoded:[-2, -1, 90, -40, -1, -3, 96, 95, -65, -54, -61, 51, 6, 15, -114, 88]
decryptedText:FOOBAR

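The decryptedText is correct when it is restored from the original encryptedByteArray. Note that the encoded value is not the same as encryptedByteArray because of the data loss in the byte[] -> String("UTF-16") -> byte[] conversion: the leading -2, -1 (0xFE 0xFF) is the byte order mark added by getBytes("UTF-16"), and the sequence starting at -39, -56 (0xD9C8, a high surrogate that is not followed by a low surrogate) could not be decoded, so it came back as -1, -3 (U+FFFD, the Unicode replacement character).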
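Because the UTF-16 step does not depend on the key, the effect can be reproduced with the exact bytes printed above, without running the cipher. This sketch (the class name is hypothetical) decodes that encryptedByteArray with UTF-16, re-encodes it, and checks for the two symptoms described: a U+FFFD replacement character in the decoded string and a BOM at the start of the re-encoded bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16DataLoss {

    // The encryptedByteArray printed in the output above.
    static final byte[] ENCRYPTED = {
        90, -40, -39, -56, -90, 51, 96, 95, -65, -54, -61, 51, 6, 15, -114, 88
    };

    static byte[] roundTripUtf16(byte[] bytes) {
        // Without a BOM, Java's UTF-16 decoder assumes big-endian byte order.
        String decoded = new String(bytes, StandardCharsets.UTF_16);
        // Encoding with UTF-16 writes a big-endian BOM (0xFE 0xFF) first.
        return decoded.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String decoded = new String(ENCRYPTED, StandardCharsets.UTF_16);
        byte[] encoded = roundTripUtf16(ENCRYPTED);

        System.out.println("contains U+FFFD: " + (decoded.indexOf('\uFFFD') >= 0)); // true
        System.out.println("encoded: " + Arrays.toString(encoded));
        System.out.println("same as original: " + Arrays.equals(ENCRYPTED, encoded)); // false
    }
}
```

The printed encoded array should start with -2, -1 and differ from ENCRYPTED, as in the output above.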

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bevor | View Question on Stackoverflow
Solution 1 - Java | Joni | View Answer on Stackoverflow
Solution 2 - Java | Aleksandar Ilic | View Answer on Stackoverflow
Solution 3 - Java | Bevor | View Answer on Stackoverflow
Solution 4 - Java | Alexander Pavlov | View Answer on Stackoverflow