UTF-8 byte[] to String

Java, UTF-8

Java Problem Overview


Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and converting each one?

public String openFileToString(byte[] _bytes)
{
	String file_string = "";
	
	for(int i = 0; i < _bytes.length; i++)
	{
		file_string += (char)_bytes[i];
	}
	
	return file_string;    
}

Java Solutions


Solution 1 - Java

Look at the constructor for String

String str = new String(bytes, StandardCharsets.UTF_8);

And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:

String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
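As a minimal sketch of the constructor call above, the following round-trips a string containing multi-byte characters. The class name `Utf8Decode` and the method `decode` are illustrative and not part of the original answer. Unlike the overload that takes a charset name as a String, the `Charset` overload does not throw a checked exception.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Decode {
    // Decodes the whole array in one call; the constructor handles
    // multi-byte UTF-8 sequences, so no per-byte loop is needed.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source-file encoding:
        // \u00e9 is e-acute (2 bytes in UTF-8), \u20ac is the euro sign (3 bytes).
        String original = "h\u00e9llo \u20ac";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes).equals(original)); // true
    }
}
```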

Solution 2 - Java

The Java String class has a built-in constructor for converting a byte array to a string.

byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};

String value = new String(byteArray, "UTF-8");

Solution 3 - Java

To convert utf-8 data, you can't assume a 1-1 correspondence between bytes and characters. Try this:

String file_string = new String(bytes, "UTF-8");

(Bah. I see I'm way too slow in hitting the Post Your Answer button.)
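To illustrate why bytes and characters do not correspond one-to-one, here is a small sketch comparing the per-byte cast from the question with a proper UTF-8 decode. The class `ByteCharMismatch` and the helper `castEachByte` are hypothetical names introduced only for this example.

```java
import java.nio.charset.StandardCharsets;

final class ByteCharMismatch {
    // The per-byte cast from the question, kept only to show what goes wrong.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "na\u00efve": the i-diaeresis encodes as two bytes in UTF-8.
        String original = "na\u00efve";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String decoded = new String(bytes, StandardCharsets.UTF_8);
        String cast = castEachByte(bytes);

        System.out.println(bytes.length);             // 6 (bytes)
        System.out.println(decoded.length());         // 5 (chars)
        System.out.println(decoded.equals(original)); // true
        System.out.println(cast.equals(original));    // false
    }
}
```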

To read an entire file as a String, do something like this:

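import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public String openFileToString(String fileName) throws IOException
{
    InputStream is = new BufferedInputStream(new FileInputStream(fileName));

    try {
        InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
        StringBuilder contents = new StringBuilder();
        char[] buff = new char[4096];
        int len = rdr.read(buff);
        while (len >= 0) {
            contents.append(buff, 0, len);
            // read the next chunk; -1 signals end of stream
            len = rdr.read(buff);
        }
        // return the accumulated text, not the char[] buffer
        return contents.toString();
    } finally {
        try {
            is.close();
        } catch (Exception e) {
            // log error in closing the file
        }
    }
}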

Solution 4 - Java

You can use the String(byte[] bytes) constructor for that. See this link for details.

EDIT: You also have to consider your platform's default charset, as per the Javadoc:

> Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
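As a sketch of what "more control over the decoding process" can look like, the following uses a `CharsetDecoder` configured to report invalid input rather than silently replacing it. The class name `StrictUtf8` and the method `decodeStrict` are assumptions made for illustration; only the `java.nio.charset` calls are standard API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Throws CharacterCodingException on invalid UTF-8 instead of
    // substituting a replacement character.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

This is useful when corrupted input should be rejected outright rather than decoded into text containing replacement characters.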

Solution 5 - Java

Knowing that you are dealing with a UTF-8 byte array, you'll definitely want to use the String constructor that accepts a charset name. Otherwise you may leave yourself open to some charset encoding based security vulnerabilities. Note that it throws UnsupportedEncodingException which you'll have to handle. Something like this:

public String openFileToString(byte[] _bytes) {
    String file_string;
    try {
        file_string = new String(_bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // this should never happen because "UTF-8" is hard-coded.
        throw new IllegalStateException(e);
    }
    return file_string;
}

Solution 6 - Java

You could use the methods described in this question (especially since you start off with an InputStream): https://stackoverflow.com/q/309424/372643

In particular, if you don't want to rely on external libraries, you can try this answer, which reads the InputStream via an InputStreamReader into a char[] buffer and appends it into a StringBuilder.

Solution 7 - Java

Here's a simplified function that will read in bytes and create a string. It assumes you probably already know what encoding the file is in (and otherwise defaults).

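import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";

public static String readFileToString(String filePath, String encoding) throws IOException {

    if (encoding == null || encoding.length() == 0)
        encoding = DEFAULT_ENCODING;

    StringBuilder content = new StringBuilder();

    // Decode through a Reader so that a multi-byte character split across
    // two reads is still decoded correctly; try-with-resources closes the
    // stream even if a read fails.
    try (Reader reader = new InputStreamReader(new FileInputStream(filePath), encoding)) {
        char[] buffer = new char[BUFF_SIZE];
        int charsRead;
        while ((charsRead = reader.read(buffer)) != -1)
            content.append(buffer, 0, charsRead);
    }
    return content.toString();
}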

Solution 8 - Java

String has a constructor that takes a byte[] and a charset name as parameters :)

Solution 9 - Java

This also involves iterating, but it is much better than concatenating strings, which is very costly.

public String openFileToString(byte[] _bytes)
{
    StringBuilder s = new StringBuilder(_bytes.length);

    for(int i = 0; i < _bytes.length; i++)
    {
        // Casting each byte is only correct for ASCII input;
        // multi-byte UTF-8 sequences come out garbled.
        s.append((char)_bytes[i]);
    }

    return s.toString();    
}

Solution 10 - Java

Why not get what you are looking for from the get go and read a string from the file instead of an array of bytes? Something like:

BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));

Then call readLine on in until it returns null.
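The read loop might look like the following sketch. The class `LineReaderSketch` and the method `readAll` are hypothetical names, and the method takes an `InputStream` so it can wrap a `FileInputStream` or any other source. Note that `readLine` strips line terminators, so this version normalizes every line ending to `\n` and always ends the result with one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class LineReaderSketch {
    // Reads the stream line by line as UTF-8 and joins the lines with '\n'.
    static String readAll(InputStream in) throws IOException {
        StringBuilder contents = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream).
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine returns null once the end of the stream is reached.
            while ((line = reader.readLine()) != null) {
                contents.append(line).append('\n');
            }
        }
        return contents.toString();
    }
}
```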

Solution 11 - Java

I do it this way:

String strIn = new String(_bytes, 0, numBytes);
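The three-argument constructor above decodes with the platform's default charset. For data known to be UTF-8, a variant that names the charset explicitly may be safer; here is a small sketch, where `PartialDecode` and `decodePrefix` are illustrative names rather than anything from the original answer.

```java
import java.nio.charset.StandardCharsets;

final class PartialDecode {
    // Decodes only the first numBytes bytes of the buffer as UTF-8,
    // ignoring any unused space at the end of the array.
    static String decodePrefix(byte[] buffer, int numBytes) {
        return new String(buffer, 0, numBytes, StandardCharsets.UTF_8);
    }
}
```

Keep in mind that `numBytes` should fall on a character boundary; cutting a multi-byte sequence in half yields a replacement character at the end of the result.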

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | skeryl | View Question on Stackoverflow
Solution 1 - Java | Jason Nichols | View Answer on Stackoverflow
Solution 2 - Java | Kashif Khan | View Answer on Stackoverflow
Solution 3 - Java | Ted Hopp | View Answer on Stackoverflow
Solution 4 - Java | GETah | View Answer on Stackoverflow
Solution 5 - Java | Asaph | View Answer on Stackoverflow
Solution 6 - Java | Bruno | View Answer on Stackoverflow
Solution 7 - Java | scottt | View Answer on Stackoverflow
Solution 8 - Java | soulcheck | View Answer on Stackoverflow
Solution 9 - Java | bragboy | View Answer on Stackoverflow
Solution 10 - Java | digitaljoel | View Answer on Stackoverflow
Solution 11 - Java | Anatoliy Pelepetz | View Answer on Stackoverflow