Reading InputStream as UTF-8

Java, UTF-8, InputStream

Java Problem Overview


I'm trying to read from a text/plain file over the internet, line-by-line. The code I have right now is:

URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<>();
String readLine;

while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}

for (String line : lines) {
    out.println("> " + line);
}

The file, test.txt, contains ¡Hélló!, which I am using in order to test the encoding.

When I review the OutputStream (out), I see it as > ¬°H√©ll√≥!. I don't believe this is a problem with the OutputStream since I can do out.println("é"); without problems.

Any ideas for reading from the InputStream as UTF-8? Thanks!

Java Solutions


Solution 1 - Java

Solved my own problem. This line:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

needs to be:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

or since Java 7:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
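To see the effect of the charset argument without a network call, here is a minimal sketch that feeds the UTF-8 bytes of the test string through the same reader chain from an in-memory stream. The class name Utf8LineReader and the readLines helper are made up for illustration, and ISO-8859-1 only stands in for "some default charset that is not UTF-8" (the one-argument InputStreamReader constructor uses the platform default, which is UTF-8 out of the box only since JDK 18).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question.
public class Utf8LineReader {

    // Reads every line from the stream, decoding the bytes with the given charset.
    public static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The test string from the question, written with Unicode escapes
        // so the encoding of this source file does not matter.
        byte[] utf8Bytes = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text comes back intact.
        System.out.println(readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));

        // Mismatched charset: every byte of a multi-byte UTF-8 sequence
        // is turned into a separate character, which garbles the accents.
        System.out.println(readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1));
    }
}
```

The exact garbage you see depends on which wrong charset is applied, so the second line will not necessarily match the output shown in the question.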

Solution 2 - Java

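StringBuilder file = new StringBuilder();

// try-with-resources closes the reader and the underlying stream
// (needs java.io.* and java.nio.charset.StandardCharsets)
try (BufferedReader br = new BufferedReader(
		new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8), 8192)) {
	String str;
	while ((str = br.readLine()) != null) {
		file.append(str); // readLine() strips the line terminator
	}
} catch (IOException e) {
	e.printStackTrace(); // report I/O errors instead of swallowing them
}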

Try this. :-)

Solution 3 - Java

I ran into the same problem: every time the reader found a special character, it showed it as ��. To solve this, I tried using the ISO-8859-1 encoding:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"));

String line;
while ((line = br.readLine()) != null) {
    // process each line here
}

I hope this can help anyone who sees this post.
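The ISO-8859-1 fix above only helps when the file really is stored in that encoding. As a rough illustration (the class name CharsetMismatchDemo and the decode helper are hypothetical), the sketch below takes bytes encoded as ISO-8859-1 and decodes them both ways: decoding as UTF-8 produces the replacement character U+FFFD (shown as �), while decoding as ISO-8859-1 restores the text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
public class CharsetMismatchDemo {

    // Decodes the first line of the given bytes with the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            String line = reader.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In ISO-8859-1 the accented e is the single byte 0xE9,
        // which is not a valid sequence on its own in UTF-8.
        byte[] latin1Bytes = "H\u00E9llo".getBytes(StandardCharsets.ISO_8859_1);

        String asUtf8 = decode(latin1Bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.contains("\uFFFD"));     // true: the bad byte was replaced
        System.out.println(asLatin1.equals("H\u00E9llo")); // true: the charset matches the bytes
    }
}
```

If the source is actually UTF-8 (as in the original question), decoding it as ISO-8859-1 will not fail, but it will garble every non-ASCII character.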

Solution 4 - Java

If you use the constructor InputStreamReader(InputStream in, Charset cs), bad characters are silently replaced. To change this behaviour, use a CharsetDecoder:

public static Reader newReader(InputStream is) {
  return new InputStreamReader(is,
      StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
  );
}

Then catch java.nio.charset.CharacterCodingException.
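As a sketch of how the catch might look in practice, the example below wraps such a strict reader in a small validity check. The class StrictUtf8Check and the method isValidUtf8 are names invented for this example; the byte 0xFF is used as test input because it can never occur in well-formed UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
public class StrictUtf8Check {

    // Same idea as newReader above: a decoder that reports errors instead of replacing.
    public static Reader newStrictReader(InputStream is) {
        return new InputStreamReader(is,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Returns true if the bytes decode cleanly as UTF-8, false otherwise.
    public static boolean isValidUtf8(byte[] bytes) {
        try (BufferedReader reader =
                     new BufferedReader(newStrictReader(new ByteArrayInputStream(bytes)))) {
            while (reader.readLine() != null) {
                // nothing to do: reading is enough to trigger the decoder
            }
            return true;
        } catch (CharacterCodingException e) {
            // thrown (as a subclass such as MalformedInputException) on bad input
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = "H\u00E9llo".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'a', (byte) 0xFF, 'b'};

        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

CharacterCodingException extends IOException, so it has to be caught before the more general IOException handler.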

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Chris Kuehl | View Question on Stack Overflow
Solution 1 - Java | Chris Kuehl | View Answer on Stack Overflow
Solution 2 - Java | Rohith | View Answer on Stack Overflow
Solution 3 - Java | Joshua Joel Cleveland | View Answer on Stack Overflow
Solution 4 - Java | grigouille | View Answer on Stack Overflow