Write a file in UTF-8 using FileWriter (Java)?

Tags: Java, File IO, Unicode, UTF-8, File Format

Java Problem Overview


I have the following code; however, I want it to write the output as a UTF-8 file so that it handles foreign characters. Is there a way of doing this? Do I need to pass some extra parameter?

I would really appreciate your help with this. Thanks.

try {
  BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
  writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
  while( (line = reader.readLine()) != null) {
    //If the line starts with a tab then we just want to add a movie
    //using the current actor's name.
    if(line.length() == 0)
      continue;
    else if(line.charAt(0) == '\t') {
      readMovieLine2(0, line, surname.toString(), forename.toString());
    } //Else we've reached a new actor
    else {
      readActorName(line);
    }
  }
} catch (IOException e) {
  e.printStackTrace();
}

Java Solutions


Solution 1 - Java

Safe Encoding Constructors

Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate constructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.

For file I/O, always pass an explicit encoder as the second argument to OutputStreamWriter, and the matching decoder (newDecoder()) as the second argument to InputStreamReader. The encoder is obtained like this:

  Charset.forName("UTF-8").newEncoder()

There are other even fancier possibilities, but none of the three simpler possibilities work for exception handling. These do:

 OutputStreamWriter char_output = new OutputStreamWriter(
     new FileOutputStream("some_output.utf8"),
     Charset.forName("UTF-8").newEncoder() 
 );

 InputStreamReader char_input = new InputStreamReader(
     new FileInputStream("some_input.utf8"),
     Charset.forName("UTF-8").newDecoder() 
 );
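To see what the decoder argument buys you, here is a minimal sketch. The class name StrictDecodingDemo and its helper methods are made up for illustration and are not part of the original answer. It pushes the same invalid UTF-8 bytes through an InputStreamReader built from a plain Charset and through one built from newDecoder(). As far as I can tell, the first quietly substitutes U+FFFD while the second throws a CharacterCodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class StrictDecodingDemo {

    // 'a', then 0xFF (a byte that never occurs in valid UTF-8), then 'b'
    static final byte[] BAD_UTF8 = { 'a', (byte) 0xFF, 'b' };

    // Drains a reader into a String, one char at a time.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c; (c = in.read()) != -1; ) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Charset constructor: malformed input is replaced, no exception is raised.
    static String readLenient() throws IOException {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(BAD_UTF8), StandardCharsets.UTF_8)) {
            return readAll(in);
        }
    }

    // Decoder constructor: returns true if reading raised a coding exception.
    static boolean strictReadFails() throws IOException {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(BAD_UTF8), StandardCharsets.UTF_8.newDecoder())) {
            readAll(in);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("lenient result: " + readLenient());
        System.out.println("strict reader failed: " + strictReadFails());
    }
}
```

The in-memory stream only keeps the sketch self-contained; swapping in a FileInputStream gives the file case from the answer.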

As for running with

 $ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere

The problem is that this will not use the full encoder argument form for the character streams, so you will again miss encoding problems.

Longer Example

Here’s a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream all to UTF-8 character streams with full exception handling:

 // this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
 Process
 slave_process = Runtime.getRuntime().exec("perl -CS script args");

 // fetch his stdin byte stream...
 OutputStream
 __bytes_into_his_stdin  = slave_process.getOutputStream();

 // and make a character stream with exceptions on encoding errors
 OutputStreamWriter
   chars_into_his_stdin  = new OutputStreamWriter(
                             __bytes_into_his_stdin,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newEncoder()
                         );

 // fetch his stdout byte stream...
 InputStream
 __bytes_from_his_stdout = slave_process.getInputStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stdout = new InputStreamReader(
                             __bytes_from_his_stdout,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

 // fetch his stderr byte stream...
 InputStream
 __bytes_from_his_stderr = slave_process.getErrorStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stderr = new InputStreamReader(
                             __bytes_from_his_stderr,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

Now you have three character streams that all raise exceptions on encoding errors, respectively called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr.

This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.

Just don’t get me started about PrintStreams eating exceptions.

Solution 2 - Java

Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding. Instead, use

new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)

and

new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
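Applied to the reader/writer pair from the question, this might look roughly like the sketch below. The names Utf8Copy and copyLines and the temp files are hypothetical, and the per-line processing from the question is reduced to a plain copy that skips empty lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper; stands in for the question's read/format/write loop.
class Utf8Copy {

    // Copies the non-empty lines of 'in' to 'out', decoding and encoding as UTF-8.
    static void copyLines(File in, File out) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(in), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // same skip as in the question's loop
                }
                writer.write(line);
                writer.write('\n'); // fixed separator so the output is platform independent
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File in = File.createTempFile("actresses", ".list");
        File out = File.createTempFile("actressesFormatted", ".csv");
        // \u escapes keep this source file ASCII-only
        String sample = "Zo\u00EB Salda\u00F1a\n\nPen\u00E9lope Cruz\n";
        Files.write(in.toPath(), sample.getBytes(StandardCharsets.UTF_8));
        copyLines(in, out);
        System.out.print(new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block closes both streams even if a line fails, which the question's code did not do.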

Solution 3 - Java

You need to use the OutputStreamWriter class as the writer parameter for your BufferedWriter. It does accept an encoding. Review javadocs for it.

Somewhat like this:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream("jedis.txt"), "UTF-8"
));

Or you can set the current system encoding with the system property file.encoding to UTF-8.

java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...

You may also be tempted to set it at runtime with System.setProperty(...) if you only need it for this specific file, but the JVM normally reads file.encoding once at startup, so changing it later may have no effect. In a case like this I would prefer the OutputStreamWriter.

By setting the system property at startup you can keep using FileWriter and expect it to use UTF-8 as the default encoding. Note that this then applies to all the files that you read and write.

EDIT

  • Starting from Java 7 (Android API level 19), you can replace the String "UTF-8" with StandardCharsets.UTF_8

  • As suggested in the comments below by tchrist, if you intend to detect encoding errors in your file you would be forced to use the OutputStreamWriter approach and use the constructor that receives a charset encoder.

    Somewhat like

      CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
      encoder.onMalformedInput(CodingErrorAction.REPORT);
      encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
      BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("jedis.txt"),encoder));
    

    You may choose between actions IGNORE | REPLACE | REPORT
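To make the three actions concrete, here is a small sketch that encodes the same problematic string under each of them. The class EncoderActionsDemo and the method encodeWith are hypothetical names, and the unpaired surrogate is just one convenient input that a UTF-8 encoder cannot encode.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class EncoderActionsDemo {

    // 'a', an unpaired high surrogate (not encodable on its own), 'b'
    static final String BAD_TEXT = "a\uD800b";

    // Encodes BAD_TEXT as UTF-8 under the given action and decodes the bytes
    // back to a String; returns "EXCEPTION" if the encoder reported the problem.
    static String encodeWith(CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(BAD_TEXT));
            return StandardCharsets.UTF_8.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            return "EXCEPTION";
        }
    }

    public static void main(String[] args) {
        System.out.println("IGNORE  -> " + encodeWith(CodingErrorAction.IGNORE));
        System.out.println("REPLACE -> " + encodeWith(CodingErrorAction.REPLACE));
        System.out.println("REPORT  -> " + encodeWith(CodingErrorAction.REPORT));
    }
}
```

IGNORE drops the bad character, REPLACE substitutes the encoder's replacement bytes, and REPORT is the only one that surfaces the problem to the caller.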

Also, this question was already answered here.

Solution 4 - Java

Since Java 11 you can do:

FileWriter fw = new FileWriter("filename.txt", Charset.forName("utf-8"));
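A slightly fuller sketch of the same idea, assuming Java 11 or later. The class name FileWriterCharsetDemo, the method writeUtf8, and the temp file are invented for the example. It writes a string and then compares the bytes on disk with the string's UTF-8 encoding.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo class; requires Java 11+ for FileWriter(String, Charset).
class FileWriterCharsetDemo {

    // Writes 'text' to 'fileName' as UTF-8 and closes the writer afterwards.
    static void writeUtf8(String fileName, String text) throws IOException {
        try (FileWriter fw = new FileWriter(fileName, StandardCharsets.UTF_8)) {
            fw.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("filewriter-demo", ".txt");
        String text = "na\u00EFve caf\u00E9"; // \u escapes keep the source ASCII-only
        writeUtf8(tmp.toString(), text);
        byte[] onDisk = Files.readAllBytes(tmp);
        System.out.println("bytes match UTF-8 encoding: "
                + Arrays.equals(onDisk, text.getBytes(StandardCharsets.UTF_8)));
    }
}
```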

Solution 5 - Java

Since Java 7 there is an easy way to handle the character encoding of BufferedWriter and BufferedReader. Instead of wrapping several Writer instances yourself, you can use the Files class to create a BufferedWriter that honors the character encoding directly:

Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);

You can find more about it in the Javadoc for java.nio.file.Files.
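A round trip with the Files helpers could look like the following sketch. NioUtf8Demo, writeLines and readLines are hypothetical names, and the temp file exists only for the demonstration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class NioUtf8Demo {

    // Writes each element of 'lines' as one UTF-8 encoded line.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Reads the file back as UTF-8, one list element per line.
    static List<String> readLines(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-utf8-demo", ".txt");
        List<String> lines = Arrays.asList("Zo\u00EB", "\u4E2D\u6587");
        writeLines(tmp, lines);
        System.out.println("round trip ok: " + lines.equals(readLines(tmp)));
    }
}
```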

Solution 6 - Java

With Chinese text, I tried using the UTF-16 charset and luckily it worked.

Hope this could help!

PrintWriter out = new PrintWriter( file, "UTF-16" );

Solution 7 - Java

OK it's 2019 now, and from Java 11 you have a constructor with Charset:

FileWriter(String fileName, Charset charset)

> Unfortunately, we still cannot modify the byte buffer size, and it's set to 8192. (https://www.baeldung.com/java-filewriter)

Solution 8 - Java

Use an OutputStreamWriter instead of a FileWriter so that you can set the encoding:

// file is the File object you want to write your data to
OutputStream outputStream = new FileOutputStream(file);
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
outputStreamWriter.write(json); // json is your data 
outputStreamWriter.flush();
outputStreamWriter.close();

Solution 9 - Java

In my opinion, if you want to write UTF-8 you can first create a byte array, passing the charset explicitly so that the platform default is not used: byte[] by=("<?xml version=\"1.0\" encoding=\"utf-8\"?>"+"Your string").getBytes(StandardCharsets.UTF_8);

Then, you can write each byte into file you created. Example:

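OutputStream f = new FileOutputStream(xmlfile);
// encode the whole string as UTF-8 (java.nio.charset.StandardCharsets)
byte[] by = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + "Your string").getBytes(StandardCharsets.UTF_8);
// write the encoded bytes to the file one at a time
for (int i = 0; i < by.length; i++) {
    byte b = by[i];
    f.write(b);
}
f.close();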
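If going through a byte array is what you want, the loop is not strictly needed. A possible shortcut, sketched with hypothetical names (BytesUtf8Demo, writeXml) and assuming Java 7 or later, is to hand the whole array to Files.write:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class BytesUtf8Demo {

    // Prepends an XML declaration to 'body' and writes the result as UTF-8 bytes.
    static void writeXml(Path file, String body) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>" + body;
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bytes-utf8-demo", ".xml");
        writeXml(tmp, "<name>Zo\u00EB</name>");
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
    }
}
```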

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user1280970 | View Question on Stackoverflow
Solution 1 - Java | tchrist | View Answer on Stackoverflow
Solution 2 - Java | Michael Borgwardt | View Answer on Stackoverflow
Solution 3 - Java | Edwin Dalorzo | View Answer on Stackoverflow
Solution 4 - Java | mortensi | View Answer on Stackoverflow
Solution 5 - Java | Lars Briem | View Answer on Stackoverflow
Solution 6 - Java | Phuong | View Answer on Stackoverflow
Solution 7 - Java | code đờ | View Answer on Stackoverflow
Solution 8 - Java | zakaria | View Answer on Stackoverflow
Solution 9 - Java | Phan Ngọc Hoàng Dương | View Answer on Stackoverflow