How to get UTF-8 working in Java webapps?
Java Problem Overview
I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases.
My setup is the following:
- Development environment: Windows XP
- Production environment: Debian
- Database used: MySQL 5.x
- Users mainly use Firefox 2, but Opera 9.x, Firefox 3, IE7 and Google Chrome are also used to access the site.
How to achieve this?
Java Solutions
Solution 1 - Java
Answering myself as the FAQ of this site encourages it. This works for me:
Mostly, the characters äåö are not problematic, as the default character set used by browsers and Tomcat/Java for webapps is latin1, i.e. ISO-8859-1, which "understands" those characters.
Getting UTF-8 working under Java + Tomcat + Linux/Windows + MySQL requires the following:
Configuring Tomcat's server.xml
It's necessary to configure the connector to use UTF-8 when decoding URL (GET request) parameters:
<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
compression="on"
compressionMinSize="128"
noCompressionUserAgents="gozilla, traviata"
compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript"
URIEncoding="UTF-8"
/>
The key part is URIEncoding="UTF-8" in the above example. This guarantees that Tomcat handles all incoming GET parameters as UTF-8 encoded. As a result, when the user writes the following into the browser's address bar:
https://localhost:8443/ID/Users?action=search&name=*ж*
the character ж is handled as UTF-8 and is encoded (usually by the browser, before the request even reaches the server) as %D0%B6.
POST requests are not affected by this.
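To see what URIEncoding changes, here is a small standalone sketch (plain JDK, no Tomcat involved) that decodes the same percent-encoded value once as UTF-8 and once as ISO-8859-1, which roughly mirrors Tomcat with and without URIEncoding="UTF-8". The class name UriEncodingDemo is made up for illustration, and the URLDecoder.decode(String, Charset) overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UriEncodingDemo {

    // Decode a percent-encoded query value with the given charset, roughly
    // what the connector does when URIEncoding is set to that charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String fromBrowser = "%D0%B6"; // the UTF-8 bytes D0 B6 of U+0436 (Cyrillic zhe)
        String asUtf8 = decode(fromBrowser, StandardCharsets.UTF_8);
        String asLatin1 = decode(fromBrowser, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.equals("\u0436"));         // true: one correct character
        System.out.println(asLatin1.equals("\u00D0\u00B6")); // true: two wrong Latin-1 characters
    }
}
```

With latin1 decoding, the two UTF-8 bytes turn into two unrelated characters, which is the garbage you see when URIEncoding is missing.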
CharsetFilter
Then it's time to force the Java webapp to handle all requests and responses as UTF-8 encoded. This requires defining a character set filter like the following:
package fi.foo.filters;
import javax.servlet.*;
import java.io.IOException;
public class CharsetFilter implements Filter {
    private String encoding;
    public void init(FilterConfig config) throws ServletException {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding()) {
            request.setCharacterEncoding(encoding);
        }
        // Set the default response content type and encoding
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        next.doFilter(request, response);
    }
    public void destroy() {
    }
}
This filter makes sure that if the browser hasn't set the encoding used in the request, it is set to UTF-8.
The other thing this filter does is set the default response encoding, i.e. the encoding of the returned HTML (or whatever else is returned). The alternative is to set the response encoding etc. in each controller of the application.
This filter has to be added to web.xml, i.e. the deployment descriptor of the webapp:
<!--CharsetFilter start-->
<filter>
<filter-name>CharsetFilter</filter-name>
<filter-class>fi.foo.filters.CharsetFilter</filter-class>
<init-param>
<param-name>requestEncoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>CharsetFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
The instructions for making this filter are found at the Tomcat wiki (http://wiki.apache.org/tomcat/Tomcat/UTF-8).
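Why the response encoding matters: characters outside latin1 simply cannot be written by an ISO-8859-1 writer. The plain-JDK sketch below (the class name is hypothetical) encodes the Cyrillic string ЦжФ both ways, roughly what happens to the response body when the response encoding is or is not set to UTF-8. String.getBytes always substitutes the charset's replacement byte, which for ISO-8859-1 is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ResponseEncodingDemo {

    // Encode text the way a response writer with the given charset would.
    // Unmappable characters are replaced with the charset's replacement byte.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String cyrillic = "\u0426\u0436\u0424"; // Cyrillic Tse, Zhe, Ef
        // Latin-1 cannot represent them: prints [63, 63, 63], i.e. "???"
        System.out.println(Arrays.toString(encode(cyrillic, StandardCharsets.ISO_8859_1)));
        // UTF-8 needs two bytes for each of these characters: prints 6
        System.out.println(encode(cyrillic, StandardCharsets.UTF_8).length);
    }
}
```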
JSP page encoding
In your web.xml, add the following:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
Alternatively, all JSP-pages of the webapp would need to have the following at the top of them:
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
If some kind of layout with different JSP fragments is used, this is needed in all of them.
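The page encoding matters because the container decodes the bytes of the .jsp source file with it. For standard-syntax JSP pages the default is ISO-8859-1, so a file saved as UTF-8 but read as latin1 turns every two-byte character into two wrong characters. A plain-JDK simulation (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

final class PageEncodingDemo {

    // Simulate a container reading a UTF-8 source file as ISO-8859-1.
    static String readAsLatin1(String savedAsUtf8) {
        byte[] fileBytes = savedAsUtf8.getBytes(StandardCharsets.UTF_8);
        return new String(fileBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String finnish = "\u00E4\u00F6\u00E5"; // a-umlaut, o-umlaut, a-ring
        String misread = readAsLatin1(finnish);
        System.out.println(misread.length()); // 6: each letter became two characters
        System.out.println(misread.equals("\u00C3\u00A4\u00C3\u00B6\u00C3\u00A5")); // true
    }
}
```

The familiar "Ã¤" style garbage on a page is usually this exact mismatch.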
JSP page encoding tells the JSP container which encoding the JSP source file is in, so that its characters are read correctly.
HTML meta tags
Then it's time to tell the browser which encoding the HTML page is in.
This is done with the following at the top of each xhtml page produced by the webapp:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
...
JDBC connection
When using a database, the connection must be configured to use UTF-8 encoding. This is done in context.xml, or wherever the JDBC connection is defined, as follows:
<Resource name="jdbc/AppDB"
auth="Container"
type="javax.sql.DataSource"
maxActive="20" maxIdle="10" maxWait="10000"
username="foo"
password="bar"
driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/ID_development?useUnicode=true&amp;characterEncoding=UTF-8"
/>
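If the connection is opened in code rather than through a container-managed DataSource, the same MySQL Connector/J settings (useUnicode and characterEncoding) can be passed as java.util.Properties instead of URL parameters, which also avoids having to escape & as &amp; when the URL lives in an XML file. A minimal sketch, assuming the Connector/J driver is on the classpath; the class and method names are hypothetical, and the URL and credentials are the example values used above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class Utf8Jdbc {

    static final String URL = "jdbc:mysql://localhost:3306/ID_development";

    // Connection properties equivalent to
    // ?useUnicode=true&characterEncoding=UTF-8 in the JDBC URL.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    // Needs the MySQL driver on the classpath and a running server.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(URL, utf8Properties(user, password));
    }
}
```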
MySQL database and tables
The database itself must use UTF-8 encoding. This is achieved by creating it as follows:
CREATE DATABASE `ID_development`
/*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci */;
Then, all of the tables need to be in UTF-8 also:
CREATE TABLE `Users` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(30) collate utf8_swedish_ci default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci ROW_FORMAT=DYNAMIC;
The key part is CHARSET=utf8.
MySQL server configuration
The MySQL server has to be configured as well. Typically this is done by modifying the my.ini file on Windows and the my.cnf file on Linux. Those files should define that all clients connecting to the server use utf8 as the default character set, and that the default charset used by the server is also utf8.
[client]
port=3306
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
character-set-server=utf8
MySQL procedures and functions
These also need to have the character set defined. For example:
DELIMITER $$
DROP FUNCTION IF EXISTS `pathToNode` $$
CREATE FUNCTION `pathToNode` (ryhma_id INT) RETURNS TEXT CHARACTER SET utf8
READS SQL DATA
BEGIN
DECLARE path VARCHAR(255) CHARACTER SET utf8;
SET path = NULL;
...
RETURN path;
END $$
DELIMITER ;
GET requests: latin1 and UTF-8
If Tomcat's server.xml defines that GET request parameters are encoded in UTF-8, the following GET requests are handled properly:
https://localhost:8443/ID/Users?action=search&name=Petteri
https://localhost:8443/ID/Users?action=search&name=ж
Because ASCII-characters are encoded in the same way both with latin1 and UTF-8, the string "Petteri" is handled correctly.
The Cyrillic character ж cannot be represented in latin1 at all. The browser sends it as %D0%B6, and because Tomcat is instructed to handle request parameters as UTF-8, it decodes that correctly back to ж.
If browsers are instructed to read the pages in UTF-8 encoding (with response headers and the HTML meta tag), at least Firefox 2/3 and other browsers from this period all encode the character themselves as %D0%B6.
The end result is that all users with name "Petteri" are found and also all users with the name "ж" are found.
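The reasoning above can be checked with a few lines of plain JDK code (the class name is hypothetical): an ASCII name produces identical bytes in both encodings, so it does not matter which one the server assumes, while ж has no latin1 representation at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiVsCyrillic {

    // True when the string produces the same bytes in UTF-8 and ISO-8859-1.
    static boolean sameBytesInBoth(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8),
                             s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // True when every character of s can be represented in latin1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInBoth("Petteri")); // true
        System.out.println(fitsLatin1("\u0436"));       // false (Cyrillic zhe)
    }
}
```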
But what about äåö?
The HTTP specification defines that, by default, URLs are encoded as latin1. This results in Firefox 2, Firefox 3 etc. encoding the following
https://localhost:8443/ID/Users?action=search&name=*Päivi*
into the encoded version
https://localhost:8443/ID/Users?action=search&name=*P%E4ivi*
In latin1 the character ä is encoded as %E4, even though the page, the request and everything else is defined to use UTF-8. The UTF-8 encoded version of ä is %C3%A4.
The result of this is that it's quite impossible for the webapp to correctly handle the request parameters from GET requests, as some characters are encoded in latin1 and others in UTF-8. Notice: POST requests do work, as browsers encode all request parameters from forms completely in UTF-8 if the page is defined as being UTF-8.
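The mismatch is easy to reproduce with java.net.URLEncoder and URLDecoder (the Charset overloads assume Java 10 or later; the class name is hypothetical). Encoding Päivi with each charset gives the two forms quoted above, and decoding the latin1 form as UTF-8, as a connector with URIEncoding="UTF-8" would, leaves a U+FFFD replacement character where ä should be.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LatinOneQueryDemo {

    // Percent-encode a query value with the given charset.
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    // What a server configured with URIEncoding="UTF-8" makes of the value.
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "P\u00E4ivi"; // the name with a-umlaut
        System.out.println(encode(name, StandardCharsets.ISO_8859_1)); // P%E4ivi
        System.out.println(encode(name, StandardCharsets.UTF_8));      // P%C3%A4ivi
        // The lone byte E4 is not valid UTF-8, so it becomes U+FFFD.
        System.out.println(decodeAsUtf8("P%E4ivi").equals("P\uFFFDivi")); // true
    }
}
```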
Stuff to read
A very big thank you to the writers of the following for giving the answers to my problem:
- http://tagunov.tripod.com/i18n/i18n.html
- http://wiki.apache.org/tomcat/Tomcat/UTF-8
- http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
- http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html
- http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-tomcat-jsp-etc.html
- http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-for-mysql-tomcat.html
- http://jeppesn.dk/utf-8.html
- http://www.nabble.com/request-parameters-mishandle-utf-8-encoding-td18720039.html
- http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
- http://www.utf8-chartable.de/
Important Note
MySQL's utf8 character set supports the Basic Multilingual Plane using at most 3 bytes per UTF-8 character. If you need to go outside of that (certain alphabets require more than 3 bytes of UTF-8), then you either need to use a flavor of the VARBINARY column type or use the utf8mb4 character set (which requires MySQL 5.5.3 or later). Just be aware that using the utf8 character set in MySQL won't work 100% of the time.
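A quick way to check, before inserting, whether a value fits MySQL's 3-byte utf8 is to look for supplementary code points (anything above U+FFFF, which takes 4 bytes in UTF-8). A plain-JDK sketch; the class and method names are hypothetical:

```java
final class MysqlUtf8Check {

    // MySQL's legacy utf8 stores at most 3 bytes per character, i.e. only
    // Basic Multilingual Plane code points (U+0000 to U+FFFF).
    static boolean fitsMysqlUtf8(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(fitsMysqlUtf8("\u00E4\u00F6\u00E5 \u0426\u0436\u0424")); // true
        System.out.println(fitsMysqlUtf8("\uD83D\uDE00")); // false: U+1F600 needs 4 bytes
    }
}
```

Values that fail the check need a utf8mb4 (or VARBINARY) column.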
Tomcat with Apache
One more thing: if you are using Apache + Tomcat + the mod_jk connector, you also need to make the following changes:
- Add URIEncoding="UTF-8" to the 8009 connector in Tomcat's server.xml file; this is the connector used by mod_jk.
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8"/>
- Go to your Apache configuration folder, i.e. /etc/httpd/conf, and add AddDefaultCharset utf-8 to the httpd.conf file. Note: first check whether the directive already exists. If it does, update it with this line; otherwise you can add the line at the bottom of the file.
Solution 2 - Java
I think you summed it up quite well in your own answer.
In the process of UTF-8-ing(?) from end to end, you might also want to make sure Java itself is using UTF-8. Use -Dfile.encoding=utf-8 as a parameter to the JVM (it can be configured in catalina.bat).
Solution 3 - Java
To add to kosoant's answer: if you are using Spring, rather than writing your own Servlet filter, you can use the class org.springframework.web.filter.CharacterEncodingFilter they provide, configuring it like the following in your web.xml:
<filter>
<filter-name>encoding-filter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>FALSE</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encoding-filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
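Note that with forceEncoding set to false, Spring's CharacterEncodingFilter only sets the request encoding when the client did not send one, and does not set the response encoding at all (unlike the hand-written CharsetFilter above, which always sets it). The plain-Java model below is a hypothetical simplification of that decision for illustration, not Spring's actual code:

```java
final class EncodingFilterModel {

    // Request encoding after the filter runs: the configured encoding is
    // applied if forced, or if the client did not specify one.
    static String requestEncoding(String clientEncoding, String configured, boolean force) {
        return (force || clientEncoding == null) ? configured : clientEncoding;
    }

    // Response encoding set by the filter: only when forced; null means the
    // filter leaves it to the container or the JSP.
    static String responseEncoding(String configured, boolean force) {
        return force ? configured : null;
    }

    public static void main(String[] args) {
        System.out.println(requestEncoding(null, "UTF-8", false));         // UTF-8
        System.out.println(requestEncoding("ISO-8859-1", "UTF-8", false)); // ISO-8859-1
        System.out.println(responseEncoding("UTF-8", false));              // null
    }
}
```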
Solution 4 - Java
I also want to add this part from here, which solved my UTF problem:
runtime.encoding=<encoding>
Solution 5 - Java
This is for Greek encoding in MySQL tables when we want to access them using Java:
Use the following connection setup in your JBoss connection pool (mysql-ds.xml)
<connection-url>jdbc:mysql://192.168.10.123:3308/mydatabase</connection-url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<user-name>nts</user-name>
<password>xaxaxa!</password>
<connection-property name="useUnicode">true</connection-property>
<connection-property name="characterEncoding">greek</connection-property>
If you don't want to put this in a JNDI connection pool, you can configure it as a JDBC-url like the next line illustrates:
jdbc:mysql://192.168.10.123:3308/mydatabase?characterEncoding=greek
For me and Nick, so we never forget it and waste time on it again.
Solution 6 - Java
Nice, detailed answer. I just wanted to add one more thing that will help others see UTF-8 encoding of URLs in action.
Follow the steps below to enable UTF-8 encoding of URLs in Firefox:
- Type "about:config" in the address bar.
- Use the filter input to search for the "network.standard-url.encode-query-utf8" property.
- The above property is false by default; set it to true.
- Restart the browser.
UTF-8 encoding of URLs works by default in IE6/7/8 and Chrome.
Solution 7 - Java
The previous answers didn't solve my problem. It occurred only in production, with Tomcat and Apache mod_proxy_ajp: the POST body lost non-ASCII characters, which were replaced by ?. The problem turned out to be the JVM's default charset (US-ASCII in a default installation, as returned by Charset dfset = Charset.defaultCharset();), so the solution was to run the Tomcat server with an option that makes the JVM use UTF-8 as its default charset:
JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"
(add this line to catalina.sh and run service tomcat restart)
You may also need to change the Linux locale environment variables (edit ~/.bashrc and ~/.profile to make the change permanent; see https://perlgeek.de/en/article/set-up-a-clean-utf8-environment):
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
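To confirm that the JVM actually picked up -Dfile.encoding=UTF-8 (for example after editing catalina.sh), it can help to log what the JVM reports. A small sketch; the class name is hypothetical, and where to call it (e.g. from a startup listener) is up to you. On JDK 18 and later the default charset is UTF-8 regardless of the platform locale (JEP 400), so this mainly matters on older JVMs.

```java
import java.nio.charset.Charset;

final class DefaultCharsetCheck {

    // Summarize the JVM's charset settings for logging.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```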
Solution 8 - Java
I had a similar problem, but with the filenames of files I was compressing with Apache Commons. I resolved it with this command:
convmv --notest -f cp1252 -t utf8 * -r
It works very well for me. Hope it helps someone ;)
Solution 9 - Java
In my case (displaying Unicode characters from message bundles), I didn't need the "JSP page encoding" section to display Unicode on my JSP page. All I needed was the "CharsetFilter" section.
Solution 10 - Java
One other point that hasn't been mentioned relates to Java Servlets working with Ajax. I have situations where a web page picks up UTF-8 text from the user and sends it to a JavaScript file, which includes it in a URI sent to the Servlet. The Servlet queries a database, captures the result and returns it as XML to the JavaScript file, which formats it and inserts the formatted response into the original web page.
In one web app I was following an early Ajax book's instructions for wrapping up the JavaScript in constructing the URI. The example in the book used the escape() method, which I discovered (the hard way) is wrong. For utf-8 you must use encodeURIComponent().
Few people seem to roll their own Ajax these days, but I thought I might as well add this.
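On the servlet side the difference is easy to see: encodeURIComponent("ж") produces the standard UTF-8 form %D0%B6, while the old escape() produces the non-standard %u0436, which java.net.URLDecoder rejects as an illegal escape. A plain-JDK sketch (the class name is hypothetical; the Charset overload assumes Java 10 or later):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class AjaxParamDecoding {

    // Decode a query value as UTF-8; returns null if it is not valid
    // percent-encoding (e.g. the %uXXXX form produced by JavaScript escape()).
    static String decodeOrNull(String encoded) {
        try {
            return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("\u0436".equals(decodeOrNull("%D0%B6"))); // true: encodeURIComponent form
        System.out.println(decodeOrNull("%u0436"));                  // null: escape() form
    }
}
```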
Solution 11 - Java
About the CharsetFilter mentioned in @kosoant's answer: there is a built-in filter in Tomcat's own web.xml (located at conf/web.xml). The filter is named setCharacterEncodingFilter and is commented out by default. You can uncomment it (remember to uncomment its filter-mapping too).
Also, there is no need to set jsp-config in your web.xml (I have tested this with Tomcat 7+).
Solution 12 - Java
Sometimes you can solve the problem through the MySQL Administrator wizard: go to Startup variables > Advanced and set Def. char Set to utf8.
This configuration change may require restarting MySQL.
Solution 13 - Java
Faced the same issue on Spring MVC 5 + Tomcat 9 + JSP.
After long research, I came to an elegant solution (no filters needed, and no changes needed in Tomcat's server.xml starting from version 8.0.0-RC3):
- In the WebMvcConfigurer implementation, set the default encoding for messageSource (for reading data from message source files in UTF-8 encoding):
@Configuration
@EnableWebMvc
@ComponentScan("{package.with.components}")
public class WebApplicationContextConfig implements WebMvcConfigurer {
    @Bean
    public MessageSource messageSource() {
        final ResourceBundleMessageSource messageSource = new ResourceBundleMessageSource();
        messageSource.setBasenames("messages");
        messageSource.setDefaultEncoding("UTF-8");
        return messageSource;
    }
    /* other beans and methods */
}
- In the DispatcherServletInitializer implementation, override the onStartup method and set the request and response character encoding in it:
public class DispatcherServletInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {
    @Override
    public void onStartup(final ServletContext servletContext) throws ServletException {
        // https://wiki.apache.org/tomcat/FAQ/CharacterEncoding
        // setRequestCharacterEncoding/setResponseCharacterEncoding require Servlet 4.0 (e.g. Tomcat 9)
        servletContext.setRequestCharacterEncoding("UTF-8");
        servletContext.setResponseCharacterEncoding("UTF-8");
        super.onStartup(servletContext);
    }
    /* servlet mappings, root and web application configs, other methods */
}
- Save all message source and view files in UTF-8 encoding.
- Add <%@ page contentType="text/html;charset=UTF-8" %> or <%@ page pageEncoding="UTF-8" %> to each *.jsp file, or add a jsp-config descriptor to web.xml:
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd" id="WebApp_ID" version="3.0">
    <display-name>AppName</display-name>
    <jsp-config>
        <jsp-property-group>
            <url-pattern>*.jsp</url-pattern>
            <page-encoding>UTF-8</page-encoding>
        </jsp-property-group>
    </jsp-config>
</web-app>
Solution 14 - Java
If you have not specified the encoding in a connection pool (mysql-ds.xml), you can also set it in the JDBC URL when opening the connection in your Java code:
DriverManager.registerDriver(new com.mysql.jdbc.Driver());
Connection conn = DriverManager.getConnection(
"jdbc:mysql://192.168.1.12:3308/mydb?characterEncoding=greek",
"Myuser", "mypass");