How can I get a Unicode character's code?

JavaUnicodeCharacter

Java Problem Overview


Let's say I have this:

char registered = '®';

or an umlaut, or whatever unicode character. How could I get its code?

Java Solutions


Solution 1 - Java

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.

This will give the UTF-16 code unit - which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane. (And only BMP characters can be represented as char values in Java.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().

Once you've got the UTF-16 code unit or Unicode code points, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.

Solution 2 - Java

A more complete, albeit more verbose, way of doing this would be to use the Character.codePointAt method. This will handle 'high surrogate' characters, that cannot be represented by a single integer within the range that a char can represent.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int)fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.

Solution 3 - Java

In Java, char is technically a "16-bit integer", so you can simply cast it to int and you'll get it's code. From Oracle:

> The char data type is a single 16-bit Unicode character. It has a > minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or > 65,535 inclusive).

So you can simply cast it to int.

char registered = '®';
System.out.println(String.format("This is an int-code: %d", (int) registered));
System.out.println(String.format("And this is an hexa code: %x", (int) registered));

Solution 4 - Java

For me, only "Integer.toHexString(registered)" worked the way I wanted:

char registered = '®';
System.out.println("Answer:"+Integer.toHexString(registered));

This answer will give you only string representations what are usually presented in the tables. Jon Skeet's answer explains more.

Solution 5 - Java

There is an open source library MgntUtils that has a Utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into Unicode sequence vise-versa. Very simple and useful. To convert String you just do:

String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);

For example a String "Hello World" will be converted into

"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"

It works with any language. Here is the link to the article that explains all te ditails about the library: MgntUtils. Look for the subtitle "String Unicode converter". The library could be obtained as a Maven artifact or taken from Github (including source code and Javadoc)

Solution 6 - Java

dear friend, Jon Skeet said you can find character Decimal codebut it is not character Hex code as it should mention in unicode, so you should represent character codes via HexCode not in Deciaml.

there is an open source tool at http://unicode.codeplex.com that provides complete information about a characer or a sentece.

so it is better to create a parser that give a char as a parameter and return ahexCode as string

public static String GetHexCode(char character)
    {
        return String.format("{0:X4}", GetDecimal(character));
    }//end

hope it help

Solution 7 - Java

//You can get unicode below

int a = 'a'; // 'a' is a letter or symbol you want to get its unicode

//You can get symbel or letter below by its unicode

System.out.println("\123"); //123 is an unicode you want to transfer

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionGeoView Question on Stackoverflow
Solution 1 - JavaJon SkeetView Answer on Stackoverflow
Solution 2 - JavaAndrzej DoyleView Answer on Stackoverflow
Solution 3 - JavaFelypeView Answer on Stackoverflow
Solution 4 - JavaDarius MiliauskasView Answer on Stackoverflow
Solution 5 - JavaMichael GantmanView Answer on Stackoverflow
Solution 6 - JavaNasser HadjlooView Answer on Stackoverflow
Solution 7 - JavaYokubboy YokubovView Answer on Stackoverflow