How to print Unicode character in C++?

Tags: C++, Unicode, iostream, cout, wchar_t

C++ Problem Overview


I am trying to print a Russian "ф" (U+0444 CYRILLIC SMALL LETTER EF) character, which is given a code of decimal 1092. Using C++, how can I print out this character? I would have thought something along the lines of the following would work, yet...

int main (){
   wchar_t f = '1060';
   cout << f << endl;
}

C++ Solutions


Solution 1 - C++

To represent the character you can use Universal Character Names (UCNs). The character 'ф' has the Unicode value U+0444 and so in C++ you could write it '\u0444' or '\U00000444'. Also if the source code encoding supports this character then you can just write it literally in your source code.

// both of these assume that the character can be represented with
// a single char in the execution encoding
char b = '\u0444';
char a = 'ф'; // this line additionally assumes that the source character encoding supports this character

Printing such characters out depends on what you're printing to. If you're printing to a Unix terminal emulator, the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do the following:

#include <iostream>

int main() {
    std::cout << "Hello, ф or \u0444!\n";
}

This program does not require that 'ф' can be represented in a single char. On OS X and almost any modern Linux install this will work just fine, because the source, execution, and console encodings will all be UTF-8 (which supports all Unicode characters).

Things are harder with Windows and there are different possibilities with different tradeoffs.

Probably the best, if you don't need portable code (you'll be using wchar_t, which should really be avoided on every other platform), is to set the mode of the output file handle to take only UTF-16 data.

#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Hello, \u0444!\n";
}

Portable code is more difficult.

Solution 2 - C++

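When compiling with -std=c++11, one can simply write the following. Note that since C++20 a u8 literal has type const char8_t[], so this initialization no longer compiles in that mode without a cast.

  const char *s  = u8"\u0444";
  std::cout << s << std::endl;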

Solution 3 - C++

Ultimately, this is completely platform-dependent. Unicode support is, unfortunately, very poor in Standard C++. With GCC, you will have to make it a narrow string, because GCC uses UTF-8; Windows wants a wide string, which you must output to wcout.

// GCC
std::cout << "ф";
// Windows
std::wcout << L"ф";

Solution 4 - C++

If you use Windows (note, we are using printf(), not cout):

// Save as UTF-8 without signature (BOM)
#include <stdio.h>
#include <windows.h>

int main() {
    SetConsoleOutputCP(65001);  // 65001 is the UTF-8 code page
    printf("ф\n");
}

Not Unicode, but it works with Windows-1251 instead of UTF-8:

// Save as Windows-1251
#include <iostream>
#include <windows.h>
using namespace std;

int main() {
    SetConsoleOutputCP(1251);
    cout << "ф" << endl;
}

Solution 5 - C++

This code works in Linux (C++11, geany, g++ 7.4.0):

#include <iostream>
#include <string>

using namespace std;


int utf8_to_unicode(string utf8_code);
string unicode_to_utf8(int unicode);


int main()
{
	cout << unicode_to_utf8(36) << '\t';
	cout << unicode_to_utf8(162) << '\t';
	cout << unicode_to_utf8(8364) << '\t';
	cout << unicode_to_utf8(128578) << endl;
	
	cout << unicode_to_utf8(0x24) << '\t';
	cout << unicode_to_utf8(0xa2) << '\t';
	cout << unicode_to_utf8(0x20ac) << '\t';
	cout << unicode_to_utf8(0x1f642) << endl;
	
	cout << utf8_to_unicode("$") << '\t';
	cout << utf8_to_unicode("¢") << '\t';
	cout << utf8_to_unicode("€") << '\t';
	cout << utf8_to_unicode("🙂") << endl;
	
	cout << utf8_to_unicode("\x24") << '\t';
	cout << utf8_to_unicode("\xc2\xa2") << '\t';
	cout << utf8_to_unicode("\xe2\x82\xac") << '\t';
	cout << utf8_to_unicode("\xf0\x9f\x99\x82") << endl;
	
	return 0;
}


int utf8_to_unicode(string utf8_code)
{
	unsigned utf8_size = utf8_code.length();
	int unicode = 0;
	
	for (unsigned p=0; p<utf8_size; ++p)
	{
		int bit_count = (p? 6: 8 - utf8_size - (utf8_size == 1? 0: 1)),
		    shift = (p < utf8_size - 1? (6*(utf8_size - p - 1)): 0);
		
		for (int k=0; k<bit_count; ++k)
			unicode += ((utf8_code[p] & (1 << k)) << shift);
	}
	
	return unicode;
}


string unicode_to_utf8(int unicode)
{
	string s;
	
	if (unicode>=0 and unicode <= 0x7f)  // 7F(16) = 127(10)
	{
		s = static_cast<char>(unicode);
		
		return s;
	}
	else if (unicode <= 0x7ff)  // 7FF(16) = 2047(10)
	{
		unsigned char c1 = 192, c2 = 128;
		
		for (int k=0; k<11; ++k)
		{
			if (k < 6)  c2 |= (unicode % 64) & (1 << k);
			else c1 |= (unicode >> 6) & (1 << (k - 6));
		}
		
		s = c1;    s += c2;

		return s;
	}
	else if (unicode <= 0xffff)  // FFFF(16) = 65535(10)
	{
		unsigned char c1 = 224, c2 = 128, c3 = 128;
		
		for (int k=0; k<16; ++k)
		{
			if (k < 6)  c3 |= (unicode % 64) & (1 << k);
			else if (k < 12) c2 |= (unicode >> 6) & (1 << (k - 6));
			else c1 |= (unicode >> 12) & (1 << (k - 12));
		}
		
		s = c1;    s += c2;    s += c3;

		return s;
	}
	else if (unicode <= 0x1fffff)  // 1FFFFF(16) = 2097151(10)
	{
		unsigned char c1 = 240, c2 = 128, c3 = 128, c4 = 128;
		
		for (int k=0; k<21; ++k)
		{
			if (k < 6)  c4 |= (unicode % 64) & (1 << k);
			else if (k < 12) c3 |= (unicode >> 6) & (1 << (k - 6));
			else if (k < 18) c2 |= (unicode >> 12) & (1 << (k - 12));
			else c1 |= (unicode >> 18) & (1 << (k - 18));
		}
		
		s = c1;    s += c2;    s += c3;    s += c4;

		return s;
	}
	else if (unicode <= 0x3ffffff)  // 3FFFFFF(16) = 67108863(10)
	{
		;  // actually, there are no 5-bytes unicodes
	}
	else if (unicode <= 0x7fffffff)  // 7FFFFFFF(16) = 2147483647(10)
	{
		;  // actually, there are no 6-bytes unicodes
	}
	else  ;  // incorrect unicode (< 0 or > 2147483647)
	
	return "";
}


Solution 6 - C++

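'1060' is a multi-character literal made of four characters; its value is implementation-defined and is not the number 1060. You should just treat the character as a number, if your wide characters match 1:1 with Unicode (check your locale settings).

#include <iostream>

int main() {
    wchar_t f = 1060;  // U+0424, the capital letter; the small letter is 1092
    std::wcout << f << std::endl;
}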

Solution 7 - C++

I needed to show the string in the UI as well as save it to an XML configuration file. The format given above works for strings in C++. For XML, the same character can be written as a character reference by replacing "\u" with "&#x" and appending ";".

For example: C++: "\u0444" --> XML: "&#x0444;"

Solution 8 - C++

In Linux, I can just do:

std::cout << "ф";

I just copy-pasted characters into the source, and it did not fail for at least the random sample that I tried.

Solution 9 - C++

Another solution in Linux:

#include <iostream>
#include <string>
using namespace std;

int main() {
    // Assumes the source file and the terminal both use UTF-8.
    string a = "Ф";
    cout << "Ф = \xd0\xa4 = " << hex
         << int(static_cast<unsigned char>(a[0]))
         << int(static_cast<unsigned char>(a[1])) << " (" << a.length() << "B)" << endl;

    string b = "√";
    cout << "√ = \xe2\x88\x9a = " << hex
         << int(static_cast<unsigned char>(b[0]))
         << int(static_cast<unsigned char>(b[1]))
         << int(static_cast<unsigned char>(b[2])) << " (" << b.length() << "B)" << endl;
}

Solution 10 - C++

Special thanks to the answer here for more-or-less the same question.

For me, all I needed was setlocale(LC_ALL, "en_US.UTF-8");

Then, I could use even raw wchar_t characters.

Solution 11 - C++

On Linux, a Unicode character (UTF-16 / UTF-32) can be converted to UTF-8 and printed to std::cout. I used these functions.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | James Raitsev | View Question on Stackoverflow
Solution 1 - C++ | bames53 | View Answer on Stackoverflow
Solution 2 - C++ | James Raitsev | View Answer on Stackoverflow
Solution 3 - C++ | Puppy | View Answer on Stackoverflow
Solution 4 - C++ | vladasimovic | View Answer on Stackoverflow
Solution 5 - C++ | Iro | View Answer on Stackoverflow
Solution 6 - C++ | Mike DeSimone | View Answer on Stackoverflow
Solution 7 - C++ | MGR | View Answer on Stackoverflow
Solution 8 - C++ | quanta | View Answer on Stackoverflow
Solution 9 - C++ | VoyciecH | View Answer on Stackoverflow
Solution 10 - C++ | Andrew | View Answer on Stackoverflow
Solution 11 - C++ | Flaviu | View Answer on Stackoverflow