How do you echo a 4-digit Unicode character in Bash?

Tags: Bash, Shell, Unicode, Character Encoding

Bash Problem Overview


I'd like to add the Unicode skull and crossbones to my shell prompt (specifically 'SKULL AND CROSSBONES', U+2620), but I can't figure out the magic incantation to make echo spit it out, or any other 4-digit Unicode character. Two-digit ones are easy, for example echo -e "\x55".

In addition to the answers below it should be noted that, obviously, your terminal needs to support Unicode for the output to be what you expect. gnome-terminal does a good job of this, but it isn't necessarily turned on by default.

In macOS's Terminal app, go to Preferences -> Encodings and choose Unicode (UTF-8).
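A quick way to see whether the shell's locale is set up for UTF-8 is to ask the locale utility for its character map. The sketch below assumes the POSIX locale command is installed; is_utf8_locale is a hypothetical helper name, and it checks only the locale, not what the terminal emulator or its font can render.

```shell
# Hypothetical helper: succeed if the current locale's character map is UTF-8.
# Assumes the POSIX "locale" utility is available. This says nothing about
# whether the terminal emulator or its font can actually draw the glyph.
is_utf8_locale() {
    [[ $(locale charmap 2>/dev/null) == UTF-8 ]]
}

if is_utf8_locale; then
    echo "locale charmap is UTF-8"
else
    echo "locale charmap is not UTF-8; Unicode output may be garbled"
fi
```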

Bash Solutions


Solution 1 - Bash

In UTF-8 it's actually 6 hex digits (3 bytes).

$ printf '\xE2\x98\xA0'
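As a rough illustration of where those three bytes come from, the sketch below applies the UTF-8 three-byte layout (1110xxxx 10xxxxxx 10xxxxxx) to a code point. The function name utf8_bytes_3 is made up for this example, and it deliberately handles only the range U+0800 to U+FFFF, which includes U+2620.

```shell
# Hypothetical helper: print the three UTF-8 bytes (as hex) for a code point
# in the range U+0800..U+FFFF. Other ranges need 1, 2 or 4 bytes and are
# rejected here to keep the sketch short.
utf8_bytes_3() {
    local cp=$(( $1 ))
    (( cp >= 0x800 && cp <= 0xFFFF )) || return 1
    printf '%02X %02X %02X\n' \
        $(( 0xE0 | (cp >> 12) )) \
        $(( 0x80 | ((cp >> 6) & 0x3F) )) \
        $(( 0x80 | (cp & 0x3F) ))
}

utf8_bytes_3 0x2620   # prints: E2 98 A0
```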

To check how it's encoded by the console, use hexdump:

$ printf ☠ | hexdump
0000000 98e2 00a0                              
0000003
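Note that plain hexdump groups the output into 16-bit words, which on a little-endian machine shows the bytes e2 98 a0 as 98e2 00a0. If the swapped order is confusing, a byte-by-byte dump may be easier to read. This is a small sketch using od; hex_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex,
# in the order they occur in the string.
hex_bytes() {
    local dump
    # -An drops the offset column, -v disables "*" line folding,
    # -tx1 prints one byte per field.
    dump=$(printf '%s' "$1" | od -An -v -tx1)
    # Leaving $dump unquoted lets word splitting collapse od's padding
    # and line breaks into single spaces.
    echo $dump
}

hex_bytes "$(printf '\342\230\240')"   # prints: e2 98 a0
```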

Solution 2 - Bash

% echo -e '\u2620'     # \u takes four hexadecimal digits
☠
% echo -e '\U0001f602' # \U takes eight hexadecimal digits
😂

This works in Zsh (I've checked version 4.3) and in Bash 4.2 or newer.
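Since the \u escape depends on the Bash version, a script that must also run on older shells could fall back to the raw UTF-8 bytes. The following is only a sketch of that idea: print_skull is a hypothetical name, and the extra locale check is an assumption added here so that the \u branch is taken only when the locale is UTF-8.

```shell
# Hypothetical helper: print U+2620, using \u where Bash supports it
# (4.2 or newer, per the answer above) and raw UTF-8 bytes otherwise.
print_skull() {
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 2) )) \
        && [[ $(locale charmap 2>/dev/null) == UTF-8 ]]; then
        printf '\u2620\n'
    else
        # Octal escapes for the bytes E2 98 A0.
        printf '\342\230\240\n'
    fi
}

print_skull
```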

Solution 3 - Bash

So long as your text-editors can cope with Unicode (presumably encoded in UTF-8) you can enter the Unicode code-point directly.

For instance, in the Vim text-editor you would enter insert mode and press Ctrl + V + U and then the code-point number as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl + V + U 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?

At a terminal running Bash you would type CTRL+SHIFT+U and then the hexadecimal code point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends input and renders the character. So you should be able to print U+2620 in Bash using the following:

echo CTRL+SHIFT+U2620ENTERENTER

(The first enter ends Unicode input, and the second runs the echo command.)

Credit: Ask Ubuntu SE

Solution 4 - Bash

Here's a fully internal Bash implementation, no forking, unlimited size of Unicode characters.

fast_chr() {
    local __octal
    local __char
    printf -v __octal '%03o' $1
    printf -v __char \\$__octal
    REPLY=$__char
}

function unichr {
    local c=$1    # Ordinal of char
    local l=0    # Byte ctr
    local o=63    # Ceiling
    local p=128    # Accum. bits
    local s=''    # Output string

    (( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }

    while (( c > o )); do
        fast_chr $(( t = 0x80 | c & 0x3f ))
        s="$REPLY$s"
        (( c >>= 6, l++, p += o+1, o>>=1 ))
    done

    fast_chr $(( t = p | c ))
    echo -n "$REPLY$s"
}

## test harness
for (( i=0x2500; i<0x2600; i++ )); do
    unichr $i
done

Output was:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿

Solution 5 - Bash

Quick one-liner to convert a UTF-8 character into its byte escape sequence (three bytes for ☠):

var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo

or

echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'  

The output of both is \xE2\x98\xA0, so going the other way you can write:

echo $'\xe2\x98\xa0'   # ☠

Solution 6 - Bash

Just put "☠" in your shell script. In the correct locale and on a Unicode-enabled console it'll print just fine:

$ echo ☠
☠

An ugly "workaround" would be to output the UTF-8 sequence, but that also depends on the encoding used:

$ echo -e '\xE2\x98\xA0'
☠

Solution 7 - Bash

In Bash, to print a Unicode character use \x, \u or \U: \x takes up to two hex digits (a single byte), \u takes up to four hex digits, and \U takes up to eight.

echo -e '\U1f602'

If you want to assign it to a variable, use the $'...' syntax:

x=$'\U1f602'
echo "$x"

Solution 8 - Bash

Here is a list of all available Unicode emoji:

https://en.wikipedia.org/wiki/Emoji#Unicode_blocks

Example:

echo -e "\U1F304"
🌄

To get the UTF-8 byte values of this character, use hexdump:

echo -e "🌄" | hexdump -C

00000000  f0 9f 8c 84 0a                                    |.....|
00000005

Then use those values as hex escapes (the trailing 0a is the newline that echo appended, not part of the character):

echo -e "\xF0\x9F\x8C\x84\x0A"
🌄

Solution 9 - Bash

Any of these three commands will print the character you want in a console, provided the console accepts UTF-8 characters (most current ones do):

echo -e "SKULL AND CROSSBONES (U+2620) \U02620"
echo $'SKULL AND CROSSBONES (U+2620) \U02620'
printf "%b" "SKULL AND CROSSBONES (U+2620) \U02620\n"

SKULL AND CROSSBONES (U+2620) ☠

Afterwards, you can copy and paste the actual glyph (image, character) into any UTF-8 enabled text editor.

If you need to see how a Unicode code point is encoded in UTF-8, use xxd (a much better hex viewer than od):

echo $'(U+2620) \U02620' | xxd
0000000: 2855 2b32 3632 3029 20e2 98a0 0a         (U+2620) ....

That means that the UTF-8 encoding is: e2 98 a0

Or, in HEX to avoid errors: 0xE2 0x98 0xA0. That is, the values between the space (HEX 20) and the Line-Feed (Hex 0A).

If you want a deep dive into converting numbers to chars, see the article on Greg's wiki (BashFAQ) about ASCII encoding in Bash.

Solution 10 - Bash

I'm using this:

$ echo -e '\u2620'

This is much easier than looking up a hex representation. I'm using this in my shell scripts. It works in gnome-terminal and urxvt, AFAIK.

Solution 11 - Bash

You may need to encode the code point as octal in order for prompt expansion to correctly decode it.

U+2620 encoded as UTF-8 is E2 98 A0.

So in Bash,

export PS1="\342\230\240"

will turn your shell prompt into the skull and crossbones.
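If you would rather not convert E2 98 A0 to octal by hand, od can do it. The helper below is a sketch under the assumption that od is available; to_octal_escapes is a made-up name, and it simply prefixes each octal byte value with a backslash.

```shell
# Hypothetical helper: turn a string into a sequence of \ooo octal escapes,
# one per byte, suitable for pasting into a PS1 assignment.
to_octal_escapes() {
    local byte out=''
    # od -to1 prints each byte as three octal digits; -An drops the offsets
    # and -v disables "*" folding of repeated lines.
    for byte in $(printf '%s' "$1" | od -An -v -to1); do
        out+="\\$byte"
    done
    printf '%s\n' "$out"
}

to_octal_escapes "$(printf '\342\230\240')"   # prints: \342\230\240
```

The printed text is the same string used in the PS1 assignment above.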

Solution 12 - Bash

If you don't mind a Perl one-liner:

$ perl -CS -E 'say "\x{2620}"'

-CS enables UTF-8 decoding on input and UTF-8 encoding on output. -E evaluates the next argument as Perl, with modern features like say enabled. If you don't want a newline at the end, use print instead of say.

Solution 13 - Bash

Sorry for reviving this old question, but in Bash there is a very easy way to create Unicode code points from plain ASCII input, and it does not fork at all:

unicode() { local -n a="$1"; local c; printf -vc '\\U%08x' "$2"; printf -va "$c"; }
unicodes() { local a c; for a; do printf -vc '\\U%08x' "$a"; printf "$c"; done; };

Use it as follows to define certain codepoints

unicode crossbones 0x2620
echo "$crossbones"

or to dump the first 65536 Unicode code points to stdout (this takes less than 2s on my machine; the extra space keeps certain characters from running into each other in the terminal's monospace font):

for a in {0..65535}; do unicodes "$a"; printf ' '; done

or to tell a little, very typical parents' story (this needs a font that covers the emoji added in Unicode 6.0, released in 2010):

unicodes 0x1F6BC 32 43 32 0x1F62D 32 32 43 32 0x1F37C 32 61 32 0x263A 32 32 43 32 0x1F4A9 10

Explanation:

  • printf '\UXXXXXXXX' prints out any Unicode character
  • printf '\\U%08x' number prints \UXXXXXXXX with the number converted to Hex, this then is fed to another printf to actually print out the Unicode character
  • printf recognizes octal (0oct), hex (0xHEX) and decimal (0 or numbers starting with 1 to 9) as numbers, so you can choose whichever representation fits best
  • printf -v var .. gathers the output of printf into a variable, without fork (which tremendously speeds up things)
  • local variable is there to not pollute the global namespace
  • local -n var=other aliases var to other, such that assignment to var alters other. One interesting part here is, that var is part of the local namespace, while other is part of the global namespace.
    • Please note that there is no such thing as local or global namespace in bash. Variables are kept in the environment, and such are always global. Local just puts away the current value and restores it when the function is left again. Other functions called from within the function with local will still see the "local" value. This is a fundamentally different concept than all the normal scoping rules found in other languages (and what bash does is very powerful but can lead to errors if you are a programmer who is not aware of that).
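The scoping behaviour described in the last bullet can be seen in a few lines. This is only an illustrative sketch; the names show_x and outer are invented for the example.

```shell
# show_x has no local x of its own, so it sees whatever x is currently
# in effect in its caller's dynamic scope.
show_x() { echo "x=$x"; }

# outer declares a local x and then calls show_x.
outer() {
    local x='local to outer'
    show_x
}

x='global'
outer     # prints: x=local to outer
show_x    # prints: x=global
```

In a lexically scoped language the first call would print the global value; in Bash the callee sees the caller's local.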

Solution 14 - Bash

In Bash:

UnicodePointToUtf8()
{
	local x="$1"               # ok if '0x2620'
	x=${x/\\u/0x}              # '\u2620' -> '0x2620'
	x=${x/U+/0x}; x=${x/u+/0x} # 'U+2620' -> '0x2620'
	x=$((x)) # from hex to decimal
	local y=$x n=0
	[ $x -ge 0 ] || return 1
	while [ $y -gt 0 ]; do y=$((y>>1)); n=$((n+1)); done
	if [ $n -le 7 ]; then		# 7
		y=$x
	elif [ $n -le 11 ]; then 	# 5+6
		y="	$(( ((x>> 6)&0x1F)+0xC0 )) \
            $(( (x&0x3F)+0x80 ))" 
	elif [ $n -le 16 ]; then	# 4+6+6
		y="	$(( ((x>>12)&0x0F)+0xE0 )) \
			$(( ((x>> 6)&0x3F)+0x80 )) \
			$(( (x&0x3F)+0x80 ))"
	else                        # 3+6+6+6
		y="	$(( ((x>>18)&0x07)+0xF0 )) \
			$(( ((x>>12)&0x3F)+0x80 )) \
			$(( ((x>> 6)&0x3F)+0x80 )) \
			$(( (x&0x3F)+0x80 ))"
	fi
	printf -v y '\\x%x' $y
	echo -n -e $y
}

# test
for (( i=0x2500; i<0x2600; i++ )); do
    UnicodePointToUtf8 $i
	[ "$(( i+1 & 0x1f ))" != 0 ] || echo ""
done
x='U+2620'
echo "$x -> $(UnicodePointToUtf8 $x)"

Output:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
U+2620 -> ☠
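The bit-count branches in the function above (7, 11, 16 bits, else) correspond to the standard UTF-8 length boundaries. As a simplified sketch of just that decision, with utf8_len being a hypothetical name that is not part of the answer's code:

```shell
# Hypothetical helper: report how many bytes UTF-8 needs for a code point.
# 7 bits fit in 1 byte, 11 bits in 2, 16 bits in 3, anything larger in 4.
utf8_len() {
    local cp=$(( $1 ))
    if   (( cp < 0x80 ));    then echo 1
    elif (( cp < 0x800 ));   then echo 2
    elif (( cp < 0x10000 )); then echo 3
    else                          echo 4
    fi
}

utf8_len 0x41      # prints: 1
utf8_len 0x2620    # prints: 3
utf8_len 0x1F602   # prints: 4
```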

Solution 15 - Bash

The printf builtin (just as the coreutils' printf) knows the \u escape sequence which accepts 4-digit Unicode characters:

   \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

Test with Bash 4.2.37(1):

$ printf '\u2620\n'

Solution 16 - Bash

Based on Stack Overflow questions https://stackoverflow.com/questions/2428443 and https://stackoverflow.com/a/15903654/781312:

(octal=$(echo -n ☠ | od -t o1 | head -1 | cut -d' ' -f2- | sed -e 's#\([0-9]\+\) *#\\0\1#g')
echo Octal representation is following $octal
echo -e "$octal")

Output is the following.

Octal representation is following \0342\0230\0240

Solution 17 - Bash

Easy with a Python2/3 one-liner:

$ python -c 'print u"\u2620"'    # python2
$ python3 -c 'print(u"\u2620")'  # python3

Results in:

☠

Solution 18 - Bash

If the hex value of the Unicode character is known:

H="2620"
printf "%b" "\u$H"

If the decimal value of the Unicode character is known:

declare -i U=2*4096+6*256+2*16
printf -vH "%x" $U              # convert to hex
printf "%b" "\u$H"
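The \u form only covers four hex digits, so code points above U+FFFF need \U instead. A small sketch that builds the right escape text from either a decimal or a 0x-prefixed value might look like this; escape_for is a hypothetical helper, and it only produces the escape string without interpreting it.

```shell
# Hypothetical helper: print the \uXXXX or \UXXXXXXXX escape text for a
# code point given in decimal or 0x-prefixed hex. The output is plain ASCII.
escape_for() {
    local cp=$(( $1 ))
    if (( cp <= 0xFFFF )); then
        printf '\\u%04x\n' "$cp"
    else
        printf '\\U%08x\n' "$cp"
    fi
}

escape_for 9760      # prints: \u2620
escape_for 0x1F602   # prints: \U0001f602
```

The resulting string can then be handed to printf "%b" as in the answer above, assuming Bash 4.2 or newer and a UTF-8 locale.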

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | masukomi | View Question on Stackoverflow
Solution 1 - Bash | vartec | View Answer on Stackoverflow
Solution 2 - Bash | Juliano | View Answer on Stackoverflow
Solution 3 - Bash | RobM | View Answer on Stackoverflow
Solution 4 - Bash | Orwellophile | View Answer on Stackoverflow
Solution 5 - Bash | David King | View Answer on Stackoverflow
Solution 6 - Bash | Joachim Sauer | View Answer on Stackoverflow
Solution 7 - Bash | user2622016 | View Answer on Stackoverflow
Solution 8 - Bash | Matheus | View Answer on Stackoverflow
Solution 9 - Bash | user2350426 | View Answer on Stackoverflow
Solution 10 - Bash | Metal3d | View Answer on Stackoverflow
Solution 11 - Bash | cms | View Answer on Stackoverflow
Solution 12 - Bash | Flimm | View Answer on Stackoverflow
Solution 13 - Bash | Tino | View Answer on Stackoverflow
Solution 14 - Bash | Dmitry | View Answer on Stackoverflow
Solution 15 - Bash | Michael Jaros | View Answer on Stackoverflow
Solution 16 - Bash | test30 | View Answer on Stackoverflow
Solution 17 - Bash | Chris Johnson | View Answer on Stackoverflow
Solution 18 - Bash | philcolbourn | View Answer on Stackoverflow