Explain the effects of export LANG, LC_CTYPE, and LC_ALL


Linux Problem Overview


I've just installed Linux Mint 17 and ran into a problem: I can't use the Russian language in the terminal. (I see ? instead of letters.)

On one forum I found this solution:

> Added in ~/.profile:

export LANG=ru_RU.UTF-8
export LC_CTYPE=ru_RU.UTF-8
export LC_ALL=ru_RU.UTF-8

It helped, but it also changed my interface language to Russian (which I didn't want). That's not really a problem, but I would still like to know how this code works (every line).

Linux Solutions


Solution 1 - Linux

I'll explain in detail:

export LANG=ru_RU.UTF-8

That is a shell command that will export an environment variable named LANG with the given value ru_RU.UTF-8. That instructs internationalized programs to use the Russian language (ru), variant from Russia (RU), and the UTF-8 encoding for console output.

Generally this single line is enough.
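To make the structure of the value concrete, here is a minimal sketch that splits a locale name into the three parts described above using plain shell parameter expansion. The variable names (locale_name, language, territory, codeset) are chosen here for illustration only and are not anything the system itself defines.

```shell
# Split a locale name of the form language_TERRITORY.codeset into its parts.
locale_name="ru_RU.UTF-8"

language=${locale_name%%_*}    # everything before the first "_"
rest=${locale_name#*_}         # everything after the first "_"
territory=${rest%%.*}          # everything before the first "."
codeset=${locale_name#*.}      # everything after the first "."

printf 'language=%s territory=%s codeset=%s\n' "$language" "$territory" "$codeset"
# prints: language=ru territory=RU codeset=UTF-8
```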

This other one:

export LC_CTYPE=ru_RU.UTF-8

This does a similar thing, but it tells the program to change only the character type (CTYPE) rules to Russian, not the language. If a program can change text to uppercase, it will use the Russian rules to do so, even though the text itself may be in English.

Note that mixing LANG and LC_CTYPE can give unexpected results: few people do it, so the combination is largely untested, except perhaps for:

export LANG=ru_RU.UTF-8
export LC_CTYPE=C

That will make the program produce its output in Russian, but use the standard old C-style CTYPE rules.

The last line, LC_ALL, is a last-resort override that makes the program ignore all the other LC_* variables and use this one. I think you should never set it in a profile, but only use it to run a single program in a given locale. For example, if you want to write a bug report, you don't want any kind of localized output, and you don't know which LC_* variables are set:

LC_ALL=C program
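The one-off form above affects only the command it prefixes. A small sketch of that behaviour, assuming a POSIX sh is available as `sh` (the variable names seen_by_command and seen_by_shell are made up for the demonstration):

```shell
# Start from a known state: no LC_ALL in the current shell.
unset LC_ALL

# The assignment prefix goes only into the environment of this one command.
seen_by_command=$(LC_ALL=C sh -c 'printf %s "${LC_ALL-unset}"')

# Back in the current shell, LC_ALL is still unset.
seen_by_shell=${LC_ALL-unset}

printf 'command saw: %s\nshell sees: %s\n' "$seen_by_command" "$seen_by_shell"
```

This is why the prefix form is convenient for bug reports: nothing needs to be undone afterwards.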

Whether this changes the language of all your programs or only the console depends on where you put these lines. I put mine in ~/.bashrc, so they don't apply to the GUI, only to bash consoles.

Solution 2 - Linux

See the Environment Variables page of the UNIX Specification:

> - LANG This variable determines the locale category for native language, local customs and coded character set in the absence of the LC_ALL and other LC_* (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) environment variables. This can be used by applications to determine the language to use for error messages and instructions, collating sequences, date formats, and so forth.
>
> - LC_ALL This variable determines the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.
>
> - LC_CTYPE This variable determines the locale category for character handling functions, such as tolower(), toupper() and isalpha(). This environment variable determines the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters), the classification of characters (for example, alpha, digit, graph) and the behaviour of character classes. Additional semantics of this variable, if any, are implementation-dependent.
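The precedence described in the quote can be sketched as a small shell function. This is only an illustration of the lookup order, not how the C library implements it; resolve_locale is a hypothetical helper, and the final fallback to "C" is an assumption (the real default is implementation-defined).

```shell
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Prints the value that wins for one category, following the quoted order:
# LC_ALL first, then the specific LC_* variable, then LANG.
resolve_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        # Assumption: fall back to "C" when nothing is set.
        printf '%s\n' "C"
    fi
}

only_lang=$(resolve_locale "" "" "en_US.UTF-8")                   # LANG fills the gap
ctype_over_lang=$(resolve_locale "" "ru_RU.UTF-8" "en_US.UTF-8")  # LC_CTYPE beats LANG
all_wins=$(resolve_locale "C" "ru_RU.UTF-8" "en_US.UTF-8")        # LC_ALL beats both

printf '%s\n%s\n%s\n' "$only_lang" "$ctype_over_lang" "$all_wins"
```

To apply it to the current environment for one category, something like `resolve_locale "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"` should work.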

Solution 3 - Linux

LANG, LC_CTYPE and LC_ALL are special environment variables. Once they have been exported to the shell environment (see help export), they are available to be read by programs that support locales (natural language formatting for C).

Each variable sets the C library's notion of natural language formatting style for particular sets of routines, for example:

> - LC_ALL - Set the entire locale generically
> - LC_CTYPE - Set a locale for the ctype and multibyte functions. This controls recognition of upper and lower case, alphabetic or non-alphabetic characters, and so on.

and others, such as LC_COLLATE (for string collation routines), LC_MESSAGES (for message catalogs), LC_MONETARY (for formatting monetary values), LC_NUMERIC (for formatting numbers) and LC_TIME (for formatting dates and times).

Regarding LANG, it is used as a substitute for any unset LC_* variable (see: man locale).

See: man setlocale (BSD), man locale

So when certain C functions are called (such as setlocale, the ctype and multibyte functions, catopen, printf, etc.), they read the locale settings from the configuration files and the local environment, and apply the corresponding natural language formatting style as defined by the C language standard (see: ISO C99).

See also: C Library - <locale.h>.

Solution 4 - Linux

export is confusing. It really means mark-for-export.

It implies child processes will later be created, and that's when the actual exporting will be done.


The export order of events is: 1-ASSIGN, MARK, and ... 2-FORK.


1) Create a new local shell variable, assign the value to it, and mark this variable for later export.

2) Then, if and when the current shell is FORKED (i.e. to create and run a child process), start the child process with a COPY of this exported variable as one of its many environment variables.

nb (note well): Not until step 2, and possibly long after the export declaration was issued, does the variable actually get exported. So: export only marks LANG. It does not export LANG.


By convention, exported variables are named in upper case.

Because LANG is only a copy, if the child later modifies this variable, it only modifies it for itself. The parent doesn't see the child's modifications.
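The mark-then-copy behaviour can be observed directly. The following is a minimal sketch using made-up variable names (demo_local, DEMO_EXPORTED) rather than LANG, so that it does not disturb the real locale settings; it assumes `sh` is available to act as the child process.

```shell
# Hypothetical variable names, chosen only for this demonstration.
demo_local="shell only"               # plain shell variable, not marked
export DEMO_EXPORTED="from parent"    # assigned and marked for export

# Each "sh -c" starts a child process that receives a copy of the marked variables.
child_local=$(sh -c 'printf %s "${demo_local-unset}"')
child_exported=$(sh -c 'printf %s "${DEMO_EXPORTED-unset}"')

# A change made inside the child affects only the child's own copy.
child_changed=$(sh -c 'DEMO_EXPORTED="changed in child"; printf %s "$DEMO_EXPORTED"')

printf 'child sees demo_local:    %s\n' "$child_local"
printf 'child sees DEMO_EXPORTED: %s\n' "$child_exported"
printf 'child after its change:   %s\n' "$child_changed"
printf 'parent still has:         %s\n' "$DEMO_EXPORTED"
```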

Note that there are also many other environment variables passed to child processes from parent processes. These include all of the other environment variables that the parent process also gets from its parent.

So the child inherits all of the parent's environment variables:

  • plus any additional ones that the parent marks for export,
  • less any variables which are explicitly unset.

In other words, we have two processes to think about: the parent process and any future child process(es).

The process you're running, in this case profile, is what we're calling the 'parent process'.

profile can spawn one or more child processes, like for example if one of the things you do in profile is to run a program. That program is then (normally) run as a child process of profile. (This is not true if the file is sourced in profile, using the . <name> or source <name> notation, where what is sourced runs in the same process as profile.)
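The difference between running a file as a child process and sourcing it can be sketched as follows. The temporary script and the variable name DEMO_SETTING are invented for the example; it assumes mktemp and sh are available.

```shell
# Write a tiny script that assigns a variable (hypothetical name DEMO_SETTING).
script=$(mktemp)
printf 'DEMO_SETTING=from_script\n' > "$script"

unset DEMO_SETTING

# Run it as a child process: the assignment happens in the child and is lost.
sh "$script"
after_run=${DEMO_SETTING-unset}

# Source it: the assignment happens in the current shell and stays.
. "$script"
after_source=${DEMO_SETTING-unset}

rm -f "$script"
printf 'after running:  %s\nafter sourcing: %s\n' "$after_run" "$after_source"
```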


> export LANG=ru_RU.UTF-8
> export LC_CTYPE=ru_RU.UTF-8
> export LC_ALL=ru_RU.UTF-8

So now let's look at the effects of these three environment variables.

LANG is what a user normally sets to affect the language that a program runs in. If you enter env | grep LANG in a terminal, you should see that LANG is set to your <language>_<country-code>.<character-encoding>, e.g. LANG=en_US.UTF-8.

LC_CTYPE is an override to LANG, and overrides just the character handling (character classification and encoding). All other categories, e.g. LC_TELEPHONE, still take their values from LANG.

LC_ALL is a further override. It overrides both LC_CTYPE and every other locale category that would otherwise come from LANG, forcing them all to a given language and codeset. Note that LC_ALL should never be set persistently, such as in a profile. It is intended only as a temporary override of the entire locale, i.e. it overrides all categories, like LC_TELEPHONE, LC_MONETARY, LC_CTYPE, etc.

Solution 5 - Linux

Your .bashrc file is one of the first files to be read; it contains various configuration settings for your shell session.

From What's the difference between .bashrc, .bash_profile, and .environment?:

> .bashrc is only read by a shell that's both interactive and non-login

As explained in Defining a variable with or without export:

> export makes the variable available to sub-processes.

or

> Specifically export makes the variable available to child processes via the environment.
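Since the file that holds the export lines determines which shells see them, it can help to check what kind of shell is running. A minimal sketch using the standard `$-` option flags (the variable name shell_mode is made up here):

```shell
# "$-" holds the current shell's option flags; it contains "i" when interactive.
case $- in
    *i*) shell_mode="interactive" ;;
    *)   shell_mode="non-interactive" ;;
esac

printf 'this shell is %s\n' "$shell_mode"
```

Run from a script, this would normally report non-interactive; typed at a terminal prompt, it would normally report interactive.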


Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Maxim Zagoruyko | View Question on Stackoverflow
Solution 1 - Linux | rodrigo | View Answer on Stackoverflow
Solution 2 - Linux | Jens | View Answer on Stackoverflow
Solution 3 - Linux | kenorb | View Answer on Stackoverflow
Solution 4 - Linux | Elliptical view | View Answer on Stackoverflow
Solution 5 - Linux | Édouard Lopez | View Answer on Stackoverflow