1.0 Locale
A locale is a collection of language and cultural rules. A locale name has the form,
language[_territory][.codeset]
where language is an ISO 639 language code, territory is an ISO 3166 country code, and codeset is a character encoding identifier such as UTF-8. The parts in square brackets are optional.
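The name format above can be taken apart with plain shell parameter expansion. The following is a minimal sketch, not a standard utility: the function name split_locale is hypothetical, and it assumes that any trailing @modifier suffix can simply be dropped.

```shell
# split_locale is a hypothetical helper for illustration only.
# It prints the language, territory and codeset of a locale name.
split_locale() {
    name=${1%%@*}          # assumption: drop any trailing @modifier
    territory=
    codeset=
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    printf 'language=%s\nterritory=%s\ncodeset=%s\n' "$name" "$territory" "$codeset"
}

split_locale en_US.UTF-8
# language=en
# territory=US
# codeset=UTF-8
```

For a name without optional parts, such as C, the territory and codeset fields come out empty.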
2.0 The locale command
The locale command, without any argument, prints the values associated with the current locale.
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
The variables whose names start with “LC_” are the “categories” of a locale. Each category controls one aspect of locale-dependent behavior.
LC_CTYPE is for character classification, encoding and case conversion rules.
LC_NUMERIC is for the format used for printing numbers, such as the decimal point character.
LC_TIME is for date and time formats.
LC_COLLATE is for string collation order. That is, how characters are ordered for string comparison and sorting operations.
LC_MONETARY is for the currency symbol and the format for printing monetary amounts.
LC_MESSAGES is for messages between the user and the system. This covers the language of the messages and the “yes” and “no” responses.
LC_PAPER is for the standard paper size used.
LC_NAME is for the name format. It has fields like honorific, title, middle name, first name, etc.
LC_ADDRESS is for the address format.
LC_TELEPHONE is for the format of the telephone number. It includes the formats for domestic and international numbers, the prefix for international calling, etc.
LC_MEASUREMENT is for the measurement system used, for example, metric.
LC_IDENTIFICATION is for the identification metadata about the locale itself.
LC_ALL is the “overall” setting. When it is set, its value overrides LANG and the values of all the other LC_* categories.
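The way these variables combine can be sketched in a few lines of shell. This is only an illustration of the usual precedence (LC_ALL first, then the individual LC_* variable, then LANG). The function name effective_locale is hypothetical, and the final fallback to C is an assumption about what implementations typically do when nothing is set.

```shell
# effective_locale is a hypothetical helper for illustration only.
# It resolves one category from the environment using the precedence
# LC_ALL, then the category's own variable, then LANG.
effective_locale() {
    category=$1
    eval "specific=\${$category:-}"    # value of the named LC_* variable
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'                   # assumption: typical fallback
    fi
}

effective_locale LC_TIME
```

The output depends on the current environment. With LANG=en_US.UTF-8 and nothing else set, it would print en_US.UTF-8, which matches the quoted values shown by the locale command.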
2.1 All available locales
The --all-locales, or -a, option lists all the locales available in the system.
$ locale -a | more
C
C.utf8
en_AG
en_AG.utf8
en_AU.utf8
...
POSIX
The -v option adds the LC_IDENTIFICATION metadata for each locale.
$ locale -av | more
locale: en_AG           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | English language locale for Antigua and Barbuda
   source | Free Software Foundation, Inc.
  address | https://www.gnu.org/software/libc/
    email | bug-glibc-locales@gnu.org
 language | English
territory | Antigua & Barbuda
 revision | 1.0
     date | 2008-09-16
  codeset | UTF-8
...
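A script can use the -a listing to check whether a locale is installed before relying on it. The following is a minimal sketch. The function name has_locale is hypothetical, and it matches the name exactly as locale -a prints it, so the codeset has to be written the way the listing shows it (for example, utf8 rather than UTF-8).

```shell
# has_locale is a hypothetical helper for illustration only.
# It succeeds if the given name appears as a full line in "locale -a".
has_locale() {
    locale -a 2>/dev/null | grep -Fqx -- "$1"
}

if has_locale C; then
    echo "C is available"
fi

if ! has_locale xx_XX.utf8; then
    echo "xx_XX.utf8 is not available"
fi
```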
2.2 Character Sets
The --charmaps, or -m, option lists all the available character set description files (charmaps). These files are located in the /usr/share/i18n/charmaps directory.
$ locale -m | more
ANSI_X3.110-1983
ANSI_X3.4-1968
...
UTF-8
...
2.3 Queries
The --category-name, or -c, option, given a category name, prints the category name on a line by itself, followed by the values of all the keywords in that category.
The --keyword-name, or -k, option prints each keyword along with its value, in the format,
keyword="value"
These two options can be combined. For example,
$ locale -c LC_CTYPE
LC_CTYPE
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
72
86
1
...
$
$ locale -c LC_CTYPE -k | more
LC_CTYPE
ctype-class-names="upper";"lower";"alpha";"digit";"xdigit";"space";"print";"graph";"blank";"cntrl";"punct";"alnum";"combining";"combining_level3"
ctype-map-names="toupper";"tolower";"totitle"
ctype-width=16
ctype-mb-cur-max=6
charmap="UTF-8"
ctype-class-offset=72
ctype-map-offset=86
ctype-indigits_mb-len=1
ctype-indigits0_mb="0"
ctype-indigits1_mb="1"
ctype-indigits2_mb="2"
...
$
$ locale -k charmap
charmap="UTF-8"
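Keyword queries are handy in scripts that need one locale-dependent value. As a small sketch, the commands below look up the decimal_point keyword of LC_NUMERIC. LC_ALL=C is set only so that the result does not depend on the locales installed on a particular machine. The variable name sep is just for illustration.

```shell
# With -k, the keyword is printed together with its value.
LC_ALL=C locale -k decimal_point

# Without -k, only the value is printed, which is easier to capture.
sep=$(LC_ALL=C locale decimal_point)
printf 'decimal separator: %s\n' "$sep"
```

On a glibc-based system this should print decimal_point="." followed by "decimal separator: .", since the C locale uses a period as the decimal point.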
3.0 The C and POSIX Locales
The C locale is the default, minimal locale that is available on Linux systems. It uses the ASCII character encoding. It is especially useful when you want strict ASCII sort order rather than natural-language rules. For example, when the locale is en_US.UTF-8, the sort program sorts in dictionary order. If the locale is set to C, sorting follows the ASCII character sequence, which puts all uppercase letters ahead of lowercase ones.
The POSIX locale is identical to the C locale and can be used in its place.
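The difference in sort order is easy to see with a handful of mixed-case words. This is a small sketch with made-up sample words. Only the C locale result is shown, because the result in other locales depends on which locales are installed.

```shell
# Sort a few mixed-case words using the C locale.
c_sorted=$(printf '%s\n' banana Apple cherry apple Banana | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# Prints, in ASCII order (all uppercase before lowercase):
# Apple
# Banana
# apple
# banana
# cherry

# The same input sorted with whatever locale is currently in effect.
printf '%s\n' banana Apple cherry apple Banana | sort
```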
4.0 Changing the locale
The locale settings can be changed temporarily, for the current session only, by setting and exporting the environment variable LANG. For example, to switch to the C locale,
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
...
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ export LANG=C
$ locale
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
...
LC_IDENTIFICATION="C"
LC_ALL=
LC_ALL, when set, overrides LANG and all the other LC_* settings.
So, to set the locale just for running a single command, say, to sort a file named rquote, put the assignment in front of the command,
$ LC_ALL=C sort rquote >out
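An assignment placed in front of a command applies to that command only and does not change the shell's own environment. The sketch below demonstrates this. The contents of rquote are made-up sample data, and the file is created in a temporary directory so that nothing existing is overwritten.

```shell
# Create a sample rquote file in a scratch directory (sample data only).
workdir=$(mktemp -d)
printf '%s\n' zebra Yak apple > "$workdir/rquote"

before=${LC_ALL-unset}                  # LC_ALL as the shell sees it now
LC_ALL=C sort "$workdir/rquote" > "$workdir/out"
after=${LC_ALL-unset}                   # LC_ALL after the command has run

sorted=$(cat "$workdir/out")
printf '%s\n' "$sorted"
# Yak
# apple
# zebra

if [ "$before" = "$after" ]; then
    echo "LC_ALL in the shell is unchanged"
fi
rm -r "$workdir"
```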
To change the locale permanently for the entire system on Debian and Ubuntu, open the file /etc/default/locale in a text editor and set the locale there. For example, to make en_US.UTF-8 the system locale, add the following line to /etc/default/locale,
LANG=en_US.UTF-8