Locale in Linux Systems

  • Post last modified: February 23, 2025

1.0 Locale

A locale is a set of language and cultural conventions, such as the character encoding, the sort order, and the formats for numbers, dates and currency. A locale name has the form,

language[_territory][.codeset]

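language is an ISO 639 language code, territory is an ISO 3166 country code, and codeset is a character encoding identifier such as UTF-8. The territory and codeset parts are optional. For example, en_US.UTF-8 denotes English as used in the United States, encoded in UTF-8.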
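
As an illustration, a locale name in this form can be taken apart with plain shell parameter expansion. This is only a minimal sketch: the variable names are chosen for the example, and it assumes the name has no parts beyond the three shown in the form above.

```shell
# Split a locale name of the form language[_territory][.codeset] into its parts.
name="en_US.UTF-8"

codeset=""
case "$name" in
  *.*) codeset="${name#*.}" ;;    # text after the first dot, if there is one
esac
base="${name%%.*}"                # language[_territory], with any codeset removed

territory=""
case "$base" in
  *_*) territory="${base#*_}" ;;  # text after the underscore, if there is one
esac
language="${base%%_*}"            # text before the underscore

printf 'language=%s territory=%s codeset=%s\n' "$language" "$territory" "$codeset"
# prints: language=en territory=US codeset=UTF-8
```

A name such as C, which has neither a territory nor a codeset, leaves both of those variables empty.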

2.0 The locale command

The locale command, without any argument, prints the values associated with the current locale.

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The variables whose names start with “LC_” correspond to the “categories” of a locale. Each category controls one aspect of localization. LANG supplies the default for any category that is not set individually.
LC_CTYPE is for character classification, encoding and case conversion rules.
LC_NUMERIC is for the format of numbers, such as the decimal point character and the thousands separator.
LC_TIME is for date and time formats.
LC_COLLATE is for string collation order, that is, how characters are ordered for string comparison and sorting operations.
LC_MONETARY is for the currency symbol and the format for printing monetary amounts.
LC_MESSAGES is for messages between the user and the system. It covers the language of the messages and the expressions that match “yes” and “no” responses.
LC_PAPER is for the standard paper size.
LC_NAME is for the name format. It has fields like honorific, title, first name, middle name, etc.
LC_ADDRESS is for the address format.
LC_TELEPHONE is for the format of telephone numbers. It includes the formats for domestic and international numbers, the prefix for international calling, etc.
LC_MEASUREMENT is for the measurement system, for example, metric.
LC_IDENTIFICATION is for the metadata about the locale itself.
LC_ALL is not a category. It is an environment variable which, when set, overrides LANG and all the LC_* category variables.
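
The override order can be sketched as a small shell function. Note that effective_locale is a hypothetical helper written for this illustration and is not part of any tool. It assumes the usual rule that a non-empty LC_ALL wins, then the category's own variable, then LANG, with C as the fallback.

```shell
# Print the locale that applies to one category, using the precedence
# LC_ALL > LC_<category> > LANG > "C". Empty values count as unset.
# effective_locale is a hypothetical helper, written only for this example.
effective_locale() {
  category="$1"
  # Read the variable whose name is stored in $category, e.g. LC_TIME.
  eval "category_value=\${$category:-}"
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "$category_value" ]; then
    printf '%s\n' "$category_value"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf 'C\n'
  fi
}

# The subshells keep these assignments from leaking into the current shell.
( LC_ALL=""; LANG="en_US.UTF-8"; LC_TIME="C"; effective_locale LC_TIME )     # prints C
( LC_ALL=""; LANG="en_US.UTF-8"; LC_NUMERIC=""; effective_locale LC_NUMERIC ) # prints en_US.UTF-8
( LC_ALL="POSIX"; LANG="en_US.UTF-8"; LC_TIME="C"; effective_locale LC_TIME ) # prints POSIX
```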

2.1 All available locales

The --all-locales, or -a, option lists all the locales available on the system.

$ locale -a | more
C
C.utf8
en_AG
en_AG.utf8
en_AU.utf8
...
POSIX
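
A script can use this listing to check whether a locale is installed before relying on it. The sketch below rests on one assumption drawn from the listing above: codesets are spelled there in lower case without punctuation (utf8 rather than UTF-8), so the name is rewritten the same way before matching. Both function names are hypothetical.

```shell
# Rewrite the codeset part the way the listing above spells it:
# keep only letters and digits, in lower case (UTF-8 becomes utf8).
normalize_locale_name() {
  case "$1" in
    *.*)
      codeset="$(printf '%s' "${1#*.}" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')"
      printf '%s.%s\n' "${1%%.*}" "$codeset"
      ;;
    *)
      printf '%s\n' "$1"   # no codeset part, nothing to rewrite
      ;;
  esac
}

# Succeed if the named locale appears as a whole line in the output of locale -a.
locale_is_available() {
  locale -a 2>/dev/null | grep -qxF -- "$(normalize_locale_name "$1")"
}

if locale_is_available "en_US.UTF-8"; then
  echo "en_US.UTF-8 is available"
else
  echo "en_US.UTF-8 is not available"
fi
```

On a system whose listing spells codesets differently, the normalization step would need to be adjusted.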

The -v option adds the LC_IDENTIFICATION metadata for each locale.

$ locale -av | more
locale: en_AG           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | English language locale for Antigua and Barbuda
   source | Free Software Foundation, Inc.
  address | https://www.gnu.org/software/libc/
    email | bug-glibc-locales@gnu.org
 language | English
territory | Antigua & Barbuda
 revision | 1.0
     date | 2008-09-16
  codeset | UTF-8

...

2.2 Character Sets

The --charmaps, or -m, option lists all the available character set description files (charmaps). These files are located in the /usr/share/i18n/charmaps directory.

$ locale -m | more
ANSI_X3.110-1983
ANSI_X3.4-1968
...
UTF-8
...

2.3 Queries

The --category-name, or -c, option prints the name of each queried category on a line of its own, followed by the values of all the keywords in that category.
The --keyword-name, or -k, option prints each keyword along with its value in the format,
keyword="value"
These two options can be combined. For example,

$ locale -c LC_CTYPE
LC_CTYPE
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
72
86
1
...
$
$ locale -c LC_CTYPE -k | more
LC_CTYPE
ctype-class-names="upper";"lower";"alpha";"digit";"xdigit";"space";"print";"graph";"blank";"cntrl";"punct";"alnum";"combining";"combining_level3"
ctype-map-names="toupper";"tolower";"totitle"
ctype-width=16
ctype-mb-cur-max=6
charmap="UTF-8"
ctype-class-offset=72
ctype-map-offset=86
ctype-indigits_mb-len=1
ctype-indigits0_mb="0"
ctype-indigits1_mb="1"
ctype-indigits2_mb="2"
...
$
$ locale -k charmap
charmap="UTF-8"
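
Keyword queries are handy in scripts. The following sketch reads two keywords in the C locale, once as a bare value and once in the -k form, and then strips the keyword= prefix and the quotes from the latter. The variable names are chosen for this example only.

```shell
# Query single keywords in the C locale.
decimal_point="$(LC_ALL=C locale decimal_point)"   # bare value, no keyword= prefix
charmap_line="$(LC_ALL=C locale -k charmap)"       # keyword="value" form

printf 'decimal point: %s\n' "$decimal_point"
printf '%s\n' "$charmap_line"

# Strip the keyword= prefix and the surrounding double quotes.
charmap_value="${charmap_line#*=}"
charmap_value="${charmap_value#\"}"
charmap_value="${charmap_value%\"}"
printf 'charmap: %s\n' "$charmap_value"
```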

3.0 The C and POSIX Locales

The C locale is the default, minimal locale that is always available on Linux systems. It uses the ASCII character encoding. It is especially useful when you want strict ASCII sort order rather than natural language rules. For example, when the locale is en_US.UTF-8, the sort program sorts in dictionary order. However, when the locale is C, sorting follows the ASCII character sequence, which puts all uppercase letters ahead of lowercase letters.

The POSIX locale is identical to the C locale and can be used in its place.
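
The effect on sorting is easy to see with a few words of mixed case. This is a small sketch with a made-up word list. It runs sort in the C locale only, since that locale is always present.

```shell
# A made-up list with both uppercase and lowercase initial letters.
words='banana
Cherry
apple
Date'

# In the C locale, sort compares byte values, so every uppercase
# letter comes before every lowercase letter.
c_sorted="$(printf '%s\n' "$words" | LC_ALL=C sort)"
printf '%s\n' "$c_sorted"
# prints:
# Cherry
# Date
# apple
# banana
```

Running the same pipeline with LC_ALL=en_US.UTF-8, if that locale is installed, should instead give the dictionary order described above.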

4.0 Changing the locale

The locale can be changed temporarily, for the current shell session only, by setting and exporting the environment variable LANG. For example, to set the C locale,

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
...
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ export LANG=C
$ locale
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
...
LC_IDENTIFICATION="C"
LC_ALL=

LC_ALL overrides LANG and all the LC_* settings.
So, to set the locale for a single command only, prefix the command with an LC_ALL assignment. For example, to sort a file named rquote in the C locale,

$ LC_ALL=C sort rquote >out
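
An assignment written in front of a command, as above, applies to that command only. The following sketch checks this by comparing what a child process sees with what the current shell holds before and after. The variable names are chosen for this example.

```shell
# Record LC_ALL in the current shell ("unset" stands for "not set at all").
before="${LC_ALL-unset}"

# The child process started by sh sees the prefixed assignment.
seen_by_child="$(LC_ALL=C sh -c 'printf "%s" "$LC_ALL"')"

# The current shell is left as it was.
after="${LC_ALL-unset}"

printf 'child saw LC_ALL=%s\n' "$seen_by_child"
if [ "$before" = "$after" ]; then
  echo "LC_ALL in this shell is unchanged"
fi
```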

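To change the locale permanently for the entire system, open the file /etc/default/locale in a text editor and set the value of LANG. This file is used on Debian and Ubuntu systems; other distributions may keep the setting elsewhere. For example, to set the locale to en_US.UTF-8, add the following line to /etc/default/locale. The change takes effect at the next login.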

LANG=en_US.UTF-8
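
To confirm what is configured, a script can read the LANG line back from the file. The sketch below uses a hypothetical helper, configured_lang, and assumes the file holds simple NAME=value lines like the one above. It only reads the file and never changes it.

```shell
# Print the LANG value set in a file of NAME=value lines.
# configured_lang is a hypothetical helper, written only for this example.
configured_lang() {
  [ -r "$1" ] || return 1
  # Keep the text after LANG=, drop optional quotes, use the last setting.
  sed -n 's/^LANG=//p' "$1" | tr -d '"' | tail -n 1
}

if value="$(configured_lang /etc/default/locale)"; then
  echo "configured LANG: ${value:-<none>}"
else
  echo "/etc/default/locale is not readable on this system"
fi
```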

Karunesh Johri

Software developer, working with C and Linux.