- How to get terminal’s Character Encoding
- 7 Answers
- How to enable UTF-8 in Linux? [SOLVED]
- How to enable UTF-8 on Red Hat Based OS
- Step-1: Show current UTF-8 settings
- Step-2: Show the list of available locales
- Step-3: Change UTF-8 setting
- How to enable UTF-8 on Debian Based OS
- Step-1: Show current UTF-8 settings
- Step-2: Show the list of available locales
- Step-3: Change UTF-8 setting
- How to set up a clean UTF-8 environment in Linux
- Choosing an encoding
- Locales: installing
- Locales: generation
- Locales: configuration
- A Warning about Non-Interactive Processes
- Locales: check
- Setting up the terminal emulator
- Testing the terminal emulator
- SSH
- Screen
- Irssi
How to get terminal’s Character Encoding
I have changed my gnome-terminal’s character encoding to «GBK» (by default it is UTF-8), but how can I find out the current character encoding on my Linux system?
7 Answers
The terminal uses environment variables to determine which character set to use, so you can determine it by inspecting those variables:
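For example, you might inspect the usual locale variables (LC_ALL overrides LC_CTYPE, which overrides LANG; these are the standard names):
$ echo $LC_ALL
$ echo $LC_CTYPE
$ echo $LANG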
These environment variables are used by applications that are using the terminal for I/O. The terminal emulator itself has no knowledge of them whatsoever, and its currently effective character encoding is a setting somewhere within the emulator program (a data member inside a libvte class in the case of GNOME Terminal).
The ordering of variables suggested here is not good. A more complete solution would honour the precedence, something like: echo ${LC_ALL:-${LC_CTYPE:-$LANG}}. Then again, a variable being set isn’t a guarantee that it’s valid, so you should stick to the locale program (as seen in other answers here).
As @JdeBP said, the terminal does not use the locale environment variables to determine its encoding. The terminal can, however, let applications that interact with it know its encoding by setting the locale environment variables. For instance, on macOS you can choose the terminal encoding and optionally set the locale environment variables at terminal startup in Terminal > Preferences > Profiles > Advanced.
The locale command with no arguments will print the values of all of the relevant environment variables except for LANGUAGE.
This is what worked for me on a CentOS system. It showed the system encoding based upon current language settings. The terminal settings used to get to that machine are a different story and a function of the client being used.
Check encoding and language:
$ echo $LC_CTYPE
ISO-8859-1
$ echo $LANG
pt_BR
Then export the desired locale:
$ export LC_ALL=pt_PT.utf8
$ export LANG="$LC_ALL"
Python reports the encoding it will use for standard output, which it derives from the locale:
python -c "import sys; print(sys.stdout.encoding)"
Circumstantial indications from $LC_CTYPE , locale and such might seem alluring, but these are completely separated from the encoding the terminal application (actually an emulator) happens to be using when displaying characters on the screen.
The only way to detect the encoding for sure is to output something only present in that encoding, e.g. ä, take a screenshot, analyze that image and check whether the character is displayed correctly.
So no, it’s not possible, sadly.
To see the current locale information, use the locale command. Below is an example on RHEL 7.8:
[usr@host ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Examination of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html, the xterm control sequence documentation, shows that xterm follows the ISO 2022 standard for character set switching. In particular, ESC % G selects UTF-8. So to force the terminal to use UTF-8, this command would need to be sent. I find no way of querying which character set is currently in use, but there are ways of discovering whether the terminal supports national replacement character sets.
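For example, a minimal sketch of sending that sequence from the shell (assuming an ISO 2022-capable terminal such as xterm, and that nothing else overrides the setting):
$ printf '\033%%G'    # ESC % G: switch the terminal to UTF-8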
However, from charsets(7), it doesn’t look like GBK (or GB2312) is an encoding supported by ISO 2022 and xterm doesn’t support it natively. So your best bet might be to use iconv to convert to UTF-8.
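For example, a sketch of such a conversion (the file names are placeholders; the GBK converter must be available on your system):
$ iconv -f GBK -t UTF-8 input-gbk.txt > output-utf8.txt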
Further reading shows that a (significant) subset of GBK is EUC, which is an ISO 2022 code, so ISO 2022-capable terminals may be able to display GBK natively after all; but I can’t find any mention of activating this programmatically, so the terminal’s user interface would be the only recourse.
How to enable UTF-8 in Linux? [SOLVED]
UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length Unicode encoding: it represents Unicode characters as sequences of one to four 8-bit bytes. Character encoding is a way of telling a computer how to interpret raw zeros and ones as real characters. When we write text to a file, the words and sentences we create are made up of characters, and those characters belong to a character set. Likewise, code written in a programming language is converted by the system into such an encoding before it is presented to the user.
For example, the Mousepad text editor uses UTF-8 as its default encoding.
If you are working on a Linux operating system and have received a warning or error regarding UTF-8, read on. In this article, we explain the steps to enable UTF-8 on Linux operating systems.
Let’s enable UTF-8 on the major distribution families used by most Linux users.
How to enable UTF-8 on Red Hat Based OS
The following steps can be applied to Linux distributions based on the Red Hat operating system, such as CentOS, Rocky Linux, AlmaLinux, Fedora, etc.
Step-1: Show current UTF-8 settings
First, view the current settings on the system:
[foc@rocky9 ~]$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
...
[foc@rocky9 ~]$ localectl
   System Locale: LANG=en_GB.UTF-8
       VC Keymap: us
      X11 Layout: us
The language in use and the UTF-8 encoding are displayed.
Step-2: Show the list of available locales
Use the following command to list available languages and UTF formats:
[foc@rocky9 ~]$ localectl list-locales
...
en_SC.UTF-8
en_SG.UTF-8
en_US.UTF-8
...
After this command, you will see a long list.
Step-3: Change UTF-8 setting
To change the locale setting, use the localectl command with set-locale and a LANG value. For example:
[foc@rocky9 ~]$ sudo localectl set-locale LANG=en_US.UTF-8
or you can manually edit the /etc/locale.conf file:
[foc@rocky9 ~]$ sudo vi /etc/locale.conf
The file contains the current locale:
[foc@rocky9 ~]$ cat /etc/locale.conf
LANG=en_GB.UTF-8
Change en_GB.UTF-8 to en_US.UTF-8, then check the system locale settings again.
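For example, after saving the file, the output should reflect the new value (shown here assuming you set en_US.UTF-8):
[foc@rocky9 ~]$ cat /etc/locale.conf
LANG=en_US.UTF-8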
How to enable UTF-8 on Debian Based OS
In this section, let’s explain how to configure UTF-8 on Debian based distributions such as Pardus, Ubuntu, and Mint.
Step-1: Show current UTF-8 settings
You can view the UTF-8 settings with the locale command:
foc@ubuntu22:~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
...
Step-2: Show the list of available locales
Reconfigure the locales package to show the list of available locales:
foc@ubuntu22:~$ sudo dpkg-reconfigure locales
Step-3: Change UTF-8 setting
Reconfigure the locales package as the root user or with sudo:
Select the UTF-8 locales and languages you need from the list, then choose which of the selected locales will be the default:
foc@ubuntu22:~$ sudo dpkg-reconfigure locales
Generating locales (this might take a while)...
  en_GB.UTF-8... done
  en_US.UTF-8... done
  tr_TR.UTF-8... done
Generation complete.
The settings have been applied successfully.
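If you prefer a non-interactive approach, the standard Debian locale tooling can be scripted; a sketch (assuming en_US.UTF-8 is the locale you want):
foc@ubuntu22:~$ sudo sed -i 's/^# *en_US.UTF-8/en_US.UTF-8/' /etc/locale.gen    # uncomment the locale
foc@ubuntu22:~$ sudo locale-gen                                                 # regenerate the selected locales
foc@ubuntu22:~$ sudo update-locale LANG=en_US.UTF-8                             # make it the system default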
How to set up a clean UTF-8 environment in Linux
Many people have problems with handling non-ASCII characters in their programs, or even getting their IRC client or text editor to display them correctly.
To work efficiently with text data, your environment has to be set up properly; it is much easier to debug encoding problems if you can trust your terminal to display UTF-8 correctly.
I will show you how to set up such a clean environment on Debian Lenny, but most things work independently of the distribution, and parts of it even work on other Unix-flavored operating systems like MacOS X.
Choosing an encoding
In the end the used character encoding doesn’t matter much, as long as it’s a Unicode encoding, i.e. one which can be used to encode all Unicode characters.
UTF-8 is usually a good choice because it efficiently encodes ASCII data too, and the character data I typically deal with still has a high percentage of ASCII chars. It is also used in many places, and thus one can often avoid conversions.
Whatever you do, choose one encoding and stick to it for your whole system. On Linux that means text files, file names, locales and all text-based applications (mutt, slrn, vim, irssi, ...).
For the rest of this article I assume UTF-8, but it should work very similarly for other character encodings.
Locales: installing
Check that you have the locales package installed. On Debian you can do that with:
$ dpkg -l locales
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  locales        2.7-18         GNU C Library: National Language (locale) da
The last line is the important one: if it starts with ii, the package is installed, and everything is fine. If not, install it. As root, type:
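apt-get install locales    # aptitude install locales works just as well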
If you get a dialog asking for details, read on to the next section.
Locales: generation
Make sure that a UTF-8 locale is generated on your system. As root, type:
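dpkg-reconfigure locales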
You’ll see a long list of locales, and you can navigate that list with the up/down arrow keys. Pressing the space bar toggles the locale under the cursor. Make sure to select at least one UTF-8 locale; for example, en_US.UTF-8 is usually supported very well. (The first part of the locale name stands for the language, the second for the country or dialect, and the third for the character encoding.)
In the next step you have the option to make one of the previously selected locales the default. Picking a UTF-8 locale as the default is usually a good idea, though it might change how some programs work, and thus shouldn’t be done carelessly on servers hosting sensitive applications.
Locales: configuration
If you chose a default locale in the previous step, log out completely and then log in again. In any case you can configure your per-user environment with environment variables.
The following variables can affect programs: LANG, LANGUAGE, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION.
Most of the time it works to set all of these to the same value. Instead of setting all the LC_ variables separately, you can set LC_ALL. If you use bash as your shell, you can put these lines in your ~/.bashrc and ~/.profile files:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
To make these changes active in the current shell, source the .bashrc:
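source ~/.bashrc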
All newly started interactive bash processes will respect these settings.
You must restart long-running programs for these changes to take effect.
A Warning about Non-Interactive Processes
There are certain processes that don’t get those environment variables, typically because they are started by some sort of daemon in the background.
Those include processes started from cron, at, init scripts, or indirectly spawned from init scripts, like through a web server.
You might need to take additional steps to ensure that those programs get the proper environment variables.
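A minimal sketch of one common workaround is to set the variables explicitly where the process is defined, for example in a crontab (the script path below is a hypothetical placeholder):
# crontab -e
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
0 3 * * * /usr/local/bin/nightly-report.sh    # hypothetical job that handles UTF-8 text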
Locales: check
Run the locale program. The output should be similar to this:
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
If not, you’ve made a mistake in one of the previous steps and need to recheck what you did.
Setting up the terminal emulator
Testing the terminal emulator
To test whether your terminal emulator works, copy and paste this line into your shell:
perl -Mcharnames=:full -CS -wle 'print "\N{EURO SIGN}"'
This should print a Euro sign € on the console. If it prints a single question mark instead, your fonts might not contain it. Try installing additional fonts. If multiple different (nonsensical) characters are shown, the wrong character encoding is configured. Keep trying :-).
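As an alternative quick test that avoids perl, you can print the raw UTF-8 byte sequence of the Euro sign (octal escapes, which POSIX printf understands):
printf '\342\202\254\n'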
SSH
If you use SSH to log in into another machine, repeat the previous steps, making sure that the locale is set correctly, and that you can view a non-ASCII character like the Euro sign.
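Many systems forward the locale variables over SSH automatically; a sketch of the relevant OpenSSH configuration (standard option names, usual file locations):
# on the client, in ~/.ssh/config or /etc/ssh/ssh_config
SendEnv LANG LC_*
# on the server, in /etc/ssh/sshd_config
AcceptEnv LANG LC_*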
Screen
The screen program can work with UTF-8 if you tell it to.
The easiest (and sometimes the only) way is to start it with the -U option:
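screen -U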
and also when reattaching (screen -Urd or so).
Inside a running screen you can try Ctrl+a :utf8 on. If that doesn’t work, exit your screen and start a new one with -U.
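To make UTF-8 the default for future sessions, you can also put the corresponding option into ~/.screenrc (a standard GNU screen setting):
defutf8 on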
Irssi
There’s a complete guide for setting up irssi to use UTF-8, which partially overlaps with this one. The gist is:
/set term_charset utf-8
/set recode_autodetect_utf8 ON
/set recode_fallback ISO-8859-15
/set recode ON