Text to ASCII in Linux

Bash: Convert non-ASCII characters to ASCII

How can I convert a string like Žvaigždės aukštybėj užges or äüöÖÜÄ to Zvaigzdes aukstybej uzges or auoOUA, respectively, using Bash? Basically, I just want to convert all characters that aren't in the basic Latin alphabet. Thanks


Depending on your machine you can try piping your strings through

iconv -f utf-8 -t ascii//translit 

(or whatever your encoding is, if it’s not utf-8)

echo "Aldo Vásquez" | iconv -f utf-8 -t ascii//translit outputs Aldo V'asquez. When I run the same name through the app I'm trying to match, it outputs Aldo_Vasquez. How would I get iconv to do that?

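@majorgear, honestly, no idea. This particular case can be handled by something like echo "Aldo Vásquez" | tr 'ÁáÉé' 'AaEe' (with a multibyte-aware tr such as the BSD/macOS one; GNU tr works on single bytes and mangles UTF-8 characters), but it's hardly a solution to write home about…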

Thanks. I was trying to replicate the output of an app I was using. It's open source, so I'm just going to have to dig into the code to find out how it does the conversion.
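The thread leaves the question open. One plausible way to get Aldo_Vasquez, assuming the app simply transliterates to ASCII and then replaces spaces with underscores, is to chain iconv with tr. The helper name to_slug is hypothetical. The tr -d step strips the accent marks (' " ^ ` ~) that some iconv builds, such as the macOS one, emit next to the base letter; note that it also removes genuine apostrophes. With glibc, transliteration follows the current locale, so in the C/POSIX locale accented letters may come out as ? rather than as their base letter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transliterate UTF-8 text to ASCII, drop stray accent
# marks that some iconv builds emit, then turn spaces into underscores.
to_slug() {
  printf '%s\n' "$1" \
    | iconv -f UTF-8 -t ASCII//TRANSLIT \
    | tr -d "'\"^\`~" \
    | tr ' ' '_'
}

# Example call; the exact result depends on your iconv build and locale.
to_slug "Aldo Vásquez"
```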

You might be able to use iconv.

If the text

Žvaigždės aukštybėj užges or äüöÖÜÄ

is in the file testutf8.txt, encoded as UTF-8, then

iconv -f UTF8 -t US-ASCII//TRANSLIT testutf8.txt

outputs

Zvaigzdes aukstybej uzges or auoOUA
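iconv writes its result to standard output. Redirecting straight back into the input (iconv ... testutf8.txt > testutf8.txt) truncates the file before iconv reads it, so write to a temporary file first. A minimal sketch; the helper name to_ascii_file is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: transliterate a UTF-8 file to ASCII in place.
# The original is overwritten only if iconv exits successfully.
to_ascii_file() {
  local src=$1 tmp
  tmp=$(mktemp) || return 1
  if iconv -f UTF-8 -t US-ASCII//TRANSLIT "$src" > "$tmp"; then
    cat "$tmp" > "$src"   # copy back via redirection so the file keeps its permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example: to_ascii_file testutf8.txt
```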

echo Hej på dig, du den dära | iconv -f utf-8 -t us-ascii//TRANSLIT 

You can also use the Python library Unidecode to do this:

$ echo "Žvaigždės aukštybėj užges äüöÖÜÄ" | unidecode 
Zvaigzdes aukstybej uzges auoOUA 

See this post for other approaches.

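import java.io.UnsupportedEncodingException;
import java.text.Normalizer;

public class ToAscii {
    public static void main(String[] args) {
        try {
            String name = "Žvaigždės aukštybėj užges ";
            // NFKD splits each accented letter into a base letter plus combining marks
            String s1 = Normalizer.normalize(name, Normalizer.Form.NFKD);
            // remove combining diacritical marks, modifier letters (Lm) and modifier symbols (Sk)
            String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
            String s2 = new String(s1.replaceAll(regex, "").getBytes("ascii"), "ascii");
            System.out.println(s2);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}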

Easy to criticise this, but a newbie took the effort, got shouted down and has now left SO. [Slow clap..] And what would we do without iconv?

@geotheory: ...and it's not like the other answers are pure bash, either. They all rely on an external executable. All this answer really needs is instructions for compiling the Java file and running it from bash.


Linux Fun – How to Create ASCII Text Banners in Terminal

Recently, we explained how to randomly display predefined ASCII art on the Linux terminal using a program called ASCII-Art-Splash-Screen. In this article, we will show how to create your own appealing ASCII text banners from plain text using two command-line utilities called FIGlet and TOIlet.

FIGlet is a simple command-line utility for creating ASCII text banners or large letters out of ordinary text, whereas TOIlet is a separate command-line utility, similar to FIGlet and able to use its fonts, for creating colorful large characters from ordinary text.

How to Install and Use Figlet and Toilet Tools in Linux

To use FIGlet and TOIlet, you first need to install them on your Linux system with the default package manager, as shown.

$ sudo apt install figlet toilet    [On Debian/Ubuntu]
$ sudo yum install figlet toilet    [On CentOS/RHEL]
$ sudo dnf install figlet toilet    [On Fedora 22+]

Once they are installed, the basic way to use figlet is to pass the text you want to turn into a banner (large text) as an argument, as shown.

$ figlet TecMint.com
[Output: "TecMint.com" rendered in large ASCII-art letters]

Set Output Justification

To center the output, use the -c flag, as shown.

$ figlet -c TecMint.com
[Output: the same banner, centered in the terminal]

Similarly, use -l to align the output to the left or -r to align it to the right.


Define Output Width

You can also control the output width with the -w switch; the default width is 80 columns.

$ figlet -w 100 I Love TecMint.com
[Output: "I Love TecMint.com" rendered in large ASCII-art letters, up to 100 columns wide]

If you have a wider terminal, the -t switch makes figlet use the full width of your terminal.

Add Space Between Output Characters

For a clearer output, you can use the -k flag to add a little space between the printed characters; compare the output below with the one above.

$ figlet -t -k I Love TecMint.com
[Output: the same text with a little more space between letters, using the full terminal width]

Read Input From a File

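Rather than typing your text on the command line, you can feed it to figlet on standard input, for example by redirecting a file. The -p option puts figlet into paragraph mode, which treats line breaks within a paragraph as ordinary blanks between words and so avoids spurious line breaks in the banner.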

$ echo "I wish I could chmod 644 my Girlfriend" > girlfriend.txt
$ figlet -kp < girlfriend.txt
[Output: the sentence rendered in large ASCII-art letters, wrapped over several lines]

Change Output Font

You can specify another font with the -f flag. Fonts are stored in /usr/share/figlet: .flf files are FIGlet fonts, .tlf files are TOIlet fonts, and .flc files are control files (character mappings). You can list the available fonts like so.

$ ls /usr/share/figlet/
646-ca2.flc  646-es.flc   646-kr.flc   646-yu.flc   8859-9.flc
646-ca.flc   646-fr.flc   646-no2.flc  8859-2.flc   ascii12.tlf
646-cn.flc   646-gb.flc   646-no.flc   8859-3.flc   ascii9.tlf
646-cu.flc   646-hu.flc   646-pt2.flc  8859-4.flc   banner.flf
646-de.flc   646-irv.flc  646-pt.flc   8859-5.flc   bigascii12.tlf
646-dk.flc   646-it.flc   646-se2.flc  8859-7.flc   bigascii9.tlf
646-es2.flc  646-jp.flc   646-se.flc   8859-8.flc   big.flf

To use a particular font, pass its name to -f; for example, to use the slant font:

$ figlet -f slant "Sudo I Love You"
[Output: "Sudo I Love You" rendered in the slanted slant font]
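To compare fonts before picking one, you can render the same text in every FIGlet font. A minimal sketch, assuming the fonts live in /usr/share/figlet as in the listing above; the function name preview_fonts and the FONT_DIR variable are hypothetical. It only renders .flf files, since .tlf fonts are meant for toilet. Some distributions' figlet packages also ship a showfigfonts script that does something similar.

```shell
#!/usr/bin/env bash
# Sketch: print a sample text in every FIGlet (.flf) font in the font directory.
# preview_fonts and FONT_DIR are illustrative names, not part of figlet itself.
preview_fonts() {
  local text=${1:-TecMint} font_dir=${FONT_DIR:-/usr/share/figlet} font name
  if ! command -v figlet >/dev/null; then
    echo "figlet is not installed" >&2
    return 1
  fi
  for font in "$font_dir"/*.flf; do
    [ -e "$font" ] || continue          # the glob matched nothing
    name=$(basename "$font" .flf)
    echo "=== $name ==="
    figlet -d "$font_dir" -f "$name" "$text"   # -d sets the font directory
  done
}

# Example: preview_fonts "Hello"
```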

Use TOIlet to Create Colored ASCII Text Banners

The toilet command also transforms text into large ASCII characters. The simplest way to run it is as follows.

$ toilet TecMint.com
[Output: "TecMint.com" rendered in toilet's default font]

To change to a particular font, use the -f option; toilet reads fonts from the same location as figlet.

$ toilet -kf script TecMint.com
[Output: "TecMint.com" rendered in the script font]

A number of the options for figlet that we have looked at above also apply to toilet. For more information, refer to their man pages.
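toilet can also colour its output through filters selected with -F. A short sketch based on filter names from toilet's man page (gay, metal, border); availability may differ between versions, so check toilet -F list first. The function name toilet_demo is hypothetical, and ascii9 is one of the .tlf fonts from the listing above.

```shell
#!/usr/bin/env bash
# Sketch: show a few of toilet's output filters. toilet_demo is an illustrative name.
toilet_demo() {
  local text=${1:-TecMint.com}
  if ! command -v toilet >/dev/null; then
    echo "toilet is not installed" >&2
    return 1
  fi
  toilet -F list                        # list the filters this build supports
  toilet -F gay "$text"                 # rainbow colours (also available as --gay)
  toilet -F metal "$text"               # metallic colours (also available as --metal)
  toilet -f ascii9 -F border "$text"    # a .tlf font combined with the border filter
}

# Example: toilet_demo "Hello"
```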

Summary

In this article, we looked at two command-line utilities for transforming text to large ASCII text characters, useful for creating banners or messages. Share your thoughts about these commands via the feedback form below.


How to Convert Files to UTF-8 Encoding in Linux

In this guide, we will describe what character encoding is and cover a few examples of converting files from one character encoding to another using a command-line tool. Finally, we will look at how to convert several files from any character set (charset) to UTF-8 encoding in Linux.

As you probably know already, a computer does not understand or store letters, numbers or anything else that we as humans perceive; it only stores bits. A bit has only two possible values: 0 or 1, true or false, yes or no. Everything else, such as letters, numbers and images, must be represented in bits for a computer to process it.

In simple terms, character encoding tells a computer how to interpret raw zeroes and ones as actual characters, where each character is represented by a number. When we type text into a file, the words and sentences we form are made up of characters, and characters are organized into a charset.

There are various encoding schemes, such as ASCII, ANSI and Unicode, among others. Below is an example of ASCII encoding.

Character   Bits
A           01000001
B           01000010
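To reproduce this table for any string, you can print each character's code in binary straight from bash. A small sketch; char_bits is a hypothetical helper name, and it is meant for ASCII input, since it uses the numeric character value that bash's printf reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each character of a string with its 8-bit binary code.
char_bits() {
  local str=$1 i bit code bits
  for (( i = 0; i < ${#str}; i++ )); do
    printf -v code '%d' "'${str:i:1}"   # a leading quote makes printf output the character code
    bits=""
    for (( bit = 7; bit >= 0; bit-- )); do
      bits+=$(( (code >> bit) & 1 ))
    done
    printf '%s %s\n' "${str:i:1}" "$bits"
  done
}

char_bits AB
# A 01000001
# B 01000010
```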

In Linux, the iconv command line tool is used to convert text from one form of encoding to another.

You can check the encoding of a file with the file command, using the -i or --mime flag, which prints the MIME type string, as in the examples below:

$ file -i Car.java
$ file -i CarDriver.java


The syntax for using iconv is as follows:

$ iconv option
$ iconv options -f from-encoding -t to-encoding inputfile(s) -o outputfile

Here, -f or --from-code specifies the input encoding, and -t or --to-code specifies the output encoding.


To list all known coded character sets, run the command below:

$ iconv -l


Convert Files from One Encoding to Another

Next, we will learn how to convert from one encoding scheme to another. The command below converts from ISO-8859-1 to UTF-8 encoding.

Consider a file named input.file which contains the characters:

Let us start by checking the encoding of the characters in the file and viewing its contents. Then we convert the file with iconv.

After running the iconv command, we check the contents of the output file and the new encoding of its characters, as below.

$ file -i input.file
$ cat input.file
$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT input.file -o out.file
$ cat out.file
$ file -i out.file
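If you don't have a suitable input file at hand, the following self-contained sketch creates one. It writes "café" in ISO-8859-1, where é is the single byte 0xE9, converts it to UTF-8 and dumps the raw bytes of both files with od, so you can see é become the two-byte sequence C3 A9. //TRANSLIT is left out here because every ISO-8859-1 character exists in UTF-8.

```shell
#!/usr/bin/env bash
# Self-contained demo: create an ISO-8859-1 file, convert it to UTF-8,
# and compare the raw bytes before and after.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351)
printf 'caf\351\n' > "$workdir/input.file"

iconv -f ISO-8859-1 -t UTF-8 "$workdir/input.file" -o "$workdir/out.file"

od -An -tx1 "$workdir/input.file"   # 63 61 66 e9 0a
od -An -tx1 "$workdir/out.file"     # 63 61 66 c3 a9 0a  (é is now two bytes)
```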


Note: If the string //IGNORE is appended to to-encoding, characters that can't be converted are discarded, and an error is displayed after the conversion.

Likewise, if the string //TRANSLIT is appended to to-encoding (for example, ASCII//TRANSLIT), characters are transliterated as needed and where possible. This means that when a character can't be represented in the target character set, it can be approximated by one or more similar-looking characters.

Any character that can't be transliterated and is not in the target character set is replaced with a question mark (?) in the output.
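A quick way to see the difference between these modes is to feed the same word to each of them. This sketch assumes GNU (glibc) iconv; other implementations may behave differently. It also uses -c, which the GNU iconv documentation describes as silently discarding characters that cannot be converted.

```shell
#!/usr/bin/env bash
# Compare iconv's handling of "é", which has no ASCII equivalent.
# Assumes GNU (glibc) iconv; other implementations may differ.
sample=$(printf 'caf\303\251')   # "café" in UTF-8 (é = bytes C3 A9)

echo "plain ASCII:"
printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII || echo "(iconv reported an error)"

echo "ASCII//IGNORE:"
printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//IGNORE || echo "(iconv reported an error)"

echo "-c (discard unconvertible characters):"
printf '%s\n' "$sample" | iconv -c -f UTF-8 -t ASCII

echo "ASCII//TRANSLIT (result depends on the locale):"
printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//TRANSLIT
```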

Convert Multiple Files to UTF-8 Encoding

Coming back to our main topic, to convert multiple or all files in a directory to UTF-8 encoding, you can write a small shell script called encoding.sh as follows:

#!/bin/bash
# enter input encoding here
FROM_ENCODING="value_here"
# output encoding (UTF-8)
TO_ENCODING="UTF-8"
# loop to convert multiple files
for file in *.txt; do
    [ -e "$file" ] || continue   # skip if no .txt files matched
    iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" -o "${file%.txt}.utf8.converted"
done
exit 0

Save the file, then make the script executable. Run it from the directory where your files ( *.txt ) are located.

$ chmod +x encoding.sh
$ ./encoding.sh

Important: You can also use this script to convert multiple files from any given encoding to another; simply adjust the values of the FROM_ENCODING and TO_ENCODING variables, not forgetting the output file name "${file%.txt}.utf8.converted".
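If you convert files regularly, you may prefer passing the encodings as arguments instead of editing the script each time. A hedged sketch of that variant; the function name convert_dir and its argument order are hypothetical. It keeps the .utf8.converted output naming, and when iconv fails on a file it reports the file, removes any partial output and carries on with the rest, returning a non-zero status at the end.

```shell
#!/usr/bin/env bash
# Hypothetical variant of encoding.sh: convert_dir FROM_ENCODING [TO_ENCODING] [DIR]
convert_dir() {
  local from=$1 to=${2:-UTF-8} dir=${3:-.}
  local file out status=0
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue                 # no .txt files matched
    out="${file%.txt}.utf8.converted"
    if ! iconv -f "$from" -t "$to" "$file" -o "$out"; then
      echo "failed to convert: $file" >&2
      rm -f "$out"                             # don't leave a partial file behind
      status=1
    fi
  done
  return "$status"
}

# Example: convert_dir ISO-8859-1 UTF-8 ~/docs
```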

For more information, look through the iconv man page.

To sum up, understanding encodings and how to convert text from one character encoding scheme to another is necessary knowledge for every computer user, and especially for programmers who deal with text.

Lastly, you can get in touch with us by using the comment section below for any questions or feedback.

