Linux utf to ansi

Unix command to convert character encoding in a .csv file

I need a Unix command to convert a .csv file from Unicode to ANSI encoding. The file is imported from the Cognos environment, and I am unable to change its format in Cognos.

2 Answers

You can use iconv to convert between encodings:

iconv -f utf-8 -t ascii oldfile > newfile 
  • I had a similar issue but could not work out which encoding my file used.
  • Notepad++ reported it as ANSI, so the command above didn't help me.
  • "ANSI" can refer to several encodings, so the best way to see which ANSI names iconv knows is to run:

iconv -l | grep -i ansi
ANSI_X3.4-1968// ANSI_X3.4-1986// ANSI_X3.4// ANSI_X3.110-1983// ANSI_X3.110// MS-ANSI//
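
On Windows, "ANSI" most often means the windows-1252 code page, which glibc's iconv also lists as MS-ANSI, while ANSI_X3.4-1968 is just another name for US-ASCII. The following is a minimal sketch, assuming the source file really is UTF-8 and the wanted target is windows-1252. The sample content and the file names in.csv and out.csv are made up for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample: a UTF-8 CSV with two non-ASCII letters
# (ë is bytes C3 AB and ö is bytes C3 B6 in UTF-8).
printf 'name;city\nZo\303\253;K\303\266ln\n' > "$workdir/in.csv"

# Convert to windows-1252. iconv exits non-zero if a character
# has no equivalent in the target code page.
iconv -f UTF-8 -t WINDOWS-1252 "$workdir/in.csv" > "$workdir/out.csv"

# Dump the data line as hex to confirm each letter is now a single byte.
second_line_hex=$(tail -n 1 "$workdir/out.csv" | od -An -tx1 | tr -d ' \n')
echo "$second_line_hex"

rm -r "$workdir"
```

In the hex dump, ë appears as the single byte eb and ö as f6, which is how windows-1252 stores them.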

How to Convert Files to UTF-8 Encoding in Linux

In this guide, we will describe what character encoding is and cover a few examples of converting files from one character encoding to another using a command-line tool. Finally, we will look at how to convert several files from any character set (charset) to UTF-8 encoding in Linux.

As you may already know, a computer does not understand or store letters, numbers, or anything else that humans can perceive; it stores only bits. A bit has only two possible values: 0 or 1, true or false, yes or no. Everything else, such as letters, numbers, and images, must be represented in bits for a computer to process it.

In simple terms, character encoding is a way of telling a computer how to interpret raw zeroes and ones as actual characters, where each character is represented by a number. When we type text in a file, the words and sentences we form are made up of different characters, and characters are organized into a charset.

There are various encoding schemes, such as ASCII, ANSI, and Unicode, among others. Below is an example of ASCII encoding.

Character    Bits
A            01000001
B            01000010
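
The table above can be reproduced from the shell. This is a small sketch, not part of the original guide; the helper name to_bits is made up for illustration, and it only handles single ASCII characters.

```shell
# Hypothetical helper: print the ASCII code of one character as 8 bits.
to_bits() {
    # A leading quote makes printf output the character's numeric code.
    code=$(printf '%d' "'$1")
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        # Append bit i of the code, most significant bit first.
        bits="$bits$(( (code >> i) & 1 ))"
        i=$((i - 1))
    done
    echo "$bits"
}

for ch in A B; do
    echo "$ch $(to_bits "$ch")"
done
# prints:
# A 01000001
# B 01000010
```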

In Linux, the iconv command line tool is used to convert text from one form of encoding to another.

You can check the encoding of a file using the file command with the -i or --mime flag, which prints the MIME type string, as in the examples below:

$ file -i Car.java
$ file -i CarDriver.java
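
The file command guesses the encoding from the content. As a complementary check, one can ask iconv whether a file decodes cleanly as UTF-8. The sketch below assumes that approach; the helper name is_valid_utf8 and the sample files are hypothetical.

```shell
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

workdir=$(mktemp -d)
printf 'caf\303\251\n' > "$workdir/utf8.txt"     # é stored as UTF-8 (C3 A9)
printf 'caf\351\n'     > "$workdir/latin1.txt"   # é stored as ISO-8859-1 (E9)

# Collect one line per file saying whether it passed the check.
report=$(
    for f in utf8.txt latin1.txt; do
        if is_valid_utf8 "$workdir/$f"; then
            echo "$f: valid UTF-8"
        else
            echo "$f: not valid UTF-8"
        fi
    done
)
echo "$report"

rm -r "$workdir"
```

A file that passes this check is not guaranteed to be UTF-8 by intent (pure ASCII also passes), but a file that fails it is certainly something else.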

The syntax for using iconv is as follows:

$ iconv option
$ iconv options -f from-encoding -t to-encoding inputfile(s) -o outputfile

Where -f or --from-code specifies the input encoding and -t or --to-code specifies the output encoding.

To list all known coded character sets, run the command below:

$ iconv -l

Convert Files from ISO-8859-1 to UTF-8 Encoding

Next, we will learn how to convert from one encoding scheme to another. The command below converts from ISO-8859-1 to UTF-8 encoding.

Consider a file named input.file that contains ISO-8859-1 encoded characters.

Let us start by checking the encoding of the characters in the file and viewing its contents. We can then convert all the characters to UTF-8 encoding.

After running the iconv command, we check the contents of the output file and the new encoding of the characters, as below.

$ file -i input.file
$ cat input.file
$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT input.file -o out.file
$ cat out.file
$ file -i out.file
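
To see what the conversion does at the byte level, here is a minimal sketch with a made-up one-character input. It writes the result with shell redirection instead of -o, which behaves the same for this purpose, and works in a temporary directory.

```shell
workdir=$(mktemp -d)

# Hypothetical sample: "é" stored as the single ISO-8859-1 byte E9.
printf '\351' > "$workdir/input.file"

iconv -f ISO-8859-1 -t UTF-8 "$workdir/input.file" > "$workdir/out.file"

# Hex dumps of the file before and after conversion.
before_hex=$(od -An -tx1 "$workdir/input.file" | tr -d ' \n')
after_hex=$(od -An -tx1 "$workdir/out.file" | tr -d ' \n')
echo "before: $before_hex"   # before: e9
echo "after:  $after_hex"    # after:  c3a9

rm -r "$workdir"
```

The one-byte ISO-8859-1 character becomes the two-byte UTF-8 sequence C3 A9, which is why a converted file can be larger than the original.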

Note: If the string //IGNORE is appended to to-encoding, characters that can’t be converted are discarded and an error is displayed after conversion.

Likewise, if the string //TRANSLIT is appended to to-encoding, as in the example above (UTF-8//TRANSLIT), characters are transliterated when needed and possible. This means that when a character can’t be represented in the target character set, it can be approximated by one or more similar-looking characters.

Any character that is outside the target character set and can’t be transliterated is replaced with a question mark (?) in the output.
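
The two suffixes can be compared side by side. This sketch assumes GNU iconv, since other implementations may not support the suffixes or may behave differently. The exact result of //TRANSLIT also depends on the locale, so no output is shown for it.

```shell
sample=$(printf 'caf\303\251')   # "café" in UTF-8

# //TRANSLIT: é is approximated (for example by "e") or replaced by "?".
translit=$(printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//TRANSLIT) || true

# //IGNORE: é is dropped. iconv may still exit with an error status,
# so the status is deliberately ignored here.
ignored=$(printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//IGNORE 2>/dev/null) || true

echo "TRANSLIT: $translit"
echo "IGNORE:   $ignored"
```

For a CSV export, //TRANSLIT is usually the gentler choice because a visible placeholder is easier to spot than a silently missing character.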

Convert Multiple Files to UTF-8 Encoding

Coming back to our main topic, to convert multiple or all files in a directory to UTF-8 encoding, you can write a small shell script called encoding.sh as follows:

#!/bin/bash
# Enter the input encoding here
FROM_ENCODING="value_here"
# Output encoding (UTF-8)
TO_ENCODING="UTF-8"
# Conversion command
CONVERT="iconv -f $FROM_ENCODING -t $TO_ENCODING"
# Loop to convert multiple files; ${file%.txt} strips the .txt suffix
for file in *.txt; do
    $CONVERT "$file" -o "${file%.txt}.utf8.converted"
done
exit 0

Save the file, then make the script executable. Run it from the directory where your files ( *.txt ) are located.

$ chmod +x encoding.sh
$ ./encoding.sh

Important: You can also use this script for general conversion of multiple files from one encoding to another. Simply change the values of the FROM_ENCODING and TO_ENCODING variables, not forgetting the output file name "${file%.txt}.utf8.converted".
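
A variant of the same idea, written as a function with basic error handling, might look like the sketch below. The function name convert_dir_to_utf8, its arguments, and the sample files are assumptions made for illustration. It uses output redirection rather than -o and runs against a temporary directory.

```shell
# Hypothetical helper: convert every *.txt file in a directory to UTF-8,
# writing NAME.utf8.converted next to each source file.
convert_dir_to_utf8() {
    dir=$1
    from_encoding=$2
    for file in "$dir"/*.txt; do
        # If there are no *.txt files the pattern stays literal; skip it.
        [ -e "$file" ] || continue
        out="${file%.txt}.utf8.converted"
        if ! iconv -f "$from_encoding" -t UTF-8 "$file" > "$out"; then
            echo "failed to convert: $file" >&2
            rm -f "$out"
        fi
    done
}

workdir=$(mktemp -d)
printf 'na\357ve\n' > "$workdir/a.txt"   # ï as ISO-8859-1 byte EF
printf 'se\361or\n' > "$workdir/b.txt"   # ñ as ISO-8859-1 byte F1

convert_dir_to_utf8 "$workdir" ISO-8859-1

# Hex dump of both converted files, one after the other.
converted_hex=$(cat "$workdir/a.utf8.converted" "$workdir/b.utf8.converted" \
    | od -An -tx1 | tr -d ' \n')
echo "$converted_hex"

rm -r "$workdir"
```

Removing the partial output on failure avoids leaving truncated files that could later be mistaken for good conversions.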

For more information, look through the iconv man page.

To sum up, understanding character encoding and how to convert from one encoding scheme to another is useful knowledge for every computer user, and even more so for programmers who deal with text.

Why can’t I convert a UTF-8 to MS-ANSI using iconv?

That’s a character (U+FEFF, encoded in 3 bytes in UTF-8) which is also used as a byte-order mark (BOM). In any case, that character is not found in MS-ANSI (an improper name sometimes given to windows-1252, a superset of ISO-8859-1), so it cannot be converted to that.

BOMs are used (at the beginning of some text) to differentiate UTF-16LE from UTF-16BE (or other non-byte encodings affected by CPU endianness). A BOM makes no sense in UTF-8, where there’s no byte order ambiguity, and it would make even less sense in windows-1252, which is a single-byte charset. As a "zero width no-break space", it’s also invisible and has no word-separation property like the "zero width space" character would have, so it’s probably safe to remove it altogether.
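
One way to remove it before converting is to skip the first three bytes when they match the UTF-8 encoding of U+FEFF (EF BB BF). The sketch below is an assumption about how that could be done with standard tools, not the original answer's command. The helper name strip_utf8_bom and the file names are hypothetical.

```shell
# Hypothetical helper: print a file without its leading UTF-8 BOM, if any.
strip_utf8_bom() {
    first3=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        tail -c +4 "$1"    # start output at byte 4, skipping EF BB BF
    else
        cat "$1"
    fi
}

workdir=$(mktemp -d)
# Sample input: a BOM followed by "8:na<TAB>8:".
printf '\357\273\2778:na\t8:' > "$workdir/input"

strip_utf8_bom "$workdir/input" \
    | iconv -f UTF-8 -t WINDOWS-1252 > "$workdir/output"

output_hex=$(od -An -tx1 "$workdir/output" | tr -d ' \n')
echo "$output_hex"    # 383a6e6109383a

rm -r "$workdir"
```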

With some iconv implementations, you can also use:

iconv -t windows-1252//translit < input 

//translit resorts to approximations when the text cannot be faithfully translated. In that case, it just removes the U+FEFF character.

$ printf '\xef\xbb\xbf\x38\x3a\x6e\x61\x09\x38\x3a' | iconv -t windows-1252//translit | hd
00000000  38 3a 6e 61 09 38 3a                              |8:na.8:|
00000007

Another option could be to use:

iconv -t utf-16le | iconv -f utf-16 -t windows-1252 

The first iconv converts to UTF-16 little-endian without BOM, but that initial U+FEFF makes it actual UTF-16 with BOM, so the second iconv strips that BOM as it's used to determine the byte-order of that utf-16 encoding.
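
The two stages can be inspected separately. This sketch assumes GNU iconv and passes -f UTF-8 explicitly so that it does not depend on the current locale. The sample text "abc" is made up.

```shell
# Sample input: UTF-8 BOM (EF BB BF) followed by "abc".
# Stage 1: UTF-8 to UTF-16LE; the leading U+FEFF becomes the bytes FF FE.
stage1_hex=$(printf '\357\273\277abc' \
    | iconv -f UTF-8 -t UTF-16LE \
    | od -An -tx1 | tr -d ' \n')

# Stage 2: read that as UTF-16, so the leading FF FE is treated as a BOM
# and consumed, then write windows-1252.
final_hex=$(printf '\357\273\277abc' \
    | iconv -f UTF-8 -t UTF-16LE \
    | iconv -f UTF-16 -t WINDOWS-1252 \
    | od -An -tx1 | tr -d ' \n')

echo "after first iconv:  $stage1_hex"
echo "after second iconv: $final_hex"
```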

UNIX for Dummies Questions & Answers

I am trying to convert a file from UTF-8 to ANSI format. I tried to use iconv, but it throws an error:

$ iconv -f UTF8 -t ANSI filename

The error I am getting is: NOT Supported UTF8 to ANSI

Could someone please help me out? Let me know if there are any functions in Unix to convert from UTF-8 to ANSI.

Sorry, I did not get that. Would you please explain briefly?

You first have to create the conversion table that contains the from:to data.
As with most unix man pages, there's an explanation, but not an example.

You didn't specify which ANSI format.
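
A quick way to find out which target names a given iconv accepts is to try each one on empty input. The sketch below assumes that approach. The candidate list is only a guess at names commonly meant by "ANSI", and which of them are accepted depends on the iconv implementation.

```shell
# Candidate names; which of them exist depends on the iconv implementation.
candidates="WINDOWS-1252 CP1252 MS-ANSI ISO-8859-1 ANSI"

supported=""
for name in $candidates; do
    # Converting empty input succeeds only if iconv knows the target name.
    if iconv -f UTF-8 -t "$name" < /dev/null > /dev/null 2>&1; then
        supported="$supported $name"
    fi
done
echo "supported targets:$supported"
```

Any name that appears in the result can be used after -t in place of ANSI, for example iconv -f UTF-8 -t WINDOWS-1252 filename.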
