- Convert char to wchar_t using standard library?
- How to convert ASCII char* to wchar_t* in C++ without using mbstowcs?
- Convert char* to wchar* in C
- Converting from char to wchar_t in C++ on Linux
- Converting char strings to wchar_t
Convert char to wchar_t using standard library?
I have a function that expects a wchar_t array as a parameter. I don’t know of a standard library function that converts char to wchar_t, so I wrote a quick and dirty function, but I want a reliable solution free from bugs and undefined behavior. Does the standard library have a function that makes this conversion? My code:
    #include <string.h>  /* strlen */
    #include <wchar.h>   /* wchar_t */

    wchar_t *ctow(const char *buf, wchar_t *output)
    {
        const char ANSI_arr[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()-_=+[]<>\\|;:'\",/? \t\n\r\f";
        const wchar_t WIDE_arr[] = L"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()-_=+[]<>\\|;:'\",/? \t\n\r\f";
        size_t n = 0, len = strlen(ANSI_arr);
        while (*buf) {
            /* look the character up in the narrow table and copy its wide twin */
            for (size_t x = 0; x < len; x++) {
                if (*buf == ANSI_arr[x]) {
                    output[n++] = WIDE_arr[x];
                    break;
                }
            }
            buf++;
        }
        output[n] = L'\0';
        return output;
    }
3 Answers
Well, conversion functions are declared in stdlib.h (*). But you should know that for any character in the Latin-1 (ISO-8859-1) charset, conversion to a wide character is a mere assignment, because the Unicode code points below 256 are exactly the Latin-1 characters.
So if your initial charset is ISO-8859-1, the conversion is simply:
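    wchar_t *ctow(const char *buf, wchar_t *output)
    {
        wchar_t *cr = output;  /* remember the start of the output buffer */
        while (*buf) {
            /* go through unsigned char so bytes 128..255 do not sign-extend */
            *output++ = (unsigned char)*buf++;
        }
        *output = 0;
        return cr;
    }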
provided the caller passes a pointer to an array big enough to store all the converted characters plus the terminating null.
If you are using any other charset, you will have to use a well-known library like ICU, or build the conversion by hand, which is simple for single-byte charsets (the ISO-8859-x series) and trickier for multibyte ones like UTF-8.
But without knowing the charsets you want to be able to process, I cannot say more.
BTW, plain ASCII is a subset of the ISO-8859-1 charset.
int mbtowc (wchar_t* pwc, const char* pmb, size_t max);
Convert multibyte sequence to wide character. The multibyte character pointed to by pmb is converted to a value of type wchar_t and stored at the location pointed to by pwc. The function returns the length in bytes of the multibyte character.
mbtowc has its own internal shift state, which is altered as necessary only by calls to this function. A call to the function with a null pointer as pmb resets the state (and returns whether multibyte characters are state-dependent).
The behavior of this function depends on the LC_CTYPE category of the selected C locale.
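As a rough illustration of the description above, here is a minimal sketch that calls mbtowc in a loop to widen a whole string. The helper name widen_with_mbtowc is hypothetical, and the result depends on whatever LC_CTYPE locale is active when it runs.

```cpp
#include <cstdlib>   // std::mbtowc
#include <cstring>   // std::strlen
#include <string>

// Hypothetical helper (not part of any library): converts a multibyte
// string to a wide string one character at a time with mbtowc.
// Returns false if an invalid or incomplete multibyte sequence is found.
bool widen_with_mbtowc(const char *src, std::wstring &out)
{
    out.clear();
    std::mbtowc(nullptr, nullptr, 0);          // reset the internal shift state
    std::size_t remaining = std::strlen(src);
    while (remaining > 0) {
        wchar_t wc;
        int n = std::mbtowc(&wc, src, remaining);
        if (n < 0)
            return false;                      // invalid sequence for this locale
        if (n == 0)
            break;                             // null character reached
        out.push_back(wc);
        src += n;                              // skip the bytes just consumed
        remaining -= static_cast<std::size_t>(n);
    }
    return true;
}
```

In the default "C" locale only plain ASCII is guaranteed to convert; calling setlocale(LC_CTYPE, "") beforehand selects the environment's locale, for example a UTF-8 one.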
How to convert ASCII char* to wchar_t* in C++ without using mbstowcs?
I’d like to convert an ASCII char* to wchar_t* in C++ on Linux without using mbstowcs(). On iOS and Windows, mbstowcs() works perfectly. On Android, however, mbstowcs() seems to convert things quite literally, one-to-one. Even using different variations of setlocale(), I’ve been unable to convert successfully. I might end up converting manually on Android by copying 1 byte and filling the rest with zeroes. But is this proper for ASCII? Are the first 256 characters of UTF-32/Unicode the same as the ASCII (ISO 8859-1/ISO Latin-1) character set?
"Are the first 256 characters of UTF-32/Unicode the same as the ASCII character set?" No, since ASCII only defines the first 128 characters. But those are the same as the first 128 Unicode characters.
My understanding was that ISO 8859-1/ISO Latin-1 extends ASCII with the characters from 128 to 255. So, I guess I’m specifically asking: do all the characters of ISO 8859-1/ISO Latin-1 map directly to the first 256 characters of UTF-32/Unicode?
3 Answers
To make things a bit clearer:
- ASCII is a character encoding using values from 0..127 to encode a single character.
- Latin-1 is another character set, that extends ASCII by using the values from 128..255 to encode its own characters.
Indeed, on most architectures a byte is 8 bits, so there are still 128 values available when storing ASCII characters in a byte. Several different character sets were thus designed to extend ASCII with values from 128..255. By a happy accident, the one referred to as Latin-1 was used for the first 256 code points in Unicode (as pointed out by BoBTFish). So if you have a string of chars that you know is encoded in Latin-1, you can just assign each value to a wchar_t (which ensures correct «zero filling» with regard to the endianness of your architecture), and it will be a valid wstring of Unicode code points corresponding to the same characters. The consumer of your wstring then has to interpret its content as Unicode code points.
Also, as soon as you cannot guarantee that the original string is encoded in Latin-1, you will run into problems (e.g., UTF-8 does not map byte-for-byte to Latin-1).
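The per-character assignment described above could be sketched as follows. The helper name latin1_to_wide is hypothetical, and the sketch assumes the input really is Latin-1. The cast through unsigned char matters on platforms where plain char is signed.

```cpp
#include <string>

// Hypothetical helper: widens a string that is KNOWN to be Latin-1
// (ISO 8859-1). Each byte value is the Unicode code point of the same
// character, so a plain assignment is enough.
std::wstring latin1_to_wide(const std::string &latin1)
{
    std::wstring wide;
    wide.reserve(latin1.size());
    for (char c : latin1) {
        // Go through unsigned char first: where plain char is signed,
        // bytes 128..255 would otherwise sign-extend to negative values.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}
```

If the input is actually UTF-8, the result is wrong rather than an error: the two-byte UTF-8 sequence for "é" (0xC3 0xA9) comes out as two separate wide characters.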
Convert char* to wchar* in C
I would like to convert a char* string to a wchar* string in C. I have found many answers, but most of them are for C++. Could you help me? Thanks.
What is the original encoding in your char* ? UTF8? ANSI? What is the sizeof(wchar) on your system and what encoding does it rely upon? UCS-2 (16bit)? UCS-4 (32bit)?
@Mehrdad: It is not necessarily 2. It is implementation-defined. If programming on Windows, it has a size of two bytes and holds UTF-16, with double wchar_t’s for surrogate pairs.
5 Answers
Try swprintf with the %hs format specifier.
wchar_t ws[100]; swprintf(ws, 100, L"%hs", "ansi string");
@NickDandoulakis I think this answer could be very useful, however I found out that swprintf could have 2 possible interfaces, could you please take a look at this question? stackoverflow.com/q/17716763/2436175
@Benoit: Yeah, there’s obviously more to string conversion than calling just a single function. But I didn’t give any details since I think this is all the OP’s looking for.
@Benoit: There’s no such thing as an «ANSI string». This will work if the original string is in the multibyte format corresponding to the currently set locale.
I have already found this function, but I can’t use it correctly. I just want to encode a string as Unicode to send in a mail subject header. Thank you.
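On the two swprintf interfaces mentioned in the comments, here is a hedged sketch of the swprintf approach using the standard form that takes a buffer size. The helper name widen_with_swprintf and the constant kNarrowFmt are hypothetical. In standard C, a plain %s in a wide format string takes a char* and converts it according to the current locale, whereas Microsoft's runtime has traditionally treated %s there as a wide string, which is why %hs is used on that platform.

```cpp
#include <cwchar>   // std::swprintf

// Pick the conversion specifier that means "narrow string" on this platform.
#if defined(_MSC_VER)
static const wchar_t kNarrowFmt[] = L"%hs";  // Microsoft runtime: force narrow
#else
static const wchar_t kNarrowFmt[] = L"%s";   // standard C: %s is already narrow
#endif

// Hypothetical helper: formats a narrow (multibyte) string into a wide
// buffer of dst_len elements. Returns false if the conversion fails or
// the buffer is too small.
bool widen_with_swprintf(const char *src, wchar_t *dst, std::size_t dst_len)
{
    int written = std::swprintf(dst, dst_len, kNarrowFmt, src);
    return written >= 0;
}
```

As the comment above notes, the conversion only works when the source string is in the multibyte encoding of the currently set locale.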
what you’re looking for is
works just like the copy function from char* to char*
but in this case you’re saving into a wchar_t*
If you happen to have the Windows API available, the conversion function MultiByteToWideChar offers configurable string conversion from different encodings to UTF-16. That might be more appropriate if you don’t care too much about portability and don’t want to figure out exactly what the implications of different locale settings are for the string conversion.
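If you currently have ANSI chars, just insert a 0 (‘\0’) before each char and cast them to wchar_t* (this assumes a 2-byte wchar_t, and whether the zero byte goes before or after each char depends on the byte order of the platform).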
Converting from char to wchar_t in C++ on Linux
I have the following function in a wrapper around a library on Windows. I ported it to Linux, but on Ubuntu, while building the wrapper around the library, I realized that the converters WideCharToMultiByte and MultiByteToWideChar do not work.
    X509CertificateGetInfo(char inCert, INT propId, char* outData)
    {
        unsigned long rv = 0;
        int tmpOutDataLength = 2048;
        std::vector<unsigned char> tmpOutData(tmpOutDataLength);
        int outDataWLen = 0;
        int tmpCertLen = WideCharToMultiByte(CP_UTF8, 0, (LPCWCH)inCert, wcslen(inCert), NULL, 0, NULL, NULL) + 1;
        std::vector<char> tmpCert(tmpCertLen);
        wchar_t *outDataW = NULL;
        CComBSTR bsResultStr;
        WideCharToMultiByte(CP_UTF8, 0, (LPCWCH)inCert, tmpCertLen, (LPSTR)&tmpCert[0], tmpCertLen, NULL, NULL);
        rv = kc_funcs->K_X509CertificateGetInfo(&tmpCert[0], tmpCertLen, propId, &tmpOutData[0], &tmpOutDataLength);
        if (!rv) {
            outDataWLen = MultiByteToWideChar(CP_UTF8, 0, (LPCCH)&tmpOutData[0], tmpOutDataLength, NULL, 0);
            std::vector<wchar_t> outDataW(outDataWLen);
            MultiByteToWideChar(CP_UTF8, 0, (LPCCH)&tmpOutData[0], tmpOutDataLength, (LPWSTR)&outDataW[0], outDataWLen + 1);
            bsResultStr.Append(&outDataW[0]);
            *outData = bsResultStr.Detach();
        }
        if (rv) {
            return S_FALSE;
        }
        return S_OK;
    }
Please advise how to correctly write mbstowcs and wcstombs in place of WideCharToMultiByte and MultiByteToWideChar.
One more question:
Why are these needed:
    unsigned char tmpOutData[tmpOutDataLength] instead of std::vector<unsigned char> tmpOutData(tmpOutDataLength);
    char tmpCert[tmpCertLen] instead of std::vector<char> tmpCert(tmpCertLen);
    const wchar_t outDataW[outDataWLen] instead of std::vector<wchar_t> outDataW(outDataWLen)
Converting char strings to wchar_t
1 Answer
As always, there are several options.
- The most sensible one is to use std::use_facet from the standard library; the method could then look like this (this variant is my extension of boost::lexical_cast):
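    #include <boost/lexical_cast.hpp>
    #include <locale>
    #include <string>

    namespace boost {

    template <>
    inline std::wstring lexical_cast<std::wstring, std::string>(const std::string& arg)
    {
        std::wstring result;
        std::locale locale;
        // Use specific (character-driven) facet for current result.
        for (std::size_t i = 0; i < arg.size(); ++i) {
            result += std::use_facet<std::ctype<wchar_t> >(locale).widen(arg[i]);
        }
        return result;
    }

    template <>
    inline std::string lexical_cast<std::string, std::wstring>(const std::wstring& arg)
    {
        std::string result;
        std::locale locale;
        // Use specific (character-driven) facet for current result.
        for (std::size_t i = 0; i < arg.size(); ++i) {
            // narrow() needs a fallback for characters with no narrow form;
            // '?' is an arbitrary choice here.
            result += std::use_facet<std::ctype<wchar_t> >(locale).narrow(arg[i], '?');
        }
        return result;
    }

    } // namespace boost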
- Another option is mbstowcs from the C library:

    #include <cstddef>  // std::size_t
    #include <cstdlib>  // std::mbstowcs, std::realloc

    struct String {
        wchar_t* string;
        std::size_t length;
    };

    void SetStrIntoWcharObj(String *obj, const char *str)
    {
        // With a null destination, mbstowcs returns the required length
        // (POSIX behavior).
        obj->length = std::mbstowcs(0, str, 0);
        // The conversion failed (mbstowcs returned (size_t)-1).
        if (obj->length == (std::size_t)-1)
            FAIL; // How to handle FAIL is up to you.
        obj->string = (wchar_t*)std::realloc(obj->string, (obj->length + 1) * sizeof(wchar_t));
        std::mbstowcs(obj->string, str, obj->length + 1);
    }
The choice is yours, of course, but (in my opinion) the advantages of using std::use_facet and widen are obvious.
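The question also asks about the reverse direction, with wcstombs standing in for WideCharToMultiByte. A minimal sketch might look like the following. The helper name wide_to_multibyte is hypothetical, and the sizing assumes a stateless encoding such as UTF-8.

```cpp
#include <cstdlib>   // std::wcstombs, MB_CUR_MAX
#include <string>
#include <vector>

// Hypothetical helper: wide string -> multibyte string in the current
// LC_CTYPE locale. Returns false if some character cannot be represented
// in that locale.
bool wide_to_multibyte(const std::wstring &wide, std::string &out)
{
    // MB_CUR_MAX is the largest number of bytes one character can need in
    // the current locale, so this buffer is large enough for a stateless
    // encoding, plus one byte for the terminating null.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;                 // unrepresentable character
    out.assign(buf.data(), n);        // n excludes the terminating null
    return true;
}
```

Unlike the CP_UTF8 argument in the Windows code, wcstombs has no encoding parameter. The output is UTF-8 only if a UTF-8 locale has been selected first, for instance with setlocale(LC_CTYPE, "") in a UTF-8 environment.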