FUNCTION Utf8ToAnsi (Source : STRING; UnknownChar : CHAR = '?') : ANSISTRING;
          (* Converts the given UTF-8 String to Windows ANSI (Win-1252).
             If a character can not be converted, the "UnknownChar" is inserted.
             WIN1252_UNICODE (declared elsewhere, not shown here) maps each
             Win-1252 byte value to its Unicode code point. *)
VAR
  SourceLen : INTEGER;  // Length of Source string
  I, K      : INTEGER;
  A         : BYTE;     // Current source byte; reused below as Win-1252 table index
  U         : WORD;     // Decoded Unicode code point (BMP only)
  Ch        : CHAR;     // Dest char
  Len       : INTEGER;  // Current real length of "Result" string
BEGIN
  SourceLen := Length (Source);
  SetLength (Result, SourceLen);   // Enough room to live
  Len := 0;
  I   := 1;
  WHILE I <= SourceLen DO BEGIN
    A := ORD (Source [I]);
    IF A < $80 THEN BEGIN                                             // Range $0000..$007F
      INC (Len);
      Result [Len] := Source [I];
      INC (I);
    END
    ELSE BEGIN                                                        // Determine U, Inc I
      IF (A AND $E0 = $C0) AND (I < SourceLen) THEN BEGIN             // Range $0080..$07FF
        U := (WORD (A AND $1F) SHL 6) OR (ORD (Source [I+1]) AND $3F);
        INC (I, 2);
      END
      ELSE IF (A AND $F0 = $E0) AND (I < SourceLen-1) THEN BEGIN      // Range $0800..$FFFF
        U := (WORD (A AND $0F) SHL 12) OR
             (WORD (ORD (Source [I+1]) AND $3F) SHL 6) OR
             (      ORD (Source [I+2]) AND $3F);
        INC (I, 3);
      END
      ELSE BEGIN                                                      // Unknown/unsupported
        INC (I);
        // The highest zero bit K of the lead byte gives the sequence length (7-K bytes).
        // One byte is already consumed, so skip the remaining 6-K bytes.
        FOR K := 7 DOWNTO 0 DO
          IF A AND (1 SHL K) = 0 THEN BEGIN
            INC (I, 6-K);
            BREAK;
          END;
        U := WIN1252_UNICODE [ORD (UnknownChar)];
      END;
      Ch := UnknownChar;                                              // Retrieve ANSI char
      FOR A := $00 TO $FF DO                                          // Reverse lookup: code point -> Win-1252 byte
        IF WIN1252_UNICODE [A] = U THEN BEGIN
          Ch := CHR (A);
          BREAK;
        END;
      INC (Len);
      Result [Len] := Ch;
    END;
  END;
  SetLength (Result, Len);
END;
MultiByteToWideChar
The MultiByteToWideChar function maps a character string to a wide-character (Unicode) string. The character string mapped by this function is not necessarily from a multibyte character set.
int MultiByteToWideChar(
  UINT CodePage,         // code page
  DWORD dwFlags,         // character-type options
  LPCSTR lpMultiByteStr, // string to map
  int cbMultiByte,       // number of bytes in string
  LPWSTR lpWideCharStr,  // wide-character buffer
  int cchWideChar        // size of buffer
);
Parameters
CodePage
[in] Specifies the code page to be used to perform the conversion. This parameter can be given the value of any code page that is installed or available in the system. You can also specify one of the values shown in the following table.
Value          Meaning
CP_ACP         ANSI code page
CP_MACCP       Macintosh code page
CP_OEMCP       OEM code page
CP_SYMBOL      Windows 2000/XP: Symbol code page (42)
CP_THREAD_ACP  Windows 2000/XP: The current thread's ANSI code page
CP_UTF7        Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-7
CP_UTF8        Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-8. When this is set, dwFlags must be zero.
Windows 95: Under the Microsoft Layer for Unicode, MultiByteToWideChar also supports CP_UTF7 and CP_UTF8.
dwFlags
[in] Indicates whether to translate to precomposed or composite wide characters (if a composite form exists), whether to use glyph characters in place of control characters, and how to deal with invalid characters. You can specify a combination of the following flag constants.
Value                 Meaning
MB_PRECOMPOSED        Always use precomposed characters, that is, characters in which a base character and a nonspacing character have a single character value. This is the default translation option. Cannot be used with MB_COMPOSITE.
MB_COMPOSITE          Always use composite characters, that is, characters in which a base character and a nonspacing character have different character values. Cannot be used with MB_PRECOMPOSED.
MB_ERR_INVALID_CHARS  If the function encounters an invalid input character, it fails and GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Note that this flag is not supported for DLL-based encodings.
MB_USEGLYPHCHARS      Use glyph characters instead of control characters.
A composite character consists of a base character and a nonspacing character, each having different character values. A precomposed character has a single character value for a base/nonspacing character combination. In the character è, the e is the base character and the accent grave mark is the nonspacing character.
The function's default behavior is to translate to the precomposed form. If a precomposed form does not exist, the function attempts to translate to a composite form.
The flags MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. The MB_USEGLYPHCHARS flag and the MB_ERR_INVALID_CHARS can be set regardless of the state of the other flags.
For the code pages in the following table, dwFlags must be zero, otherwise the function fails with ERROR_INVALID_FLAGS.
50220
50221
50222
50225
50227
50229
52936
54936
57002 through 57011
65000 (UTF-7)
65001 (UTF-8)
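The flag rules above can be condensed into a small check. The following is only a sketch of the rules as this page states them, not the validation Windows performs internally. The helper name mb_flags_ok is hypothetical, and the MB_* values are redefined here (matching the Windows SDK headers) so the fragment compiles without Windows.h.

```c
#include <stddef.h>

#ifndef MB_PRECOMPOSED              /* already defined when Windows.h is included */
#define MB_PRECOMPOSED       0x00000001
#define MB_COMPOSITE         0x00000002
#define MB_USEGLYPHCHARS     0x00000004
#define MB_ERR_INVALID_CHARS 0x00000008
#endif

/* Hypothetical helper: returns 1 if flags is acceptable for code_page under
   the rules listed above, 0 otherwise. Unknown flag bits are not checked. */
int mb_flags_ok(unsigned code_page, unsigned long flags)
{
    /* Code pages for which dwFlags must be zero (57002..57011 handled below). */
    static const unsigned zero_only[] = {
        50220, 50221, 50222, 50225, 50227, 50229, 52936, 54936, 65000, 65001
    };
    size_t i;

    /* MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. */
    if ((flags & MB_PRECOMPOSED) && (flags & MB_COMPOSITE))
        return 0;

    if (code_page >= 57002 && code_page <= 57011)
        return flags == 0;

    for (i = 0; i < sizeof zero_only / sizeof zero_only[0]; i++)
        if (zero_only[i] == code_page)
            return flags == 0;

    return 1;
}
```

For example, mb_flags_ok(65001, MB_ERR_INVALID_CHARS) yields 0 under these rules, while mb_flags_ok(1252, MB_COMPOSITE | MB_ERR_INVALID_CHARS) yields 1.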
lpMultiByteStr
[in] Points to the character string to be converted.
cbMultiByte
[in] Specifies the size in bytes of the string pointed to by the lpMultiByteStr parameter, or it can be -1 if the string is null terminated.
If this parameter is -1, the function processes the entire input string including the null terminator. The resulting wide character string therefore has a null terminator, and the returned length includes the null terminator.
If this parameter is a positive integer, the function processes exactly the specified number of bytes. If the given length does not include a null terminator then the resulting wide character string will not be null terminated, and the returned length does not include a null terminator.
lpWideCharStr
[out] Points to a buffer that receives the translated string.
cchWideChar
[in] Specifies the size, in wide characters, of the buffer pointed to by the lpWideCharStr parameter. If this value is zero, the function returns the required buffer size, in wide characters, and makes no use of the lpWideCharStr buffer.
Return Values
If the function succeeds, and cchWideChar is nonzero, the return value is the number of wide characters written to the buffer pointed to by lpWideCharStr.
If the function succeeds, and cchWideChar is zero, the return value is the required size, in wide characters, for a buffer that can receive the translated string.
If the function fails, the return value is zero. To get extended error information, call GetLastError. GetLastError may return one of the following error codes:
ERROR_INSUFFICIENT_BUFFER
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
ERROR_NO_UNICODE_TRANSLATION
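The sizing convention described under cbMultiByte, cchWideChar and Return Values leads to a two-call pattern: ask for the required size, allocate, then convert. The sketch below shows that pattern in portable C. Note that latin1_to_wide is a hypothetical stand-in written for this illustration, not MultiByteToWideChar itself: it only widens ISO-8859-1 bytes (where each byte value equals its Unicode code point), and widen_latin1 is likewise an assumed helper name. Rejecting a cbMultiByte of zero is a choice made for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in that follows the calling convention described above.
   Returns the number of wide characters written, the required size when
   cch is 0, or 0 on failure. */
int latin1_to_wide(const char *src, int cb, uint16_t *dst, int cch)
{
    int i;

    if (src == NULL || cb == 0 || cb < -1 || cch < 0)
        return 0;
    if (cch != 0 && dst == NULL)
        return 0;
    if (cb == -1)
        cb = (int)strlen(src) + 1;      /* -1: include the null terminator */
    if (cch == 0)
        return cb;                      /* size query: dst is not used */
    if (cch < cb)
        return 0;                       /* buffer too small */
    for (i = 0; i < cb; i++)
        dst[i] = (uint16_t)(unsigned char)src[i];
    return cb;
}

/* Two-call pattern: query the size, allocate, convert.
   The caller releases the result with free(). */
uint16_t *widen_latin1(const char *src, int *out_len)
{
    uint16_t *buf;
    int need = latin1_to_wide(src, -1, NULL, 0);

    if (need == 0)
        return NULL;
    buf = malloc((size_t)need * sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (latin1_to_wide(src, -1, buf, need) != need) {
        free(buf);
        return NULL;
    }
    if (out_len != NULL)
        *out_len = need;
    return buf;
}
```

With the real function the shape is the same: a first call with cchWideChar set to zero returns the required size in wide characters, and a second call with a buffer of that size performs the conversion. Because -1 was passed for the input length, the returned count includes the null terminator.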
Remarks
The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns the value ERROR_INVALID_PARAMETER.
If MB_ERR_INVALID_CHARS is set and the function encounters an invalid character in the source string, the function fails: it returns 0 and GetLastError returns ERROR_NO_UNICODE_TRANSLATION. An invalid character is either (a) a character that is not the default character in the source string but translates to the default character when MB_ERR_INVALID_CHARS is not set, or (b) for DBCS strings, a character that has a lead byte but no valid trailing byte.
Windows XP: To prevent the security problem of the non-shortest-form versions of UTF-8 characters, MultiByteToWideChar deletes these characters.
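To make "non-shortest form" concrete: a UTF-8 sequence is non-shortest (overlong) when it spends more bytes than the code point needs, for example C0 AF as a two-byte spelling of '/' (U+002F). Such spellings can slip past checks that compare raw bytes. The sketch below tests a single sequence for this property. The function name utf8_is_overlong is hypothetical, and this is an illustration of the concept, not a description of how Windows detects these sequences.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the UTF-8 sequence starting at s (with n
   bytes available) uses a longer form than its code point requires, else 0.
   Only the lead byte and the first continuation byte need inspecting. */
int utf8_is_overlong(const unsigned char *s, size_t n)
{
    if (n >= 2 && (s[0] & 0xE0) == 0xC0)        /* 2-byte form needs value >= 0x80 */
        return (s[0] & 0x1E) == 0;              /* lead byte C0 or C1 */
    if (n >= 3 && (s[0] & 0xF0) == 0xE0)        /* 3-byte form needs value >= 0x800 */
        return s[0] == 0xE0 && (s[1] & 0x20) == 0;
    if (n >= 4 && (s[0] & 0xF8) == 0xF0)        /* 4-byte form needs value >= 0x10000 */
        return s[0] == 0xF0 && (s[1] & 0x30) == 0;
    return 0;                                   /* ASCII or not a recognized lead byte */
}
```

The check does not validate continuation bytes or reject surrogates; it isolates the shortest-form rule only.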
Windows 95/98/Me: MultiByteToWideChar is supported by the Microsoft Layer for Unicode. To use this version, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.
Example Code
For an example, see Looking Up a User's Full Name.
Requirements
Windows NT/2000/XP: Included in Windows NT 3.1 and later.
Windows 95/98/Me: Included in Windows 95 and later.
Header: Declared in Winnls.h; include Windows.h.
Library: Use Kernel32.lib.
See Also
Unicode and Character Sets Overview, Unicode and Character Set Functions, WideCharToMultiByte