In the previous tutorial we passed a single string to a native C/C++ function by using P/Invoke.
This function was defined like this:
// C++
void print_line(const char* str);
// C#
[DllImport("NativeLib.dll")]
private static extern void print_line(string str);
However, there exists a hidden pitfall here:
What happens when the user passes a non-ASCII character to this function?
ASCII and Unicode: A Historical Overview
Historically, there was ASCII, which defined characters up to character number 127 (i.e. everything that fits into 7 bits). However, these 128 characters contained only the letters used in English. Umlauts (like ä, ö, ü) and other characters were not present. So the 8th bit was used to map these characters, but the mapping was not standardized. Basically, each country had its own mapping of the region 128–255. These different mappings were called code pages.
For example, on code page 850 (MS-DOS Latin 1) the character number 154 is Ü (German Umlaut) while on code page 855 (MS-DOS Cyrillic) the very same character number represents џ (Cyrillic small letter DZHE).
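To make the ambiguity concrete, here is a minimal C++ sketch. The `CodePage` enum and the `decode()` function are hypothetical names made up for this illustration, and the sketch only knows the single high byte mentioned above; a real code page table has an entry for each of the 256 byte values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum covering only the two code pages named in the text.
enum class CodePage { Cp850, Cp855 };

// Maps one byte to a Unicode code point under the given code page.
// Assumption: only byte 154 is handled in the upper half; every other
// high byte yields U+FFFD (the replacement character) in this sketch.
char32_t decode(std::uint8_t byte, CodePage page) {
    if (byte < 128) {
        return byte;  // lower half is plain ASCII in both code pages
    }
    if (byte == 154) {
        // Same byte, different character depending on the code page.
        return page == CodePage::Cp850 ? 0x00DC   // Ü
                                       : 0x045F;  // џ
    }
    return 0xFFFD;  // not covered by this sketch
}
```

Calling `decode(154, CodePage::Cp850)` and `decode(154, CodePage::Cp855)` gives two different code points for the very same byte, which is exactly why a bare `char` string is ambiguous without knowing its code page.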
To unify these different mappings, the Unicode standard was established in 1991. The idea was (and is) to give each existing character a unique id. These ids are called code points. So basically the Unicode standard is “just” a much bigger version of the ASCII standard. The latest version as of writing is Unicode version 6.1, which covers over 110,000 characters.
Along with the Unicode standard several encodings were developed. Each encoding describes how to convert Unicode code points into bytes. The most famous ones are UTF-8 and UTF-16.
Please note that all Unicode encodings (such as UTF-8 and UTF-16) can encode all Unicode code points. They just differ in the way they do this.
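As an illustration of “converting code points into bytes”, here is a small sketch of a UTF-8 encoder in C++. The function name `encode_utf8` is made up for this example; it is not part of any library used in this tutorial.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes a single Unicode code point as UTF-8 (1 to 4 bytes).
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> encode_utf8(char32_t cp) {
    // Valid code points end at U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    auto byte = [](char32_t v) { return static_cast<std::uint8_t>(v); };
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {                       // 7 bits: plain ASCII
        out.push_back(byte(cp));
    } else if (cp < 0x800) {               // 11 bits: 2 bytes
        out.push_back(byte(0xC0 | (cp >> 6)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 16 bits: 3 bytes
        out.push_back(byte(0xE0 | (cp >> 12)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else {                               // 21 bits: 4 bytes
        out.push_back(byte(0xF0 | (cp >> 18)));
        out.push_back(byte(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For example, the code point U+00FC (ü) becomes the two bytes 0xC3 0xBC in UTF-8, while an ASCII character like U+0041 (A) stays a single byte.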
If you want to experiment a little bit with Unicode, there is a Unicode Explorer I’ve written. Go ahead and give it a try.
P/Invoke String Conversions
Back to the actual problem. With the parameter of print_line() defined as const char* (and char being 8 bit) it’s not clear which code page to use for the strings passed to this function.
Instead, let’s change the parameter type to Unicode (also sometimes referred to as “wide characters”):
void print_line(const wchar_t* str);
Now, let’s also adapt the C# mapping:
[DllImport("NativeLib.dll", CharSet = CharSet.Unicode)] private static extern void print_line(string str);
The only difference here is that we specified the CharSet to be Unicode.
With this, C# will pass strings as UTF-16 encoded strings to the C++ function.
UTF-16 is, as said before, an encoding that converts Unicode code points into bytes and the other way around. In UTF-16 each code point is encoded with either one or two WORDs (16 bit values). The most frequently used code points (those up to U+FFFF) fit into one WORD; the less frequently used code points need two WORDs (called a “surrogate pair”).
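The split into one or two 16 bit values can be sketched in a few lines of C++. The name `encode_utf16` is hypothetical and only serves this illustration; on the C# side the runtime does this work for you.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes a single Unicode code point as UTF-16 (one or two 16 bit units).
// Hypothetical helper for illustration only.
std::vector<std::uint16_t> encode_utf16(char32_t cp) {
    // Valid code points end at U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        // Fits into a single 16 bit unit.
        return { static_cast<std::uint16_t>(cp) };
    }

    // Everything above U+FFFF is split into a surrogate pair:
    // subtract 0x10000, then spread the remaining 20 bits over two units.
    std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    return {
        static_cast<std::uint16_t>(0xD800 | (v >> 10)),    // high surrogate
        static_cast<std::uint16_t>(0xDC00 | (v & 0x3FF))   // low surrogate
    };
}
```

For example, U+00FC (ü) is the single unit 0x00FC, while U+1F600 becomes the surrogate pair 0xD83D 0xDE00.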
Important: There is no ISO C way to print Unicode characters to the console. wprintf() won’t work, at least on Windows.
Returning Strings
Returning strings is not as trivial as passing them as parameters.
The following is a quote from Stack Overflow.
The problem though comes with what to do with the native memory that was returned from foo(). The CLR assumes the following two items about a PInvoke function which directly returns the string type
- The native memory needs to be freed
- The native memory was allocated with CoTaskMemAlloc
Therefore it will marshal the string and then call CoTaskMemFree on the native memory blob. Unless you actually allocated this memory with CoTaskMemAlloc this will at best cause a crash in your application.
In order to get the correct semantics here you must return an IntPtr directly. Then use Marshal.PtrToString in order to get to a managed String value. You may still need to free the native memory but that will dependent upon the implementation of foo.
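One way the native side could look is sketched below. This is only an assumption about how such a library might be written, not the code from the quote: the names `get_text` and `free_text` are hypothetical, and the export decoration a real Windows DLL needs (e.g. `__declspec(dllexport)`) is left out to keep the sketch portable. The idea is that the library hands out a raw pointer and also offers a matching function to free it, so that the memory is released by the same allocator that created it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>

// Hypothetical export: returns a heap-allocated copy of a wide string.
// The caller gets a raw pointer (an IntPtr on the C# side) and must pass
// it back to free_text() after copying the characters.
extern "C" wchar_t* get_text() {
    const wchar_t* text = L"Hello from native code";
    std::size_t count = std::wcslen(text) + 1;  // include the terminator
    wchar_t* copy =
        static_cast<wchar_t*>(std::malloc(count * sizeof(wchar_t)));
    if (copy != nullptr) {
        std::wmemcpy(copy, text, count);
    }
    return copy;  // may be nullptr if the allocation failed
}

// Hypothetical export: releases memory obtained from get_text().
extern "C" void free_text(wchar_t* text) {
    std::free(text);  // same allocator that get_text() used
}
```

On the C# side, `get_text` would then be declared as returning `IntPtr`, the string would be read with `Marshal.PtrToStringUni()` (assuming a 16 bit `wchar_t`, as on Windows), and the pointer would be passed to `free_text` afterwards.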
Thanks for this! I couldn’t figure out why I kept getting an AccessViolationException when trying to return “const *char” as a string. PtrToStringAnsi saved me.
tks for suggesting
Thank you so much …
thanks for the CharSet = CharSet.Unicode part!
Hello,
Your tutorial was very helpful to me in resolving some of the issues I was having in porting a third-party DLL I am using in a hardware integration project. But I still have a problem with one of the functions that takes a function pointer. The actual problem is not passing the function pointer itself but with one of the parameters of the callback function. As instructed by the library documentation, the callback function definition looks like this:
long OnCodeline(int docid, char* codeline, int res);
the problem I am having is with the second argument {char *}. This argument is used to return the character string generated by the hardware from the library. I do not know how memory was allocated for the string and hence I can’t seem to correctly code the callback function.
Regards
I think you usually use char** (notice the second star) and let the function allocate the memory for the string.
Hi,
I am trying to marshall a char [][] value in C#.
more info ..the value i was trying to marshall was
char somecodeString[MAX_FIELDS][MAX_CODE + 1];
Hi Sebastian, I need your advice on how to read the data from the unmanaged function that has one of the parameter as char*** responses. What is the equivalent type in C# to be declared and read the data coming from that unmanaged function. thanks in advance.
best regards,
Siddharth
Hi Siddharth, I don’t think you can map this to C#. You can only map char* and char**.