CString and WString

BLaBZ

(Posted 2013) [#1]

I know that by definition these methods return either a null-terminated sequence of 8-bit bytes or a null-terminated sequence of 16-bit shorts, but I'm having a hard time understanding what exactly that means and how I should think of these methods.

In what cases is this important?

Is there a reference for working with more low level API calls!?

I recently had to put the contents of a file into a string, decrypt it, then reference the decrypted string as if it was a file. I used cstring and banks to achieve this but I still don't feel like I have a good understanding of what ToCString really does.

Thanks!

TomToad

(Posted 2013) [#2]

In C, strings do not exist as a native type. To create one, you use an array of type char for 8-bit strings and an array of a wider type such as wchar_t (16 bits on Windows) for wide strings. Since the size of strings can vary, you indicate the end of a string by appending a 0 value (the null character, not the digit '0') after the last character.

For example, "Hi" would be encoded as

char MyString[3];
MyString[0] = 72;  // ASCII for 'H'
MyString[1] = 105; // ASCII for 'i'
MyString[2] = 0;   // 0 byte indicates end of string

Of course, you wouldn't build a string byte for byte like that. C lets you initialize the array from a string literal, and its standard library includes many functions for handling strings.

char MyString[] = "Hi"; // Creates MyString[3] and fills it with 72, 105, 0 for you

Remember that you will need one extra element in the array for the 0 element in C strings.

You would mostly need them when you are about to pass a string to a .dll or library that expects strings in C format, or when you write to a file that the reading program expects to contain data in C format. I really don't think that C strings and wide C strings have much other use in BlitzMax.

Brucey

(Posted 2013) [#3]

Generally you only need to convert Strings if something else needs them in a particular format (C string, UTF8 string, or wide string).
BlitzMax's Strings are stored internally as sequences of 16-bit (short) characters. If you are converting to C strings, you need to be aware that character values over 255 will be truncated to fit in a byte. So, standard ASCII is fine, but for international characters you may have issues. Most of the time it's probably not something you need to worry about though.

On Linux and OS X, filenames are in UTF-8 format, which encodes each character as a sequence of one or more bytes. BlitzMax provides conversion functions for UTF-8.
There are also formats such as wchar_t*, std::string, std::wstring, wxString, and a whole host of others used in the likes of C++ and various libraries, which you may come across on your travels in the realms of low-level land.

As Tom says, if you are only doing BlitzMax stuff, you'll rarely, if ever, need to know about anything other than the native String functionality.

Yasha

(Posted 2013) [#4]

The most sneaky difference is probably this, that TomToad touched on:

Since the size of strings can vary, you would indicate the end of a string by appending a '0' after the string.

i.e. C strings cannot contain the null byte, whereas strings in Blitz and most other languages have no problem with it and can contain arbitrary data (and often do, when people (mis)use them as generic data banks). If you have null characters in your string, it will run into problems when it goes through C, because everything after the first null is ignored. It's therefore not a good idea to use C strings for general-purpose binary data.

C's string operations are also pretty inefficient for related reasons (e.g. having to loop over every character to find out how long the string is). This also makes them hilariously unsafe when dealing with unverified data. Avoid them whenever possible.