Bitsize Unicode

Casaber(Posted 2016) [#1]
I read that Bmax uses 16 bit Unicode for strings.

Is there a way to use 8 bit Unicode inside Bmax?


Henri(Posted 2016) [#2]
Hi,

More information is probably needed to answer your question properly. BlitzMax stores string characters internally as an array of 16-bit shorts, which can be converted to a memory block of 8-bit chars (String.ToCString()) or a memory block of 16-bit chars (String.ToWString()).

-Henri
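For comparison, here is a minimal sketch of the two conversions side by side. It assumes ToWString() returns a Short Ptr and that both blocks are released with MemFree; the variable names are only illustrative.

```blitzmax
Strict

Local s:String = "Coding is fun!"

'8-bit copy: one byte per character, null terminated
Local cp:Byte Ptr = s.ToCString()

'16-bit copy: one short per character, null terminated
Local wp:Short Ptr = s.ToWString()

'Both pointers index by character, only the element width differs
Print "First char as byte:  " + cp[0]
Print "First char as short: " + wp[0]

'Neither block is garbage collected, so free both
MemFree cp
MemFree wp
```

For plain ASCII text like this both lines print the same value; the two copies only differ for characters that do not fit in 8 bits.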


Casaber(Posted 2016) [#3]
Thanks, then no conversion will be needed I notice. That's great news.


dw817(Posted 2016) [#4]
Henri, this is also something I was interested in. Could you post an example please of converting a regular string to 8-bit memory unicode without a FOR/NEXT loop ?


Henri(Posted 2016) [#5]
Sure. Just to be clear, strictly 8 bits is not considered Unicode.

Strict

Local s:String = "Coding is fun!"

'Convert string to a null-terminated sequence of 8-bit bytes. The reserved memory must be freed with MemFree()
Local bp:Byte Ptr = s.ToCString()

'Access characters
For Local i:Int = 0 Until s.length	'Or alternatively check for 0 byte
	Print Chr(bp[i])
Next

MemFree(bp)


EDIT: Or more precisely, Unicode is a standard map of characters known as codepoints. There are over a million codepoints in the Unicode map. The first 256 codepoints fit inside an 8-bit container, which covers most Western European languages.

These codepoints can be stored in a number of different ways, of which the most common are:

UTF-8 = dynamic size for each codepoint depending on codepoint value (from 1 byte to 4 bytes).

UCS-2 = constant 16-bit value for each codepoint, so it can only reach the first 65,536 codepoints. This is also known as Wide String or WString (used by BlitzMax).

UTF-16 = dynamic 16-bit or 32-bit size for each codepoint depending on codepoint value.

-Henri
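The size rules in the list above can be sketched as two small functions. Utf8Size and Utf16Units are hypothetical helper names made up for this illustration, not part of BlitzMax; they only apply the standard codepoint ranges of each encoding.

```blitzmax
Strict

'Hypothetical helper: bytes needed to store one codepoint in UTF-8
Function Utf8Size:Int(codepoint:Int)
	If codepoint < $80 Then Return 1	'ASCII
	If codepoint < $800 Then Return 2
	If codepoint < $10000 Then Return 3
	Return 4	'Everything up to $10FFFF
End Function

'Hypothetical helper: 16-bit units needed to store one codepoint in UTF-16
Function Utf16Units:Int(codepoint:Int)
	If codepoint < $10000 Then Return 1	'Fits in a single 16-bit value
	Return 2	'Needs a surrogate pair
End Function

Print "A ($41):     UTF-8 bytes = " + Utf8Size($41) + ", UTF-16 units = " + Utf16Units($41)
Print "Euro ($20AC): UTF-8 bytes = " + Utf8Size($20AC) + ", UTF-16 units = " + Utf16Units($20AC)
Print "Emoji ($1F600): UTF-8 bytes = " + Utf8Size($1F600) + ", UTF-16 units = " + Utf16Units($1F600)
```

A codepoint such as $1F600 needs two 16-bit units, which is exactly the case a constant 16-bit container cannot represent in a single character.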


dw817(Posted 2016) [#6]
You mentioned that reserved memory must be freed, Henri. Is this required if the program ends normally or abnormally during final non-debug runtime ?


Henri(Posted 2016) [#7]
Yes.

The ToCString() method reserves memory with MemAlloc, fills that memory with character data and returns a pointer to it. This is low-level stuff that the garbage collector does not handle, so the reserved memory must be freed manually. Failure to do so results in a memory leak: the block stays reserved for as long as the program runs.

-Henri
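A minimal sketch of the allocate/use/free pattern described above, assuming String.FromCString() is used to copy the bytes back into an ordinary garbage-collected string. Every call to ToCString() reserves a new block, so each one gets its own MemFree.

```blitzmax
Strict

Local s:String = "Coding is fun!"

For Local n:Int = 1 To 3
	'Each call reserves a fresh block outside the garbage collector
	Local bp:Byte Ptr = s.ToCString()

	'Copy the bytes back into a managed string, which needs no manual cleanup
	Local copy:String = String.FromCString(bp)
	Print n + ": " + copy

	'Release the block before the pointer goes out of scope
	MemFree bp
Next
```

Without the MemFree inside the loop, each pass would leave one more unreachable block behind.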