byte ptr --> byte Array and 16bit Strings

BlitzMax Forums/BlitzMax Programming/byte ptr --> byte Array and 16bit Strings

Tibit(Posted 2009) [#1]
Any ideas on how to convert a byte pointer into a byte array?

I use String.FromBytes to do the conversion. I figured it would be nicer to pass a Byte[] around my code instead of a Byte Ptr (they are basically the same anyway), so I did this:
Function String8FromBytes:String( Bytes:Byte[] )
	Return String.FromBytes:String( Bytes,Bytes.Length)
EndFunction

Function String8ToBytes:Byte[]( StringData:String )
	Local ByteArray:Byte[] = New Byte[StringData.Length]
	Local BytePointer:Byte Ptr = StringData.ToCString()
	
	For Local n:Int = 0 To ByteArray.Length-1
		ByteArray[n] = BytePointer[n]
	Next
	
	MemFree( BytePointer )
	Return  ByteArray
EndFunction


It seems to work, but the loop part just doesn't feel right. A Byte array is so close to a Byte Ptr that copying it element by element feels silly.
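The title question in its general form, Byte Ptr to Byte[], can be answered with a single MemCopy instead of a per-element loop. A minimal sketch, assuming the caller knows how many bytes the pointer refers to; `BytesFromPtr` is a hypothetical helper name, not a built-in:

```blitzmax
SuperStrict

' Hypothetical helper: copy Count bytes starting at Source into a new Byte array.
' The caller must supply Count; after this, the array carries its own length.
Function BytesFromPtr:Byte[]( Source:Byte Ptr, Count:Int )
	Local ByteArray:Byte[] = New Byte[Count]
	' One block copy instead of copying element by element
	If Count > 0 Then MemCopy( Varptr ByteArray[0], Source, Count )
	Return ByteArray
EndFunction

' Example: copy a C string into a Byte array, then free the C string
Local text:String = "Hello"
Local cstr:Byte Ptr = text.ToCString()
Local bytes:Byte[] = BytesFromPtr( cstr, text.Length )
MemFree( cstr )
Print bytes.Length   ' 5
```

Once the data is in the array, its length travels with it, which is the safety argument made later in the thread.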

On another note, BlitzMax supports 16-bit strings. I have never had a specific use for them, but I assume they are very useful for localization. My experience says localization is not as simple as it sounds, yet I'd like to support extended character sets such as Russian, French, Arabic, Chinese, Korean, Japanese and so on. However, I have no good idea how to test that. Can someone who uses a non-English character set use the Print/Input/GUI elements without problems?
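One way to test extended character sets without a foreign keyboard or locale is to build strings from Unicode code points with Chr() and check which conversions preserve them. A sketch under the assumption that the 16-bit path (ToWString plus String.FromShorts) keeps every character while the 8-bit path (ToCString plus String.FromCString) cannot hold codes above 255; it checks the data only, not how Print, Input or a GUI font display it:

```blitzmax
SuperStrict

' Build a test string from code points, so no special keyboard or source-file
' encoding is needed: Russian "Privet" ($41F...$442) followed by Japanese "Nihon".
Local codes:Int[] = [$41F, $440, $438, $432, $435, $442, $65E5, $672C]
Local text:String
For Local code:Int = EachIn codes
	text :+ Chr(code)
Next

' 16-bit round trip: one Short per character, so nothing is lost
Local wide:Short Ptr = text.ToWString()
Local via16:String = String.FromShorts( wide, text.Length )
MemFree( Byte Ptr(wide) )

' 8-bit round trip: codes above 255 cannot fit in one byte, so the text changes
Local narrow:Byte Ptr = text.ToCString()
Local via8:String = String.FromCString( narrow )
MemFree( narrow )

Print "16-bit round trip intact: " + (via16 = text)
Print "8-bit round trip intact: " + (via8 = text)
```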

Here is how I did it for 16-bit strings (almost the same). Any ideas for improvements? Does it look right? Can anyone with an extended character set test it?
Function String16FromBytes:String( Bytes:Byte[] )
	Local ShortArray:Short[] = New Short[Bytes.Length/2]
	
	For Local n:Int = 0 To ShortArray.Length-1
		' Combine the low byte and the high byte of each 16-bit character
		ShortArray[n] = Bytes[n*2] | (Bytes[n*2+1] Shl 8)
	Next

	Local Output:String
	For Local s:Short = EachIn ShortArray
		Output:+ Chr(s)
	Next
	Return Output
	'Local BytePointer:Byte Ptr = Varptr Bytes[0]
	'Return String.FromShorts:String( Short Ptr(BytePointer), Bytes.length/2 )
EndFunction
Function String16ToBytes:Byte[]( StringData:String )
	Local ByteArray:Byte[] = New Byte[StringData.Length*2]
	Local ShortPointer:Short Ptr = StringData.ToWString() ' Short Pointer Converted
	
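	' Store both bytes of each 16-bit character: low byte first, then high byte
	For Local n:Int = 0 To StringData.Length-1
		ByteArray[n*2] = ShortPointer[n] & $FF
		ByteArray[n*2+1] = ShortPointer[n] Shr 8
	Next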
	MemFree( ShortPointer)
	Return  ByteArray
EndFunction


Also when I do this:
Local B:Byte[] = String16ToBytes("12345678")
Print "size: "+B.Length
Local S:String = String8FromBytes( B )
Print S+", StringLength: "+S.Length

I get the result "12345678", but it would seem to me that unpacking a 16-bit string into bytes and then packing those bytes into an 8-bit string should corrupt the string :)
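A sketch of what ends up in the byte array, built with ToWString and MemCopy rather than the functions above so it stands alone. The byte order in the comment assumes a little-endian (x86) machine:

```blitzmax
SuperStrict

' Pack "12345678" as 16-bit characters: two bytes per character
Local text:String = "12345678"
Local wide:Short Ptr = text.ToWString()
Local packed:Byte[] = New Byte[text.Length * 2]
MemCopy( Varptr packed[0], Byte Ptr(wide), packed.Length )
MemFree( Byte Ptr(wide) )

' Dump every byte; on a little-endian machine this shows 49 0 50 0 51 0 ...
Local dump:String
For Local value:Byte = EachIn packed
	dump :+ String(value) + " "
Next
Print dump

' Reading the bytes back as 8-bit characters gives 16 characters, not 8:
' every digit is followed by Chr(0)
Local narrow:String = String.FromBytes( Varptr packed[0], packed.Length )
Print "StringLength: " + narrow.Length
```

So the 8-bit string only looks right: it is 16 characters long, with a Chr(0) next to each digit, and those NUL characters may simply not be visible when printed.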


Brucey(Posted 2009) [#2]
Do you really need it not to be a String in your code?

You can always MemCopy into your byte array.

But unless you are planning to pass this data to a third-party library, I can't see the benefit of converting out of the String and passing the bytes around instead.


Tibit(Posted 2009) [#3]
Some of the data are originally strings, like chat text, names, labels etc. Non-string data could just as well be bytes, but the functions I have at the moment handle strings. I guess I could use MemCopy here too and convert those to bytes? But I'm unsure: is it grossly inefficient to do these conversions?

Function String8ToBytes:Byte[]( StringData:String )
	Local ByteArray:Byte[] = New Byte[StringData.Length]
	Local BytePointer:Byte Ptr = StringData.ToCString()
	MemCopy( ByteArray, BytePointer, StringData.Length )	
	MemFree( BytePointer )
	Return  ByteArray
EndFunction
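The same MemCopy idea applies to the 16-bit functions. A minimal sketch of drop-in replacements, assuming native byte order is acceptable: the bytes come out in the machine's own order (little-endian on x86), so a big-endian machine such as a PowerPC Mac on the other end of a network connection would read them swapped:

```blitzmax
SuperStrict

' Copy the string's 16-bit characters straight into a byte array (2 bytes per char)
Function String16ToBytes:Byte[]( StringData:String )
	Local ByteArray:Byte[] = New Byte[StringData.Length * 2]
	If StringData.Length = 0 Then Return ByteArray
	Local ShortPointer:Short Ptr = StringData.ToWString()
	MemCopy( Varptr ByteArray[0], Byte Ptr(ShortPointer), ByteArray.Length )
	MemFree( Byte Ptr(ShortPointer) )
	Return ByteArray
EndFunction

' Rebuild the string by reading the bytes as native-order Shorts, no Chr() loop
Function String16FromBytes:String( Bytes:Byte[] )
	If Bytes.Length < 2 Then Return ""
	Return String.FromShorts( Short Ptr(Varptr Bytes[0]), Bytes.Length / 2 )
EndFunction
```

If the data has to cross machines with different byte order, an explicit loop that writes the low and high byte of each character in a fixed order is the safer choice; within one architecture, MemCopy avoids per-character work.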



_Skully(Posted 2009) [#4]
More than likely... I would just handle them as strings within BMax, as it's quite fast... it looks like you are converting and then converting back...

I guess I'm not seeing why you need to make them byte arrays.


Tibit(Posted 2009) [#5]
The reason for using byte arrays instead of byte pointers is simply that it feels safer. If someone changed the length, or passed the wrong length as an argument, I'd have pointer problems, which are no fun at all. Byte arrays are also clean and neat, and come with a bunch of handy functionality (so do strings).

Is no one using 16-bit strings?


Brucey(Posted 2009) [#6]
"Is no one using 16-bit strings?"

Sure, but we tend to keep strings as strings until we need to do something else with them... for example, converting to UTF-8 for a third-party library.

Or do you mean, "Is no one using 16-bit strings to store binary data?"


Tibit(Posted 2009) [#7]
I'm doing networking stuff, so I was curious whether converting all strings (chat text and so on) to 8-bit strings would mess things up for people whose 16-bit strings can't be directly converted into 8-bit ones. Because the data is sent over the network, I care about localization compatibility, packet size and pack/unpack performance.

I mean, all strings in BlitzMax are 16-bit, so everyone uses them in that sense. But if you stick to English, I think converting to 8-bit strings makes no difference, i.e. nobody uses more than 8 bits per character. I figured it might matter for some people though, maybe someone from Russia or Asia?


Brucey(Posted 2009) [#8]
You should assume everyone uses Strings that can contain characters with codes above 255.

If you want to convert strings to 8-bit, just convert them to UTF-8 using the following String methods/functions:

ToUTF8String()

FromUTF8String()

That covers Unicode characters up to U+FFFF, i.e. the first 65,536 code points. BlitzMax, alas, doesn't support anything higher than that... ho hum.
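To get from these methods back to the Byte[] this thread started with, the UTF-8 buffer can be copied into an array. A sketch with hypothetical helper names, assuming a BlitzMax version without the 1.33 bug Brucey mentions further down, and relying on ToUTF8String returning a NUL-terminated buffer that the caller must MemFree. Because the length is found by scanning for a 0 byte, a string containing Chr(0) would be cut short:

```blitzmax
SuperStrict

' Hypothetical helper: String -> UTF-8 Byte array via the built-in ToUTF8String()
Function StringToUTF8Bytes:Byte[]( text:String )
	Local buffer:Byte Ptr = text.ToUTF8String()   ' NUL-terminated, must be freed
	Local count:Int = 0
	While buffer[count] <> 0
		count :+ 1
	Wend
	Local bytes:Byte[] = New Byte[count]
	If count > 0 Then MemCopy( Varptr bytes[0], buffer, count )
	MemFree( buffer )
	Return bytes
EndFunction

' Hypothetical helper: UTF-8 Byte array -> String via the built-in FromUTF8String()
Function UTF8BytesToString:String( bytes:Byte[] )
	' FromUTF8String reads up to a NUL byte, so copy into a buffer with a trailing 0
	Local terminated:Byte[] = New Byte[bytes.Length + 1]   ' new arrays start zeroed
	If bytes.Length > 0 Then MemCopy( Varptr terminated[0], Varptr bytes[0], bytes.Length )
	Return String.FromUTF8String( Varptr terminated[0] )
EndFunction
```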


Tibit(Posted 2009) [#9]
Ah, nice, I didn't know about those commands.

How do I use them? This doesn't seem to work:
Local Pointer:Byte Ptr = "ABCDEfghi$]£€[£]".ToUTF8String()
Print String.FromUTF8String(Pointer)
MemFree( Pointer ) ' ToUTF8String allocates the buffer, so free it when done

How do these methods differ from ToCString / FromCString?

Thanks for the help so far btw :)


Brucey(Posted 2009) [#10]
Well, it should work... but there's a bug in 1.33 which I'd temporarily forgotten about :-)

"How do these methods differ from ToCString / FromCString?"

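It converts the UTF-16 (Short-based) strings to UTF-8 (Byte-based) strings. UTF-8 uses a variable-length scheme: ASCII characters take a single byte, while characters that need more than 8 bits are spread over several bytes. ToCString/FromCString, by contrast, use exactly one byte per character.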
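The algorithm is small enough to write by hand, which also sidesteps the 1.33 bug. A sketch assuming BlitzMax's 16-bit characters, so every character needs one to three bytes; surrogate pairs are encoded one half at a time rather than combined into a four-byte sequence, which strictly speaking is not valid UTF-8. `UTF8Encode` and `UTF8Decode` are hypothetical names, and the decoder replaces malformed input with "?" rather than validating it fully:

```blitzmax
SuperStrict

' Hypothetical helper: encode a String (16-bit chars) as UTF-8 bytes
Function UTF8Encode:Byte[]( text:String )
	' First pass: work out the encoded size so the array is allocated once
	Local size:Int = 0
	For Local i:Int = 0 Until text.Length
		Local c:Int = text[i]
		If c < $80
			size :+ 1
		ElseIf c < $800
			size :+ 2
		Else
			size :+ 3
		EndIf
	Next

	' Second pass: write the bytes
	Local out:Byte[] = New Byte[size]
	Local p:Int = 0
	For Local i:Int = 0 Until text.Length
		Local c:Int = text[i]
		If c < $80
			' 0xxxxxxx
			out[p] = c
			p :+ 1
		ElseIf c < $800
			' 110xxxxx 10xxxxxx
			out[p] = $C0 | (c Shr 6)
			out[p + 1] = $80 | (c & $3F)
			p :+ 2
		Else
			' 1110xxxx 10xxxxxx 10xxxxxx
			out[p] = $E0 | (c Shr 12)
			out[p + 1] = $80 | ((c Shr 6) & $3F)
			out[p + 2] = $80 | (c & $3F)
			p :+ 3
		EndIf
	Next
	Return out
EndFunction

' Hypothetical helper: decode 1- to 3-byte UTF-8 sequences back into a String
Function UTF8Decode:String( bytes:Byte[] )
	Local chars:Short[] = New Short[bytes.Length]   ' never more chars than bytes
	Local count:Int = 0
	Local i:Int = 0
	While i < bytes.Length
		Local b:Int = bytes[i]
		If b < $80
			chars[count] = b
			i :+ 1
		ElseIf (b & $E0) = $C0 And i + 1 < bytes.Length
			chars[count] = ((b & $1F) Shl 6) | (bytes[i + 1] & $3F)
			i :+ 2
		ElseIf (b & $F0) = $E0 And i + 2 < bytes.Length
			chars[count] = ((b & $0F) Shl 12) | ((bytes[i + 1] & $3F) Shl 6) | (bytes[i + 2] & $3F)
			i :+ 3
		Else
			' Truncated, stray or 4-byte sequence
			chars[count] = Asc("?")
			i :+ 1
		EndIf
		count :+ 1
	Wend
	If count = 0 Then Return ""
	Return String.FromShorts( Varptr chars[0], count )
EndFunction
```

For plain English chat text this is one byte per character, half the size of a 16-bit encoding; Cyrillic letters take two bytes and most Chinese, Japanese and Korean characters take three.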

The file systems on Mac and Linux, as well as GTK on Linux, are all UTF-8 based.