Reading values from a string

BlitzMax Forums / BlitzMax Programming

Hezkore(Posted 2016) [#1]
I have a string with a lot of characters and symbols that I need to read bytes, shorts, ints and floats from.
And I don't mean Int("10"); I mean reading raw values the way you do with streams.
So an example would be like...
Local myString:String = "jq&28!Nwo=$]>"
Local pos:int

Print ReadStringByte(myString, pos)
pos:+1
Print ReadStringFloat(myString, pos)
And so on...
I need to read byte, short, int and float.
I also need to somehow write a string in a similar way.
So something like WriteStringByte(myString, value)

Bytes and Shorts are relatively easy, since I can just use Chr(value) to write them and Asc() to read them back.
But Ints and Floats are trickier.
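One way to handle the Int case is to split its 32 bits into two 16-bit halves and store each half as one character, since a BlitzMax string character holds 16 bits. The following is a minimal Python sketch of that idea, not BlitzMax code; `int_to_chars` and `chars_to_int` are hypothetical names, and putting the high half first is an arbitrary choice.

```python
def int_to_chars(value):
    """Encode a signed 32-bit int as two 16-bit characters, high half first."""
    bits = value & 0xFFFFFFFF              # two's-complement view of the Int
    return chr(bits >> 16) + chr(bits & 0xFFFF)

def chars_to_int(s, pos=0):
    """Decode two characters starting at pos back into a signed 32-bit int."""
    bits = (ord(s[pos]) << 16) | ord(s[pos + 1])
    # Treat the top bit as the sign bit, like a signed 32-bit Int.
    return bits - 0x100000000 if bits & 0x80000000 else bits

encoded = int_to_chars(2147483647)
print(len(encoded))            # 2
print(chars_to_int(encoded))   # 2147483647
```

One caution: some halves land in the UTF-16 surrogate range (0xD800 to 0xDFFF). WebSocket text frames must be valid UTF-8, and a lone surrogate cannot be encoded as UTF-8, so strings like this are not guaranteed to survive a text-only transport.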


Bobysait(Posted 2016) [#2]
Mid(s, index, charcount).ToInt()
Mid(s, index, charcount).ToFloat()

or using string slicing (indices run from 0 to string.Length-1)
s[index..index+charcount].ToInt()
s[index..index+charcount].ToFloat()

Conversion to Short is implicit if you use Local v:Short = "12345".ToInt() (values outside the Short range are truncated to 16 bits).

Now, the "that's not a good idea" part:
-> a value stored as text can be encoded using an arbitrary number of characters, so you'd need to know exactly the length of the literal number you're looking for.
For example, "a6da123.123456"
If you're looking for an Int where is the start, where is the end ?
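To make that point concrete: if numbers are written as decimal text, the reader needs something like a separator to find where each one ends. Below is a minimal Python sketch; the ';' separator and the function names are illustrative assumptions, not anything proposed in the thread.

```python
def write_text_values(values):
    """Join numbers as decimal text, separated by ';' (an arbitrary choice)."""
    return ";".join(str(v) for v in values)

def read_text_values(s):
    """Split on the separator and parse every field as an int."""
    return [int(field) for field in s.split(";")]

packed = write_text_values([98, 255, 65535, 2147483647])
print(packed)        # 98;255;65535;2147483647
print(len(packed))   # 23
```

The four values take 23 characters as text. Storing each byte or short as one 16-bit character and the Int as two would take 5 characters instead.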


FireballStarfish(Posted 2016) [#3]
Assuming I've understood you correctly, you can get a pointer to the raw data of a BlitzMax String's characters like this:
Local str:String = "a"
Local p:Byte Ptr = Byte Ptr(Object str) + 4

You can then use p to read or modify the contents of the string. This seems like a questionable technique, though. Why do you want to do this?


Brucey(Posted 2016) [#4]
Does it need to be a String? Why not just a block of memory (like a TBank or MemAlloc()) ?

Strings are meant to be immutable, so modifying their content is not a good idea.


Hezkore(Posted 2016) [#5]
@Brucey
The string is already set, I'm only reading it.
It first contains a byte, then a short, int and lastly a float.
But yeah I think it has to be a string...
It's from another application which is made in Construct 2, and Construct 2 doesn't have many options for such things.
I also have to return a string back to Construct 2, and it only reads strings.

@Bobysait
Isn't that kinda exactly like doing Int("10")?
Like "10".ToInt()
Because "a".ToInt() doesn't return anything useful (it just gives 0).

@FireballStarfish
I'm not quite sure how to use that.
How do I write a byte or integer to the string with this?

Here's an example that works with Bytes and Shorts.
But that's only because a character holds 16 bits, so Chr and Asc can handle anything up to a Short.
I'm not exactly sure how to combine two letters to get an integer or float.
SuperStrict

Local myString:String

WriteStringValue(myString, 32767)
WriteStringValue(myString, 65535)
Print "String= " + myString
Print "Value 1 from string= " + ReadStringValue(myString, 0)
Print "Value 2 from string= " + ReadStringValue(myString, 1)

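Function WriteStringValue(str:String Var, value:Int)
	str:+Chr(value)	'Append value (0-65535) as one 16-bit character
EndFunction
Function ReadStringValue:Int(str:String Var, pos:Int)
	Return Asc(Mid(str, pos + 1, 1))	'Mid is 1-based, so pos + 1 reads the character at 0-based pos
EndFunction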



Brucey(Posted 2016) [#6]
Well, presumably you know that BlitzMax string characters are two bytes, so the internal representation for A in bytes is something like 65, 0
The string AB might be 65,0,66,0

So unless that's what you are expecting, you may prefer to use a block of memory instead.
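That byte layout can be reproduced by encoding a string as UTF-16LE, which yields the two bytes of each 16-bit character, low byte first. A small Python illustration; it assumes a little-endian layout, as the 65, 0 example implies, and the `surrogatepass` error handler simply lets unpaired 16-bit values through unchanged. The function name is hypothetical.

```python
def chars_as_bytes(s):
    """Return the raw bytes of s's 16-bit characters, low byte first (UTF-16LE)."""
    return list(s.encode("utf-16-le", "surrogatepass"))

print(chars_as_bytes("A"))    # [65, 0]
print(chars_as_bytes("AB"))   # [65, 0, 66, 0]
```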


Floyd(Posted 2016) [#7]
This shouldn't be too difficult, provided you know exactly how the strings are formatted and how the numbers are encoded. Is that documented anywhere?

Does every string have the same format? If not then how do you know the format?

There are details to consider, such as whether shorts are signed. Both signed and unsigned are reasonable. BlitzMax shorts are unsigned. No idea about Construct 2.
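Whichever reading is chosen, the stored 16 bits are identical; only the interpretation differs, so the format just has to agree on one. A minimal Python sketch of converting between the two readings (the function names are made up for illustration):

```python
def as_signed_short(u):
    """Reinterpret an unsigned 16-bit value (0 to 65535) as signed (-32768 to 32767)."""
    u &= 0xFFFF
    return u - 0x10000 if u >= 0x8000 else u

def as_unsigned_short(s):
    """Reinterpret a signed 16-bit value as unsigned (0 to 65535)."""
    return s & 0xFFFF

print(as_signed_short(65535))   # -1
print(as_unsigned_short(-1))    # 65535
```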

It would help if we could see an example string and the numbers you expect to read from it.


Hezkore(Posted 2016) [#8]
I'm afraid I can't provide a sample string at the moment.
But as long as I can write the strings myself, then that's kinda the format we'll go by.
But byte and short are unsigned, while int is signed.
I also did this some time ago: http://www.blitzmax.com/codearcs/codearcs.php?code=3033
Could we possibly use that to shorten the string?
Because that's kind of the goal of the whole thing: to get the shortest possible string while containing as much data as possible.
The order the bytes and ints come in isn't that important right now, but I will know it later on, so as long as I can read from different positions it's fine.


Floyd(Posted 2016) [#9]
You could certainly design a format which works within BlitzMax, but will that be usable in Construct 2?

For example, a single-precision float uses four bytes. That would fit in two 2-byte characters in BlitzMax. You can get at the 32 bits of data contained in them using BlitzMax. But is the same thing possible in Construct 2? It would not be when using typical string (text) operations. But BlitzMax lets you access the memory holding the string data.
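The float case can be sketched in Python: take the float's raw 32 bits with the standard `struct` module, then store them as two 16-bit characters. This illustration assumes IEEE 754 single precision and high half first, and the function names are hypothetical; it is not BlitzMax code, where the bits would instead come from memory access as described above.

```python
import struct

def float_to_chars(x):
    """Store a single-precision float as two 16-bit characters, high half first."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw IEEE 754 bits
    return chr(bits >> 16) + chr(bits & 0xFFFF)

def chars_to_float(s, pos=0):
    """Rebuild the float from two characters starting at pos."""
    bits = (ord(s[pos]) << 16) | ord(s[pos + 1])
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

encoded = float_to_chars(1.5)
print(len(encoded), chars_to_float(encoded))   # 2 1.5
```

Values that single precision cannot represent exactly, such as 0.1, come back as the nearest single-precision value (about 0.10000000149), just as they would in a 32-bit Float.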


Hezkore(Posted 2016) [#10]
I don't think we should worry too much about Construct 2.
But the string has to be normal "text", something I could store in a file, post to an HTTP server, print in the console etc. with all the data intact.
Basically, what I'm doing is using Construct 2's WebSocket plugin to send data to a WebSocket server written in BlitzMax; sadly, the WebSocket plugin only supports sending strings.
And I'd like the data I'm sending back and forth to be as optimized as possible, even though sending whole strings is never ideal.
So I'm really just trying to squeeze as much info into that string and keeping it as short as possible.

And yeah, I think storing multiple values (for example 2 bytes in a short) is a good idea.
The link I posted above is my old code for doing that, but I can't quite figure out how to use it with a string.
I'll have to figure out how to do that in Construct 2, but that's a question for the Construct 2 forums haha
I might end up writing the "client" in Monkey, and it seems like Monkey's WebSocket module also only sends strings.
But at least with Monkey it'll be easier to read the strings.
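Storing two bytes in one character follows the same idea as the linked code for packing several values into one variable, applied to a character code. A minimal Python sketch, assuming the first byte goes in the high half; `pack_bytes` and `unpack_bytes` are hypothetical names.

```python
def pack_bytes(hi, lo):
    """Store two unsigned bytes (0 to 255) in one 16-bit character."""
    return chr(((hi & 0xFF) << 8) | (lo & 0xFF))

def unpack_bytes(c):
    """Return the (hi, lo) bytes held in one 16-bit character."""
    code = ord(c)
    return code >> 8, code & 0xFF

print(unpack_bytes(pack_bytes(98, 255)))   # (98, 255)
```

Note that pack_bytes(0, 0) produces character code 0, which matters only if something downstream treats that code as a terminator.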


Hezkore(Posted 2016) [#11]
I've started on what I call a "Data String" and this is what I've managed to get so far.
Notice that Float doesn't work and that's what I'd like help with.
Also optimization if anyone can figure that out heh.
SuperStrict

Local myDataString:TDataString = TDataString.Create()
'Print "Writing values"
myDataString.WriteByte(255)
myDataString.WriteShort(65535)
myDataString.WriteString("Hello!")
myDataString.WriteInt(2147483647)
'myDataString.WriteFloat(1.5) 'Incomplete!

Print "~nDemonstrating string"
Print myDataString.ToString()

Print "~nReading back values"
myDataString.Seek(0)
Print myDataString.ReadByte()
Print myDataString.ReadShort()
Print myDataString.ReadString()
Print myDataString.ReadInt()
'Print myDataString.ReadFloat() 'Incomplete!

Type TDataString
	Field Pos:Int
	Field Str:String
	
	'Create new string or use existing one
	Function Create:TDataString(str:String = "")
		Local nDS:TDataString = New TDataString
		nDS.str = str
		nDS.Pos = 0
		Return nDS
	EndFunction
	
	'Byte
	Method WriteByte(value:Byte)
		str:+Chr(32768 | value)
		Pos:+1
	EndMethod
	Method ReadByte:Byte()
		Pos:+1
		Return Str[Pos - 1]
	EndMethod
	
	'Short
	Method WriteShort(value:Short)
		WriteByte(value Shr 8)
		WriteByte(value)
	EndMethod
	Method ReadShort:Short()
		Return ReadByte() Shl 8 | ReadByte()
	EndMethod
	
	'Int
	Method WriteInt(value:Int)
		WriteByte(value Shr 24)
		WriteByte(value Shr 16)
		WriteByte(value Shr 8)
		WriteByte(value)
	EndMethod
	Method ReadInt:Int()
		Return ReadByte() Shl 24 | ReadByte() Shl 16 | ReadByte() Shl 8 | ReadByte()
	EndMethod
	
	'Float (not implemented yet)
	Method WriteFloat(value:Float)
	EndMethod
	Method ReadFloat:Float()
	EndMethod
	
	'String: one character holding the length, then the characters themselves
	Method WriteString(Text:String)
		str:+Chr(Text.Length)	'Exact length, so empty strings work too
		str:+Text
		Pos:+1 + Text.Length
	EndMethod
	Method ReadString:String()
		Local count:Int = str[Pos]	'Indexing a String returns the character code
		Local rS:String = str[Pos + 1..Pos + 1 + count]
		Pos:+1 + count
		Return rS
	EndMethod
	
	'Position
	Method Seek(pos:Int)
		Self.pos = pos
	EndMethod
	
	'Optimization (not used)
	Function storeByte_int(pntr:Int Var, value:Byte, pos:Byte)		'Holds 4 Bytes
		pntr:+(value Shl ((pos Mod SizeOf(pntr)) Shl 3))
	EndFunction
	Function getByte_int:Byte(pntr:Int Var, pos:Byte)				'Holds 4 Bytes
		Return(pntr Shr ((pos Mod SizeOf(pntr)) Shl 3))
	EndFunction
	
	Function storeByte_short(pntr:Short Var, value:Byte, pos:Byte)	'Holds 2 Bytes
		pntr:+(value Shl ((pos Mod SizeOf(pntr)) Shl 3))
	EndFunction
	Function getByte_short:Byte(pntr:Short Var, pos:Byte)			'Holds 2 Bytes
		Return(pntr Shr ((pos Mod SizeOf(pntr)) Shl 3))
	EndFunction
	
	Function storeShort_int(pntr:Int Var, value:Short, pos:Byte)	'Holds 2 Shorts
		pntr:+(value Shl ((pos Mod SizeOf(pntr)) Shl 4))
	EndFunction
	Function getShort_int:Short(pntr:Int Var, pos:Byte)				'Holds 2 Shorts
		Return(pntr Shr ((pos Mod SizeOf(pntr)) Shl 4))
	EndFunction
	
	'Return actual string
	Method ToString:String()
		Return Self.Str
	EndMethod
EndType
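The String methods above rely on a length prefix: one character holding the length, followed by the text. Here is a minimal Python sketch of that scheme with a reader that returns the new position explicitly; the function names are hypothetical, and a single-character prefix assumes strings of at most 65535 characters.

```python
def write_string(parts, text):
    """Append a length-prefixed string to the list of parts being built."""
    if len(text) > 0xFFFF:
        raise ValueError("too long for a one-character length prefix")
    parts.append(chr(len(text)))   # prefix holds the exact length, so "" is allowed
    parts.append(text)

def read_string(data, pos):
    """Read a length-prefixed string at pos; return (text, position after it)."""
    length = ord(data[pos])
    start = pos + 1
    return data[start:start + length], start + length

parts = []
write_string(parts, "Hello!")
write_string(parts, "")
data = "".join(parts)

text, pos = read_string(data, 0)
print(text, pos)                # Hello! 7
print(read_string(data, pos))   # ('', 8)
```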



grable(Posted 2016) [#12]
Your implementation is very similar to TStream and TBank.
Though they don't do exactly what you want, you could probably coax them into writing your specific encoding.

Otherwise, using strings like that is pretty slow; the only redeeming feature is that they grow without much hassle hehe
But if you're only making small strings it shouldn't matter much, I guess...

In any case, for fun I modified your source to use raw memory instead, which also grows automatically, but much less often.
Note that it clears the buffers it allocates only so that ToString() works in all cases; otherwise those clears can be removed.

Also, if you don't have any special requirements for floats, you could always encode them as integers like below.




Hezkore(Posted 2016) [#13]
Very cool grable!
I'm having some problems writing longer strings though since INITSIZE is only 64.
Isn't there a way to dynamically set that depending on the length of the string?

Also I think we should resize in chunks rather than just +1 every time.
That way we can minimize calls to MemExtend.

And yeah, I wanted it to be similar to a TStream or TBank heh.

There's one thing that really bugs me though, and that's the fact that a short uses two characters when I know one character can hold a short.
And I'm really just trying to shorten the string here.
So I'm thinking I might store a short as the character code of a single character, but that means 0 isn't writable (since that's the end of the string), so shorts would max out at 65534 instead of 65535.
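The single-character idea can be sketched by storing value + 1, so that character code 0 never appears and the range becomes 0 to 65534. This is a Python illustration with hypothetical names, and the offset only matters if something along the way treats code 0 as the end of the string.

```python
def short_to_char(value):
    """Store 0 to 65534 in one character, offset by 1 so code 0 never appears."""
    if not 0 <= value <= 65534:
        raise ValueError("value must be in 0..65534")
    return chr(value + 1)

def char_to_short(c):
    """Undo the +1 offset."""
    return ord(c) - 1

print(char_to_short(short_to_char(65534)))   # 65534
```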


grable(Posted 2016) [#14]
> I'm having some problems writing longer strings though since INITSIZE is only 64.
> Isn't there a way to dynamically set that depending on the length of the string?
>
> Also I think we should resize in chunks rather than just +1 every time.
> That way we can minimize calls to MemExtend.

It does. INITSIZE is only the initial size; after that the buffer doubles in size every time it is full.
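That growth policy (start at INITSIZE, double whenever the buffer is full) can be sketched as a capacity calculation. This is a hypothetical Python helper, not grable's code, which does not appear in this thread; the default of 64 mirrors the INITSIZE value mentioned earlier.

```python
def grow_capacity(capacity, needed, initsize=64):
    """Return the first capacity, doubling from max(capacity, initsize), that holds `needed` units."""
    new_cap = max(capacity, initsize)
    while new_cap < needed:
        new_cap *= 2   # doubling keeps reallocations rare
    return new_cap

print(grow_capacity(64, 300))   # 512
```

Because the loop runs until the request fits, a single large write (such as a long string copied in one go) is covered too, as long as that write path actually performs the check.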

And btw, BlitzMax strings don't use zero terminators, so you can have those in your strings too.
It's just my version of your TDataString that relies on them in certain states.


Hezkore(Posted 2016) [#15]
Well if you do
myDataString.WriteString("Hello! This is a loooooooooong string with an error at the end")
And then read it back you'll get
Hello! This is a loooooooooong string with an error at the en倀
So something's up with long strings.
If I increase the INITSIZE it would support longer strings.
But it feels dirty doing that, so I was wondering if it could expand dynamically depending on how long a string I write to it.


grable(Posted 2016) [#16]
> so I was wondering if it could dynamically expand depending on how long a string I write to it.

As I said, it's supposed to do just that. It checks the size on every WriteByte().
And when supplied an init string, it allocates memory the size of that string, so INITSIZE isn't even used in that case.
Also note the additional parameter to TDataString.Create() for append mode, which is on by default.

Still, I am unable to reproduce that error :(

EDIT: Btw, a bug in Resize() didn't clear that initial block, though it shouldn't trigger for your case above... maybe it will fix something :p


Hezkore(Posted 2016) [#17]
Sadly I'm still getting the error :/
And an even longer string sometimes crashes the application outright, but the debugger doesn't point me to the problem, even though I'm in Debug mode.
myDataString.WriteString("Hello! This is a looooooooooooooooong string with an error at the end")



grable(Posted 2016) [#18]
Yeah, my bad. I misread your example ;)
And I forgot that I didn't actually use WriteByte() from WriteString() too *coff*, so that's why hehe

add
Resize(1 + Text.Length)
to the beginning of WriteString and you should be good.


Hezkore(Posted 2016) [#19]
Well done grable!
I've kinda taken a step back though and redid a few things, all in the interest of getting the shortest string possible.
Using your code and writing these values:
myDataString.WriteByte(98)
myDataString.WriteByte(255)
myDataString.WriteShort(65535)
myDataString.WriteString("Hello")
myDataString.WriteInt(2147483647)
myDataString.WriteFloat(65535.5)
The final length of the string is 18 characters.
With this new code - I've managed to get it down to 12 characters by storing two bytes in one character (so Int and Float are now 2 characters instead of 4)
This code doesn't use your fancy memory stuff though, and it might also be a bit tricky to read the string when using seek (due to sub positions) but for me this works very well!



grable(Posted 2016) [#20]
> I've managed to get it down to 10 characters by storing two bytes in one character (so int is now 2 characters instead of 4)
Too bad this won't work for strings :/ or at least it will print garbage...
The only reason I used Short Ptr was for compatibility with wide strings, so if you're going to write single bytes anyway, you might as well use bytes directly.
But at that point you should probably write UTF-8 too if you want anything more than ASCII, so this won't be that easy any longer :p

> This code doesn't use your fancy memory stuff though and I can't figure out floats either heh
You're doing memory stuff now, though. And without the growing part, it will eventually write past the end of the string (bad!)

And with floats, couldn't you just encode them as 4 bytes, same as int?


Hezkore(Posted 2016) [#21]
Yeah, it's a bit of a shame some characters in strings will mess up, but at least I can use my Swedish åäö letters heh.

I'll work on resizing the array correctly!

I've managed to get floats in, but I couldn't do what you did in BlitzMax in either Monkey or Construct 2.
So I ended up using 3 bytes for the integer part of the float and one byte for the decimal.
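The float scheme itself isn't shown in the thread, so here is one plausible reading of "3 bytes for the integer part and one byte for the decimal": signed 24.8 fixed point, where the fraction byte counts 1/256ths. That interpretation, the byte order and the names are assumptions. Precision is limited to 1/256 (about 0.004), but it needs no access to float bits, which is the constraint described above.

```python
def encode_fixed(x):
    """Encode x as 4 bytes of signed 24.8 fixed point, big-endian."""
    scaled = round(x * 256)   # 8 fractional bits
    if not -(1 << 31) <= scaled < (1 << 31):
        raise ValueError("value out of 24.8 fixed-point range")
    return scaled.to_bytes(4, "big", signed=True)

def decode_fixed(b):
    """Decode 4 bytes of signed 24.8 fixed point back into a float."""
    return int.from_bytes(b, "big", signed=True) / 256

print(encode_fixed(1.5).hex())                # 00000180
print(decode_fixed(encode_fixed(65535.5)))    # 65535.5
```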

This now works in Monkey and Construct 2, so I'm quite happy with what I've got here.
Thanks everyone for the help!

For anyone interested, I've put this up on my BitBucket: https://bitbucket.org/Hezkore/blitzmax-data-string


grable(Posted 2016) [#22]
Yeah, getting at the bits of floats can be tricky without pointers or some other way to reinterpret the bits.

I'm not versed in those languages, but if they have function pointers and/or var arguments, maybe you could trick it somehow.
Other than that, you can get crafty with sprintf/sscanf if they are available ;)


Floyd(Posted 2016) [#23]
Other languages sometimes provide a way to get at the bits without pointers.

Here's what I used to do in Blitz3D.

bank = CreateBank(4)

x# = 25

PokeFloat bank, 0, x     ; put 4-byte Float into bank
n = PeekInt( bank, 0 )   ; read it back as if it were an Int, so we can get at the bits
                         ; reverse the process by Poking an Int and Peeking a Float.

Print
Print " Float value: " + x
Print
Print " The bits are: " + Bin(n)

WaitKey
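For comparison, the same poke-as-Float, peek-as-Int trick can be written with Python's `struct` module, using a 4-byte `bytearray` in place of the bank. This is an illustrative translation, not Blitz3D; the little-endian '<' format is an assumption, and any byte order works as long as the poke and the peek agree.

```python
import struct

bank = bytearray(4)                        # like CreateBank(4)
x = 25.0

struct.pack_into("<f", bank, 0, x)         # like PokeFloat bank, 0, x
(n,) = struct.unpack_from("<i", bank, 0)   # like PeekInt(bank, 0)

print(" Float value:", x)
print(" The bits are:", format(n & 0xFFFFFFFF, "032b"))   # 01000001110010000000000000000000

# Reverse the process: poke an Int, peek a Float.
struct.pack_into("<i", bank, 0, n)
(y,) = struct.unpack_from("<f", bank, 0)
```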