WriteString bug???

cbmeeks(Posted 2003) [#1]
I am writing to a file.

Here is the code:

fileout = WriteFile(file$)				
WriteString fileout,"Hello"
WriteInt fileout,TILE_WIDTH
WriteInt fileout,TILE_HEIGHT
CloseFile(fileout)


Everything seems to work, but when I open the file in a hex editor, I notice it puts 4 bytes in front of "Hello".
Why is that?

Thanks

cb


skidracer(Posted 2003) [#2]
See the docs for an explanation; there should be a See Also link to WriteLine, which is the recommended method for writing ASCII text.


julianbury(Posted 2003) [#3]
fileout = WriteFile(file$)
WriteString(fileout,"Hello")
WriteInt(fileout,TILE_WIDTH)
WriteInt(fileout,TILE_HEIGHT)
CloseFile(fileout)


cbmeeks(Posted 2003) [#4]
julianbury, I tried adding the "(" and ")" but it still did the same thing.

I switched to WriteLine and it worked. Well, it actually puts two bytes at the end of each string (0d 0a) but this is the linebreak/return and I can live with that.

Thanks guys

cb


jondecker76(Posted 2003) [#5]
There is nothing wrong with WriteString. It is supposed to put an integer (4 bytes) before the string, so that when you use ReadString, it knows how many bytes to grab from the file or stream.
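
To see that layout for yourself, a minimal sketch along these lines should do. The file name dump_test.bin is only a placeholder, and printing one byte per line as two hex digits is just one convenient way to view the data.

```blitzbasic
; Sketch: dump the bytes that WriteString produces for "Hello".
; "dump_test.bin" is a hypothetical scratch file name.
file$ = "dump_test.bin"

fileout = WriteFile(file$)
WriteString fileout, "Hello"
CloseFile fileout

filein = ReadFile(file$)
While Not Eof(filein)
  ; Hex$ returns 8 hex digits, so keep only the last two for one byte
  Print Right$(Hex$(ReadByte(filein)), 2)
Wend
CloseFile filein
```

The first four values printed are the length prefix; the remaining five are the characters of "Hello".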


Rottbott(Posted 2003) [#6]
I've wondered why it needs a whole 4 bytes for the length of the string. Why not 2? Who uses a string longer than 65535 characters!?

Or one byte for the number of bytes for the length, followed by the length. Then most short strings would fit in 2 bytes, but a huge string of 15 million characters could still fit into four bytes. That would be a lot more efficient.


Warren(Posted 2003) [#7]
I've wondered why it needs a whole 4 bytes for the length of the string. Why not 2? Who uses a string longer than 65535 characters!?

I think the answer is ... who cares? :)

Are you really storing enough strings in your file where this becomes a storage constraint for you?


FlameDuck(Posted 2003) [#8]
Who uses a string longer than 65535 characters!?
Well not the programmers of NotePad, that's for sure...

I'd reckon the Blitz IDE (for example) uses strings larger than 64K.


WolRon(Posted 2003) [#9]
You could use ReadString to load in an entire novel, which may be larger than 64K (if I'm not mistaken, a string can include linebreak and return characters, so your program could still print it out with the correct formatting).

Or you could use it to load in some kind of specific data from a file that recorded a running string of values from something (like an internet connection?) that went on for hours.

SOMEBODY will find a use for it.


BTW, if you don't want the space wasted by the 2 or 4 length bytes, then use WriteByte/ReadByte instead. Of course, you would have to know ahead of time how long the string is.
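
Something like the following would do it. This is only a rough sketch: the function names WriteRawString and ReadRawString$ are made up for illustration, and the caller is assumed to know the character count when reading.

```blitzbasic
; Hypothetical helper: write a string with no length prefix,
; one byte per character.
Function WriteRawString(File, S$)
  For i = 1 To Len(S$)
    WriteByte File, Asc(Mid$(S$, i, 1))
  Next
End Function

; Hypothetical helper: read back exactly Count characters.
; The caller must already know Count.
Function ReadRawString$(File, Count)
  S$ = ""
  For i = 1 To Count
    S$ = S$ + Chr$(ReadByte(File))
  Next
  Return S$
End Function
```

For fixed-size fields (a 5 character file signature, say) this saves the prefix entirely, at the cost of hard-coding the length on both sides.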


Rottbott(Posted 2003) [#10]
I know, I can write my own string write/read functions.

And I know that 2 bytes isn't the end of the world.

But why didn't anyone deciding the format of a string stored in this way (as opposed to null-terminated) do it with a byte for the number of bytes for the length? In 90% of cases it would save 2 bytes per string. It's not much, but it's still a fairly obvious way to do it, and if it saves space, why not?


skidracer(Posted 2003) [#11]
Blitz strings are allowed to contain nulls Chr$(0) so null termination is out, and HTML pages are a good example of string handling where you could easily break a 64K limit.


Rottbott(Posted 2003) [#12]
I know.

But doing it with a byte to store the length of the length, means you can have any strings up to 4,294,967,295 characters long (as you can now), yet MOST strings will only take up 2 bytes for the "header" rather than always using 4 as they do now.

Seems a more logical way to do it to me...


jondecker76(Posted 2003) [#13]
I like the way it works now. It is a matter of pure convenience. You can store a string, whether it's 2 characters or 10,000, and not have to worry about how to retrieve it. I'm using it extensively in a project of mine where strings can easily go beyond 10-20,000 characters. Instead of having to write my own parsing routine to find a null terminator character, I just use one function. It sure has made my project a lot easier to deal with.

Also, it's great for saving/loading type collections.


Rottbott(Posted 2003) [#14]
It's simple enough:

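Function RottbottReadString$(File)
  ; First byte: number of bytes used to store the length
  LenLen = ReadByte(File)
  For i = 1 To LenLen : SLen$ = SLen$ + Chr$(ReadByte(File)) : Next
  ; Rebuild the length, least significant byte first
  For i = 0 To LenLen - 1
    StrLen = StrLen Or (Asc(Mid$(SLen$, i + 1, 1)) Shl (i * 8))
  Next
  ; Read the string body one byte at a time
  For i = 1 To StrLen
    In$ = In$ + Chr$(ReadByte(File))
  Next
  Return In$
End Function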


You could use a bank with ReadBytes() to make it quite a lot more efficient. The equivalent WriteString would be even simpler.
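
For completeness, a minimal sketch of the matching writer, assuming the same format that RottbottReadString$ reads (one count byte, then the length with the least significant byte first, then the characters). The name RottbottWriteString is hypothetical.

```blitzbasic
; Hypothetical counterpart to RottbottReadString$.
; Format: 1 byte = number of length bytes, then the length
; (least significant byte first), then the characters.
Function RottbottWriteString(File, S$)
  SLen = Len(S$)

  ; Work out how many bytes the length needs (0 for an empty string)
  LenLen = 0
  Tmp = SLen
  While Tmp > 0
    LenLen = LenLen + 1
    Tmp = Tmp Shr 8
  Wend

  ; Header: count byte, then the length bytes
  WriteByte File, LenLen
  For i = 0 To LenLen - 1
    WriteByte File, (SLen Shr (i * 8)) And 255
  Next

  ; Body: one byte per character
  For i = 1 To SLen
    WriteByte File, Asc(Mid$(S$, i, 1))
  Next
End Function
```

With this layout any string of up to 255 characters gets a 2 byte header, and an empty string gets a single zero byte.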


MSW(Posted 2003) [#15]

I know, I can write my own string write/read functions.

And I know that 2 bytes isn't the end of the world.

But why didn't anyone deciding the format of a string stored in this way (as opposed to null-terminated) do it with a byte for the number of bytes for the length? In 90% of cases it would save 2 bytes per string. It's not much, but it's still a fairly obvious way to do it, and if it saves space, why not?



Deciding the string format?

Um...there really is no set-in-stone string format. The CPU can't tell the difference between a 4 byte number stored in one of its registers and a string of 4 characters without some extra overhead: opcodes provided by the program to present the string onscreen through a font or "print text" routine.

As such it is up to the developers of the compiler to implement string formats...and Mark has chosen to go with a simple but very flexible structure...internally within Blitz it makes a lot of sense to use a 32-bit integer to store the length of the associated string...Remember that a string can be passed to various functions (actually a memory "pointer" is PUSHed onto the stack, to be POPed once the function code executes)...and if you had several versions of a string (a "short" string with a 1 byte header..."medium" string with a 2 byte header, etc) then this will cause all sorts of problems, as during the execution of your application you may pass the same function a "short" string most of the time, but every so often a "medium" string is passed because you are 1 character over the 255 character "short" string limit...it is then up to the code in the function to decipher just what sort of string it has been passed...by keeping it simple Blitz can use the exact same core opcodes for strings everywhere within the application...

You can write your own string file saving functions...as you have direct knowledge of what strings you will need, how long they are, etc...Mark didn't know what you had in mind when developing Blitz so he kept things much simpler to accommodate as many programming situations as possible...

I use strings for just about everything, from storing text as strings are designed to do...to using them to store graphics, linked lists with parent/child relationships, radix sorting routines, just about everything where I need to have access to individual bytes (chars)...strange but true.


Rottbott(Posted 2003) [#16]
Well, once the string was read from the file/stream it would be stored however Blitz stores them at the moment, so it wouldn't make processing any more complex. I'm not talking about internal operations on strings, *just* the read/write functions. There's no point saving even a few hundred KB of RAM these days, but that much could make a difference when sending over an internet connection.

Obviously I can easily write my own read/write functions to do it this way, I'm just wondering why the built in ones don't, it just seems more sensible. I assume it's to save a CPU cycle or two, but that seems kinda pointless. CPU cycles aren't the limiting factor for very many things, especially sending data over a network.


MSW(Posted 2003) [#17]
It doesn't because it would still need some other flag system to indicate if the string it is reading from a file is of the 1, 2, 3, or 4 byte "header types" (and as a byte is the smallest single construct that can be written/read from a file...this separate flag system would use its own byte even if only a couple of bits within it are actually used...)...and using a special character to indicate the end of a string is kinda wasteful in that Blitz needs to allocate system memory to store such a string, so it will have to allocate a certain amount...read the string, and if the string is larger than this amount (Blitz hasn't reached that special character) it would have to re-allocate more memory to continue reading in the string...which can be a performance issue, but the real problem comes from that special character in that it completely limits what you can do with strings (imagine sending out a packet over the internet with the receiving computer getting the special character before it gets the rest of the string)...

Remember that there are lots of seldom used characters in a string...what you can do is assign some of these a "flag" like status...then the user may type in some text to send online to another player...in your code you can then take that string...add the character for one of these special "flags" to it and then add the BIN() values of the variables associated with that flag all into one string...and send it out over the net...the receiving computer then can "decipher" the string, finding that it contains some text to display onscreen and using the special character "flags" to do other things like move the specific character onscreen, etc...it's going to send out such special flags anyway for each separate bit of data sent over the net, might as well take advantage of strings to save on packet size (hint: you can use one special character flag to indicate that the next several bytes are part of a block of data...instead of sending the X,Y,Z values of the location of a player as separate variable values...set up a construct like Packet$ = Chr$(special flag) + BIN(playerX) + BIN(playerY) + BIN(playerZ)....this will result in a string 17 bytes long (4 byte string "header" + 1 byte special character flag + 4 bytes each for player X, Y, and Z)...sending it as per variable value includes special Blitz "flags" so it can decipher what the variable is supposed to be (float, integer, string), so sending a packet containing just playerX, playerY and playerZ may use 24 bytes as each would possibly be separated by a 4 byte flag used to indicate both the type of variable and where in the packet it exists)...remember Mark doesn't know exactly what you will be sending across the net and storing in a file...only you do...
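
A rough sketch of that packet construct, for illustration only. BIN() above is shorthand for "the 4 raw bytes of the value", so the helper PackInt$ below is a hypothetical stand-in for it, and FLAG_POSITION and the player values are made-up examples.

```blitzbasic
; Hypothetical helper: pack a 32-bit integer into 4 raw characters,
; least significant byte first.
Function PackInt$(Value)
  P$ = ""
  For i = 0 To 3
    P$ = P$ + Chr$((Value Shr (i * 8)) And 255)
  Next
  Return P$
End Function

; Hypothetical flag value marking a position update
Const FLAG_POSITION = 1

; Made-up example coordinates
playerX = 100 : playerY = 200 : playerZ = 300

; Packet body: 1 flag character followed by 3 packed integers
Packet$ = Chr$(FLAG_POSITION) + PackInt$(playerX) + PackInt$(playerY) + PackInt$(playerZ)
Print Len(Packet$)
```

The body is 1 flag character plus 3 x 4 packed bytes; writing it with WriteString adds the 4 byte length header, which is where the 17 byte figure comes from.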


_Skully(Posted 2003) [#18]
Why would you write numerical values as strings? Why not use:

WriteByte, WriteFloat, WriteShort etc?

There is also WriteBytes/ReadBytes, which can write a bank to a stream or read from a stream into a bank as well...

Skully


Rottbott(Posted 2003) [#19]
It doesn't because it would still need some other flag system to indicate if the string it is reading from a file is of the 1, 2, 3, or 4 byte "header types" (and as a byte is the smallest single construct that can be written/read from a file...this separate flag system would use its own byte even if only a couple of bits within it are actually used...)...and using a special character to indicate the end of a string is kinda wasteful in that Blitz needs to allocate system memory to store such a string, so it will have to allocate a certain amount...read the string, and if the string is larger than this amount (Blitz hasn't reached that special character) it would have to re-allocate more memory to continue reading in the string...which can be a performance issue, but the real problem comes from that special character in that it completely limits what you can do with strings (imagine sending out a packet over the internet with the receiving computer getting the special character before it gets the rest of the string)...


Firstly, I've never mentioned wanting to use null-terminated strings. They aren't an issue at all. I merely mentioned in passing that null-terminated was the other way of storing strings, and NOT the one that I was interested in.

Nor do I have a problem sending my data over the network, in fact I use BlitzPlay which takes care of all that for me, even as far as compressing floats and ints into strings... so I don't have to worry about the exact format of messages.

My simple question is, why isn't my suggested string format used by Blitz (or other languages) for reading/writing to streams and files? There seems no logical reason why not, and in 99.9% of cases it is more efficient by 2 bytes per string.


skidracer(Posted 2003) [#20]
A method that polls a stream for the arrival of a complete string with ReadAvail (to avoid blocking) would be even more complicated than your example code above.

Currently the presence of 4 bytes available on the incoming stream guarantees you can read the string length without blocking.


Rottbott(Posted 2003) [#21]
A method that polls a stream for the arrival of a complete string with ReadAvail (to avoid blocking) would be even more complicated than your example code above.

Currently the presence of 4 bytes available on the incoming stream guarantees you can read the string length without blocking.

That's true. You'd have to wait for one byte to be available, read it, and then decide if you have to wait any longer for the next 1-4 bytes. It's an extra step compared with what you have to do now, but not a very CPU intensive one. Not too hard to code, though, if you know the string format.
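
Roughly, the polling could look like the sketch below. It assumes the variable-length header format discussed above and a stream that ReadAvail works on; PollString and the Poll* globals are hypothetical names, and it only tracks one stream at a time.

```blitzbasic
; Hypothetical polling reader for the variable-length header format.
; Call it once per main loop pass; it returns True when Result$ is complete.
Global PollState = 0   ; 0 = need count byte, 1 = need length, 2 = need body
Global PollLenLen, PollLen
Global Result$

Function PollString(Stream)
  ; Step 1: wait for the single count byte
  If (PollState = 0) And (ReadAvail(Stream) >= 1)
    PollLenLen = ReadByte(Stream)
    PollState = 1
  EndIf

  ; Step 2: wait for all the length bytes, then assemble the length
  If (PollState = 1) And (ReadAvail(Stream) >= PollLenLen)
    PollLen = 0
    For i = 0 To PollLenLen - 1
      PollLen = PollLen Or (ReadByte(Stream) Shl (i * 8))
    Next
    PollState = 2
  EndIf

  ; Step 3: wait for the whole body before reading any of it
  If (PollState = 2) And (ReadAvail(Stream) >= PollLen)
    Result$ = ""
    For i = 1 To PollLen
      Result$ = Result$ + Chr$(ReadByte(Stream))
    Next
    PollState = 0
    Return True
  EndIf

  Return False
End Function
```

The extra state (step 1 versus step 2) is the cost compared with the fixed 4 byte header, where a single ReadAvail check covers the whole length field.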