@BRL - request: String indexing boundary checks

Fabian.(Posted 2006) [#1]
Hi,

try this code:
Strict
Framework brl.blitz

Local Text$ = "Text"
WriteStdout Text [ -5 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -4 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -3 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -2 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -1 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 0 ] + "~n" 'Ok, returns: 84  -> T
WriteStdout Text [ 1 ] + "~n" 'Ok, returns: 101 -> e
WriteStdout Text [ 2 ] + "~n" 'Ok, returns: 120 -> x
WriteStdout Text [ 3 ] + "~n" 'Ok, returns: 116 -> t
WriteStdout Text [ 4 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 5 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 6 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 7 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 8 ] + "~n" 'Danger of undefined results!
Currently it seems to be possible in BlitzMax to access any memory location just by indexing a string. Unlike with arrays, there's no boundary check. I don't like this, since it could be the source of some weird and hard-to-find bugs. It is simply a lack of safety when programming. Could you please add some code which checks whether the index is smaller than zero or greater than or equal to the string size, and then throws an "Index out of bounds!" exception?
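
For illustration, the requested check can be sketched as an ordinary function. This is only a sketch: CharAt is a hypothetical helper name (it is not part of the BRL modules), and the error message simply reuses the wording from the request above.

```blitzmax
Strict
Framework brl.blitz

Local Text$ = "Text"
WriteStdout ( CharAt ( Text , 3 ) + "~n" ) '116 -> t

Try
	WriteStdout ( CharAt ( Text , 4 ) + "~n" ) 'index 4 is past the end, so this throws
Catch Message$
	WriteStdout ( Message + "~n" )
EndTry

'Hypothetical helper: returns the character code at Index,
'or throws if Index lies outside 0 .. Text.length - 1.
Function CharAt:Int ( Text$ , Index:Int )
	If Index < 0 Or Index >= Text.length Then Throw "Index out of bounds!"
	Return Text [ Index ]
EndFunction
```

A built-in check would presumably do the same comparison before each indexed read, much as the array check does.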


Dreamora(Posted 2006) [#2]
Wouldn't it be easier just using the functionality that is officially in for :String instead of using hack approaches like [x] on them?
I'm aware that they work, but it's up to you to make sure you aren't doing something nonsensical.

Just use index 0 .. string.length-1 and you won't have problems. (i.e. write yourself set/get functions that throw errors if you attempt an invalid access)

The simple reason is that actually index -4 and -8 have their own needs on objects (what you see above is no string specific behavior) and removing that until the stuff behind it is actually exposed to the users would be a very bad idea, at least in my view.


grable(Posted 2006) [#3]
I would think this is a bug.

Strings should at least have boundary checks in DEBUG mode, as with Arrays.


Dreamora(Posted 2006) [#4]
Why?
String is no array object. The [x] approach only works because you jump around in the actual object's memory.
Getting at the characters within the string via [x] is not a string feature. (Besides, indexing that way breaks UTF.)

If you need a character array, use one or create something similar to StringStreams in other languages (you could use a TBank as well).
String is no character array as in C / C++


grable(Posted 2006) [#5]
A more robust/consistent language?

I've been going over the asm output and all I can see is a call to _brl_blitz_ArrayBoundsError when indexing arrays in debug mode.

But I also seem to remember Mark saying something about String indexing being a big hack, so I'm not going to argue ;)

Maybe this is something to hope for in the eventual rewrite of the blitzmax compiler =) hehe


H&K(Posted 2006) [#6]
I agree with Dream: what Fabian is doing is in fact a "hack", and one which I am quite happy to use. However, because it is a hack, it has the advantage that none of the boundary checks are made, and it is therefore faster.

If you cannot keep track of the boundary yourself, use :String; if you can, and you want the speed advantage of the system not keeping track of the boundary, then use string/array indexing.


Fabian.(Posted 2006) [#7]
what you see above is no string specific behavior
??? What does this mean? Is this valid BlitzMax code?
Strict
Framework brl.blitz

Import brl.linkedlist

Local Text:TList = CreateList ( )
WriteStdout Text [ -5 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -4 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -3 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -2 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ -1 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 0 ] + "~n" 'Ok, returns: 84  -> T
WriteStdout Text [ 1 ] + "~n" 'Ok, returns: 101 -> e
WriteStdout Text [ 2 ] + "~n" 'Ok, returns: 120 -> x
WriteStdout Text [ 3 ] + "~n" 'Ok, returns: 116 -> t
WriteStdout Text [ 4 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 5 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 6 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 7 ] + "~n" 'Danger of undefined results!
WriteStdout Text [ 8 ] + "~n" 'Danger of undefined results!
You can only index arrays, strings and pointers (as far as I know).
actually index -4 and -8 have their own needs on objects and removing that until the stuff behind...
This almost sounds as if I wanted to remove this memory? I just want it to be impossible to access that memory by indexing a string. The C code, which of course needs access to the "class" and "refs" fields of the object (index -4 and index -8 are not the actual names), won't be affected by this.
String is no array object.
Yes, that's right. However, you'll see that strings and arrays have some things in common: both have a "changing object size" (meaning that different instances of the same class can have different sizes), both are hard-coded in bcc, both are ordered collections, both can be sliced, and both can be indexed. It isn't that strange to expect both to have the same safety checks.

However, I currently think you're mixing something up: Are you aware of the difference between
Local Text$ = "Text"
WriteStdout Text [ 3 ] + "~n"
and
Local Text$ = "Text"
WriteStdout ( ( Short Ptr Byte Ptr Object Text + 2 ) [ 3 ] + "~n" )
?
Although both do the same thing, they are different: The first one indexes the string directly, so this is nearly the same as indexing an array. The second one first converts the string object to a pointer and then indexes the pointer. This of course should never be checked, since it is simply impossible to define the boundaries of a pointer. The second one could be called a "hack", because doing it this way can give you really strange results the moment you aren't 100% aware of what you're doing.
Wouldn't it be easier just using the functionality that is officially in for :String instead of using hack approaches like [x] on them?
Indexing an array is the same "hack" as indexing a string.
I would think this is a bug.
Actually I thought the same (so Dream's answer really surprised me); I just had a gut feeling telling me it would be better to post this as a request...


Koriolis(Posted 2006) [#8]
I'm with Fabian, and laugh at the usage of the term "hack". No pun intended.
If that's a hack, and the current implementation may very well be one, then you should really be requesting that it be "dehacked". There is no reason why the [] operator, when applied to strings, couldn't perform bounds checking in debug mode. The compiler knows very well the type of the expression to which any operator is applied, so what's the problem really? It's applied to a pointer expression: OK, let's just access the said memory area. It's applied to a string: OK, let's perform some bounds checking.


Dreamora(Posted 2006) [#9]
Erm the -4 and -8 access is not only important to the C code but to many BM programmers as well.

As mentioned above: if you want to have your boundary safe access use functions. Here is the code if you want it :)

function WriteByteToString(str:String, val:Byte, index:int)
   if index < 0 or index >= str.length Throw "Access out of boundary on string " + str
   str[index] = val
end function


function ReadByteFromString:Byte(str:String, index:int)
   if index < 0 or index >= str.length Throw "Access out of boundary on string " + str
   return str[index]
end function


I don't see a reason why everyone should suffer slower speed due to stuff that is not officially in and that is only needed by programmers who don't take care of what they are actually doing. (It's not like string access is so dynamic that you cannot check the boundaries yourself, right?)


grable(Posted 2006) [#10]
It won't be slower if you apply the same rules as for Arrays.

Bounds check in DEBUG mode only.
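
As a rough sketch of what "debug mode only" can look like in user code today, the ?Debug conditional compilation directive can wrap the check so that it is compiled only into debug builds. CharAt is again a hypothetical helper name, and this only approximates by hand what a compiler-generated check would do.

```blitzmax
Strict
Framework brl.blitz

Local Text$ = "Text"
WriteStdout ( CharAt ( Text , 0 ) + "~n" ) '84 -> T

'Hypothetical helper: the bounds check is compiled only in debug builds,
'so release builds pay no extra cost for it.
Function CharAt:Int ( Text$ , Index:Int )
?Debug
	If Index < 0 Or Index >= Text.length Then Throw "Index out of bounds!"
?
	Return Text [ Index ]
EndFunction
```

In a release build the function reduces to the plain indexed read, so an out-of-range index is still undefined there, exactly as with arrays.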


Koriolis(Posted 2006) [#11]
Dreamora, I think grable just reminded you of a basic principle: actually read the posts you're responding to :)


Fabian.(Posted 2006) [#12]
Erm the -4 and -8 access is not only important to the C code but to many BM programmers as well.
I'm sorry, but could you please post some code showing this usage? Maybe you're talking about this:
Strict
Framework brl.blitz

Global Obj:TType = New TType
Local Pointer:Byte Ptr

Pointer = Byte Ptr Obj - 4 'access the object's reference counter

Local IntPtr:Int Ptr = Int Ptr Pointer
WriteStdout "Object has " + ( IntPtr [ 0 ] & $7FFFFFFF ) + " reference(s)~n"

Pointer = Byte Ptr Obj - 8 'access the object's class

IntPtr = Int Ptr Pointer
WriteStdout "The class struct of the object's class is located at memory address " + IntPtr [ 0 ] + "~n"

Type TType
EndType
However, the bounds checks I'm requesting won't affect that code at all. The code above is indexing pointers, but I just want bounds checks when indexing strings (and even then only in debug mode).

P.S.: The code you posted above is not valid BMX code, since you're trying to write to a string. Once created, a string cannot be changed any more.
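
Since a string cannot be modified in place, a "write" has to build a new string instead. The following is a minimal sketch using slices; WithCharReplaced is a hypothetical helper name, and it assumes brl.retro is imported for Chr.

```blitzmax
Strict
Framework brl.blitz

Import brl.retro

Local Text$ = "Text"
WriteStdout ( WithCharReplaced ( Text , 0 , 78 ) + "~n" ) '78 -> N, so this prints: Next

'Hypothetical helper: returns a copy of Text in which the character at Index
'is replaced by the character with the given code. Text itself is unchanged.
Function WithCharReplaced$ ( Text$ , Index:Int , Code:Int )
	If Index < 0 Or Index >= Text.length Then Throw "Index out of bounds!"
	Return Text [ .. Index ] + Chr ( Code ) + Text [ Index + 1 .. ]
EndFunction
```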


marksibly(Posted 2006) [#13]
The string indexing boundary check thing is definitely a BUG!


Fabian.(Posted 2006) [#14]
Oh sorry, so I posted in the wrong forum :-)
But thanks a lot for declaring it's a bug!