string comparison

BlitzMax Forums/BlitzMax Programming/string comparison

slenkar(Posted 2010) [#1]
does blitzmax check the length of the 2 strings before comparing them?

(if they are different lengths they can be counted as not matching)


Gabriel(Posted 2010) [#2]
I've had a dig through the source code, and no, it doesn't appear to do that check before comparing them.

If you want to check, look for the keyword "bbStringCompare" in C:\Program Files\BlitzMax\mod\brl.mod\blitz.mod\blitz_string.c

So not unless it does it in BlitzMax code somewhere, but there doesn't appear to be any additional BlitzMax code for string equality.


slenkar(Posted 2010) [#3]
ok thanks


Czar Flavius(Posted 2010) [#4]
Is that an optimization that could be made?


Gabriel(Posted 2010) [#5]
If your game/app compares tens of thousands of strings every frame, and this is causing your game/app to perform badly, I would assume so.


Czar Flavius(Posted 2010) [#6]
How is the length calculated? If it works like TList.Count(), it's probably best not to bother.


plash(Posted 2010) [#7]
How is the length calculated? If it works like TList.Count(), it's probably best not to bother.
A BlitzMax string is not a linked list. It knows its own size, as well. You could easily write your own compare-string function in C for length-vs-length optimization.
Currently bbStringCompare in brl.blitz does not do this. I assume that is because the Compare method is supposed to return whether a is greater than, equal to, or less than b, even though bbStringCompare doesn't follow that rule (it only returns equality).

Last edited 2010


Jesse(Posted 2010) [#8]
what's wrong with simple size comparison:
if str1.length < str2.length

I am sure this is as fast as its going to get.


Tommo(Posted 2010) [#9]

even though bbStringCompare doesn't follow that rule (it only returns equality)


No, it does return character order. That's why we can easily sort strings within array or TList.

strict

local array$[] = ["ba", "cc", "bb", "a"]

k.Sort()

for local n$ = eachin array
	print n
next



Jesse(Posted 2010) [#10]
this is what blitz string does for the comparison:
it does a for next loop using the shortest length string in which it compares every character until it finds the first unequal character then return the none 0 difference of the two characters
if in the whole for next loop it doesn't find a character that is difference then it returns the difference in the length of the two. -n, 0 or +n.


plash(Posted 2010) [#11]
No, it does return character order. That's why we can easily sort strings within array or TList
Huh. I was always getting true or false when I tested string comparison. The fact that it does obey the Compare method rules, though, is proof that bbStringCompare can't be optimized in such a way (I'd make your own comparison function in C).


TomToad(Posted 2010) [#12]
It returns True/False when using '=', it returns -1,0,1 when using Compare()
SuperStrict

Local String1:String = "abcde"
Local String2:String = "bcdef"
Local String3:String = "abcde"

Print "String1 = String2 "+(String1 = String2)
Print "String2 = String1 "+(String2 = String1)
Print "String1 = String3 "+(String1 = String3)
Print "~nString1.Compare(String2) "+(String1.Compare(String2))
Print "String2.Compare(String1) "+(String2.Compare(String1))
Print "String1.Compare(String3) "+(String1.Compare(String3))



Tommo(Posted 2010) [#13]

It returns True/False when using '=', it returns -1,0,1 when using Compare()


It always calls bbStringCompare, you can find that in generated asm code.


ziggy(Posted 2010) [#14]
Faster compare:
SuperStrict
Const Iterations:Long = 10000000
Local Variable1:String = "Hello world"
Local Variable2:String = "this is another string"
Local comparison:Long
Local m:Int = MilliSecs()
comparison = 0
For Local i:Int = 0 To Iterations
	If (Variable1 = Variable2) Then
		comparison:+1
	End If
Next
Print "Regular compare:" + (MilliSecs() - m)
comparison = 0
m = MilliSecs()
For Local i:Int = 0 To Iterations
	If (FastStringComp(Variable1, Variable2)) Then
		comparison:+1
	End If
Next
Print "Optimized compare:" + (MilliSecs() - m)

Function FastStringComp:Int(str1:String, str2:String)
    If str1.length <> str2.length Return False Else Return str1 = str2
End Function

speed benefit only in release mode. In debug mode it is slower. If you're going to compare strings of a given length, it's a bit slower than regular cmoparer (but only a very little bit), and for common string comparison wich you don't know the length of them, it should be a bit a lot faster. (almost twice faster in the small sample)

Last edited 2010

Last edited 2010


Jesse(Posted 2010) [#15]
here the Blitz_string.c function:
int bbStringCompare( BBString *x,BBString *y ){
	int k,n,sz;
	sz=x->length<y->length ? x->length : y->length;
	for( k=0;k<sz;++k ) if( n=x->buf[k]-y->buf[k] ) return n;
	return x->length-y->length;
}

apparently it does a final sgn function call.


Dabhand(Posted 2010) [#16]
Local strOne:String = "Hello"
Local strTwo:String = "Hellos"

Print Compare(strOne,strTwo)

Function Compare:int(x:String , y:String)
	If Object(x) = Object(y) Then Return true
	Return false
End Function 


Any good?

Dabz


Gabriel(Posted 2010) [#17]
Local strOne:String = "Hello"
Local strTwo:String = "Hellos"
strTwo = strTwo[..5]

Print Compare(strOne,strTwo)
Print strOne=strTwo

Function Compare:Int(x:String , y:String)
    If Object(x) = Object(y) Then Return True
    Return False
End Function 

I don't know how object comparison works because I haven't looked it up, but it doesn't behave the same way as string comparison.


Dabhand(Posted 2010) [#18]
Mmmm, just thought it was worth a punt! :)

Seems to graft until you slice it, wonder why!?!

Dabz


Gabriel(Posted 2010) [#19]
As you say, it seems to work fine until you slice it. You can reassign it multiple times, give it longer or shorter strings and then reassign the matching value and it still works fine. As soon as you slice it, it gets lost.


Czar Flavius(Posted 2010) [#20]
What does this mean? A bug?


Dabhand(Posted 2010) [#21]

What does this mean? A bug?



Not sure, I tend not to dig that deep when it comes to BlitzMax, I used that once years ago in another language, cannot recall the reason now, but as soon as I saw the string comparison thing, that just sprung to mind.

Would be nice to know why the slice buggers it if its not a bug, just for future reference!

Dabz


Gabriel(Posted 2010) [#22]
I haven't a clue. I assumed that strings are either immutable or they're not. Since they are in BlitzMax, I believed that it would either work or not work. What slicing does that other attempts to modify a string don't do, I really couldn't say.

BTW, has anyone actually tried my "breaking" test up there besides me? I haven't updated BlitzMax in ages, so it could very easily work fine on an updated version for all I know.


Dabhand(Posted 2010) [#23]
If you mean post #17, then yeah, it borks here, Mac Max 1.41!!!

Dabz


Tommo(Posted 2010) [#24]
"Object = Object" only compares their memory addresses.


Gabriel(Posted 2010) [#25]
If you mean post #17, then yeah, it borks here, Mac Max 1.41!!!

That's the one. Well if it's that way in 1.41 as well, then I really don't know what slicing does different.

"Object = Object" only compares their memory addresses.

No, it doesn't. If it did, this wouldn't work.

Local strOne:String = "Hello"
Local strTwo:String = "Hello"

Print Compare(strOne,strTwo)

Function Compare:Int(x:String , y:String)
    If Object(x) = Object(y) Then Return True
    Return False
End Function 



ziggy(Posted 2010) [#26]
No, it doesn't. If it did, this wouldn't work.

Yes it does, but it is using the same mem allocation for equal literals (creating a single mem addres for them in compile time).
This shows it:
Strict
Local strOne:String = "Hello"
Local strTwo:String = "Hellos"
strTwo = Left(strTwo, 5)
Local StrTree:String = "Hello"
Print Compare(strOne, strTwo)
Print Compare(strOne, strTree)

Function Compare:Int(x:String , y:String)
    If Object(x) = Object(y) Then Return True
    Return False
End Function 



Dabhand(Posted 2010) [#27]
Ah, I see... Well, that explains it then!

:)

Dabz


Gabriel(Posted 2010) [#28]
To be honest, that was my first thought when I saw the results, but I dismissed it as being too ludicrous to be true. I'm still not sure I've changed my mind. I suppose, since strings are immutable, there's no real harm in it, but it still feels wrong.


ziggy(Posted 2010) [#29]
To be honest, that was my first thought when I saw the results, but I dismissed it as being too ludicrous to be true. I'm still not sure I've changed my mind. I suppose, since strings are immutable, there's no real harm in it, but it still feels wrong.

This is very common in compilers. When you're writing the intermediate representation of the syntax tree, you add references to the literal resources (strings). While the AST is being built, it's common to add a literal reference ONLY if it is new, so generated executable is smaller and has a better handling of redundant resources. But as none of us has the bcc source code, this can be different.
Anyway, taking a look to the generated assembler seems to confirm both strings are sharing the same literal representation.