something strange about the Lower() function

Archives Forums/Blitz3D Bug Reports/something strange about the Lower() function

OJay(Posted 2009) [#1]
just had a case, where i needed to compare a lot of strings, so i guessed a short benchmark would be a good idea, to get a feeling of how efficiently blitz will handle this comparisons, and were surprised what Lower() does to strings...

looking at the following code, you will notice that there are 2 long completely identical strings, which get compared 3million times. so this is of course the worst case of string comparison: the code has to compare every single character to evaluate the equality (if you change the first character of one string it gets a lot faster).

the code takes ~830ms in debug mode, and ~600ms in release mode on my machine. thats just fine, will be fast enough for my needs.

string1$ = "welkfherakfjhewfihwaiöfhweöifhwiöoefhwelkfherakfjhewfihwaiöfhweöifhwiöoefhwelkfherakfjhewfihwaiöfhweöifhwiöoefh"
string2$ = "welkfherakfjhewfihwaiöfhweöifhwiöoefhwelkfherakfjhewfihwaiöfhweöifhwiöoefhwelkfherakfjhewfihwaiöfhweöifhwiöoefh"

;string1 = Lower(string1)

t1 = MilliSecs()
For i = 0 To 3000000

	If string1 = string2
	EndIf

Next
Print MilliSecs()-t1
WaitKey


now uncomment the line with the Lower() function. you might expect, nothing will change in execution speed, since string1$ already is all lowercase, just as string2$.
but now the runtime increases to ~1500ms in debug and ~1200ms in release mode! thats a 200% increase! what the....?

Lower()'ing string2$ too makes it even worse (2100ms vs 1800ms)...hmmm...

so, whats up with Lower(), that would explain this? am i missing something here?

(tested with v1.103 only)


Beaker(Posted 2009) [#2]
I don't really see why this surprises you. Lower() still has to process every single character of the string, because it can't know that the string is already lowercase.

If and Lower() should take approximately the same amount of time to process.


OJay(Posted 2009) [#3]
huh? the Lower() function is outside the benchmark loop, so it shouldnt have any effect to the measurement...


Beaker(Posted 2009) [#4]
Oops. My mistake. In my defence, I was half asleep when I posted. :)

You are right tho, it is odd.


PowerPC603(Posted 2009) [#5]
Graphics 1280, 600, 0, 2

string1$ = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
string2$ = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"

Print Chr$(34) + string1$ + Chr$(34)
Print Chr$(34) + string2$ + Chr$(34)
Print "Comparing without lowering the strings"
t1 = MilliSecs()
For i = 0 To 3000000

	If string1$ = string2$
	EndIf

Next
Print MilliSecs()-t1
Print ""
Print ""
Print ""



string1$ = Lower$(string1$)
string2$ = Lower$(string2$)

Print Chr$(34) + string1$ + Chr$(34)
Print Chr$(34) + string2$ + Chr$(34)
Print "Comparing with lowering both strings"

t1 = MilliSecs()
For i = 0 To 3000000

	If string1$ = string2$
	EndIf

Next
Print MilliSecs()-t1
WaitKey


I've expanded your test-program a little.
It doesn't seem te be related to the weird chars in the string, normal ones have the same problem.
Tested with v1.98, this gave me 1124ms without lowering the strings and 2539ms with lowering both strings.

This is a really weird bug.
I've printed the strings to the screen just to make sure Lower() didn't add any characters.


_PJ_(Posted 2009) [#6]
I tried making the vars global but the times are still inconsistent and around the same ballpark.

Then I thought I'd try something else, just in case for some reason, because the vars DID change, maybe there was something in that and teried with adding two new vars to represent the Lower()'ed versions. Again, still the same kinda difference :S

Another thought was to put the Lower in tright at the start, of course, it meant the comparison betrween both runs was useless from a practical standpoint, but the results were very interesting.

Graphics 1280, 600, 0, 2

Global string1$="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
Global string2$ = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"

Print Chr$(34) + string1$ + Chr$(34)
Print Chr$(34) + string2$ + Chr$(34)
Print "Comparing without lowering the strings"
t1 = MilliSecs()
For i = 0 To 3000000

	If string1$ = string2$
	EndIf

Next
Print MilliSecs()-t1
Print ""
Print ""
Print ""
Global l1$=Lower$(string1$)
Global l2$=Lower$(string2$ )

Print Chr$(34) +l1$ + Chr$(34)
Print Chr$(34) +l2$ + Chr$(34)
Print "Comparing with lowering both strings"

t1 = MilliSecs()
For i = 0 To 3000000

	If l1$ =l2$
	EndIf

Next
Print MilliSecs()-t1
WaitKey



The difference is STILL THERE!

For my next little test, I changed ther initial declarations from the sample above to Constant:

Const string1$="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
Const string2$ = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"

As would be expected, the first check was super-fast (actually about 3ms!!!)
but the second was STILL around 2000

Finally, I just had to see if Upper$() had a similar effect and it seems so, though whilst Lower$() seemed to give a difference of around n*3 Upper$() was more around n*2.5 (marginally quicker by about 200ms on my comp.)


_PJ_(Posted 2011) [#7]
Something even weirder happens if you use a separate function to wrap the lower command, making a completly new variable, the time taken is now greatly REDUCED!!!

Graphics 1280, 600, 0, 2

Function PerformLower$(AString$)
	bString$=Lower$(AString$)
	Return bString$
End Function


Global string1$="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
Global string2$="abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"

Print Chr$(34) + string1$ + Chr$(34)
Print Chr$(34) + string2$ + Chr$(34)
Print "Comparing without lowering the strings"
t1 = MilliSecs()
For i = 0 To 3000000

	If string1$ = string2$
	EndIf

Next
Print MilliSecs()-t1

Global s1$=PerformLower$(string1$)
Global s2$=PerformLower$(string2$)

Print Chr$(34)+s1$+Chr$(34)
Print Chr$(34)+s2$+Chr$(34)
Print "Comparing with lowering both strings"

t1 = MilliSecs()
For i = 0 To 3000000

	If l1$=l2$
	EndIf

Next
Print MilliSecs()-t1
WaitKey


Last edited 2011


Yasha(Posted 2011) [#8]
I'm'a go out on a limb and hazard that the Const speedup is caused because the compile-time optimiser can see the values are the same, and optimise the check out entirely the way it does with most other constant operations, which is incredibly fast at runtime because no check needs to be done.

Once a string has gone through Lower() though, because it's a value returned from a function, the compiler doesn't know it can be optimised out and they have to be compared at runtime (which is slower).

When the same string is stored in two vars without running through Lower(), it's possible the same object is used and a fast comparison by pointer can be used instead of comparison by value (no idea if it does this). A value returned from Lower() will probably be a new object though and always fail a pointer-comparison, forcing a full value test.

As for your most recent discovery... I got nothing.

Last edited 2011


_PJ_(Posted 2011) [#9]
Yeah, the Const being quicker aby itself, was pretty much expected. It shouldn't have any effect on the second comparison, though, again, as expected.

When the same string is stored in two vars without running through Lower(), it's possible the same object is used and a fast comparison by pointer can be used instead of comparison by value (no idea if it does this). A value returned from Lower() will probably be a new object though and always fail a pointer-comparison, forcing a full value test

I think, if I understand this right, that I was thinking something along those lines, which was why I went for the route of another function, so when passed through the Lower function, it's not the same variable at all, meaning only the newly passed lowercase result is used in the comparison.

However, that ought to indicate the first and second comparisons then take the same time... (even if the original test should also take the same time really ;) ) Instead, my computer makes it out to be twice as fast!

I wonder if there's any benefit in making an extra function call instead of just "Lower()" or "Upper()" generally?
But even so, it's still very very strange!


Subirenihil(Posted 2011) [#10]
Interesting...

If I change the
l1$=Lower$(string1$)
to
l1$=Left$(Lower$(string1$)+"-",Len(string1$))
then I get the same speed that I get from passing it to a function or from the original strings.