Not the speed increase I'd hoped for....


Who was John Galt?(Posted 2004) [#1]
I've recoded some old BlitzPlus stuff into BMax. Didn't use bbconv, just did it by hand. My program's a rigid body engine, but it runs a lot slower under BMax! I haven't tried to do anything clever with the rewrite, so there's little difference in the code. I've narrowed the time difference down to one main routine....

Function circle_constraints()
	'Attempts To move all circles out of collision
	Local c:circle 	' circle
	Local cc:circle	' circle tested For col with	
	
	c=c_list._head
	While c
		cc=c_list._head
		While cc
			If (c<>cc)
				Local coldist#=c.rad+cc.rad
				Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
				If d<coldist
					Local mx#=(coldist-d)*(c.x-cc.x)/d
					Local my#=(coldist-d)*(c.y-cc.y)/d
					
					c.x=c.x+mx/2.0
					c.y=c.y+my/2.0
					cc.x=cc.x-mx/2.0
					cc.y=cc.y-my/2.0
					
				EndIf 
			EndIf
			cc=cc._next
		Wend
		c=c._next
	Wend
	Return
End Function

18 ms for 40 circles in bmx

Function circle_constraints()
	;Attempts to move all circles out of collision
	Local c.circle 	; circle
	Local cc.circle	; circle tested for col with	
	
	For c=Each circle
		For cc=Each circle
			If (c<>cc)
				Local coldist#=c\rad+cc\rad
				Local d#=Sqr((cc\x-c\x)^2+(cc\y-c\y)^2)
				If d<coldist
					Local mx#=(coldist-d)*(c\x-cc\x)/d
					Local my#=(coldist-d)*(c\y-cc\y)/d
					
					c\x=c\x+mx/2.
					c\y=c\y+my/2.
					cc\x=cc\x-mx/2.
					cc\y=cc\y-my/2.
					
				EndIf 
			EndIf
		Next
	Next
	Return
End Function



2ms for 40 circles in B+. Any help/suggestions appreciated.


Michael Reitzenstein(Posted 2004) [#2]
Sqr is a double precision function now, so in your first loop, the floats are being cast to double, the sqrt is found, and this is then cast back to float. (I believe) you could speed it up considerably by changing these:

Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
If d<coldist


To this:

Local xdist# = ( cc.x - c.x )
Local ydist# = ( cc.y - c.y )
Local d# = ( xdist * xdist ) + ( ydist * ydist )
If d < ( coldist * coldist )
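
One caveat with the squared comparison: the original routine still needs the true distance for the push-out step, so Sqr can only be skipped for pairs that turn out not to overlap. A minimal sketch of that idea follows. The helper name `resolve_pair` is hypothetical, the `circle` type is cut down to the three fields used in the thread, and the handling of coincident centres is an added assumption (the posted code would divide by zero there).

```blitzmax
SuperStrict

Type circle
	Field x:Float, y:Float, rad:Float
End Type

' Hypothetical helper: pushes two overlapping circles apart by half the
' overlap each and returns True if they were overlapping.
Function resolve_pair:Int(c:circle, cc:circle)
	Local coldist:Float = c.rad + cc.rad
	Local dx:Float = cc.x - c.x
	Local dy:Float = cc.y - c.y
	Local d2:Float = dx * dx + dy * dy

	' Compare squared values, so no Sqr is needed for pairs that are apart.
	If d2 >= coldist * coldist Then Return False

	' Assumption: coincident centres are left alone to avoid dividing by zero.
	If d2 = 0.0 Then Return False

	' Only overlapping pairs pay for the square root.
	Local d:Float = Float(Sqr(d2))
	Local push:Float = (coldist - d) / d * 0.5

	c.x = c.x - dx * push
	c.y = c.y - dy * push
	cc.x = cc.x + dx * push
	cc.y = cc.y + dy * push
	Return True
End Function

' Usage: two unit circles whose centres are 1.0 apart.
Local a:circle = New circle
a.rad = 1.0
Local b:circle = New circle
b.x = 1.0
b.rad = 1.0
resolve_pair(a, b)	' moves a to x=-0.5 and b to x=1.5, so they just touch
```

The push amounts are the same as `mx/2.0` and `my/2.0` in the original routine, just written in terms of `dx` and `dy`.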



Who was John Galt?(Posted 2004) [#3]
Thanks Michael -

I never said it was optimised yet - it's only the speed difference for identical code that concerns me. I'll sort out the issue with the sqr function (well spotted!!) and see if it makes the difference.


BlitzSupport(Posted 2004) [#4]
Make sure you're using FlushMem too!


Michael Reitzenstein(Posted 2004) [#5]
Well I only posted it because it meant you weren't calling the BlitzMax Sqr function. If BlitzMax had a float Sqrt, the BlitzMax code would execute significantly faster than the Blitz3D code, because the compiler generates much more sensible assembler.

I think you'll find that with the optimisation I posted implemented in both versions, the BlitzMax version will totally thrash the BlitzPlus one. This raises an interesting issue though - Max needs double math functions, but they probably shouldn't be the default ones since they're not nearly as fast.


Robert(Posted 2004) [#6]
Agreed - there are relatively few occasions where double accuracy is absolutely needed.


teamonkey(Posted 2004) [#7]
What happens if you use doubles throughout? I think the casting is a pretty large bottleneck, but every CPU on the market has support for fast double-precision math in some form or other (whether Max has the code to support it, I don't know).


marksibly(Posted 2004) [#8]
Hi,

This is odd...

There is no casting required when a double is returned from a function as it's returned in an FP register anyway. The overhead of passing a double to a function is having to 'push' 8 bytes instead of 4.

In other words, I don't think it's a float/double thing but some other weird issue. The '^2.0' is likely to be slow - perhaps I've optimized this in BP to a multiply but not in Max...

And both are of course running with debug disabled?
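
One way to keep debug and release figures from being mixed up is to have the program label its own timing output. The following is a minimal sketch, assuming BlitzMax's `?Debug` conditional compilation directive; the `mode` variable and the message format are made up for illustration.

```blitzmax
SuperStrict

' Label timing output with the build mode so that debug and release
' figures are never quoted for each other.
Local mode:String = "release"
?Debug
mode = "debug"
?

Local t:Int = MilliSecs()
' ... code under test goes here ...
t = MilliSecs() - t

Print mode + " build: " + t + " ms"
```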


Dreamora(Posted 2004) [#9]
^2 was already a speed problem in Blitz3D, where powers were only useful from 3 or 4 upwards ... for ^2 you were better off writing it out.

I think the problem is that the double sqrt takes far longer to calculate than a float sqrt ...

Perhaps there is a float sqrt function? (Will check that and edit this posting here.)


marksibly(Posted 2004) [#10]
Ok, I've found it: the ^ operator is absolutely killing performance!

Replacing...

Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)

...with...

Local dx#=cc.x-c.x
Local dy#=cc.y-c.y
Local d#=Sqr( dx*dx+dy*dy )

...takes execution time from 1860 down to 7!

Further good news: the BP original takes 138, and 'tuned' version 46, so Max *is* faster than BP.

Will definitely be looking at this, but in the meantime avoid ^!

Here's the mockup program I did - does this look a bit like how it might be used in the 'real world'?

Type circle

	Field _next:circle
	Field x#,y#,rad#
 
End Type

Type clist

	Field _head:circle

End Type

Function clist_add( cl:clist,c:circle )

	c._next=cl._head
	cl._head=c

End Function

Global c_list:clist=New clist

Function circle_constraints()
'Attempts To move all circles out of collision
	Local c:circle ' circle
	Local cc:circle' circle tested For col with

	c=c_list._head
	While c
		cc=c_list._head
		While cc
			If (c<>cc)
				Local coldist#=c.rad+cc.rad
				Local dx#=cc.x-c.x
				Local dy#=cc.y-c.y
				Local d#=Sqr(dx*dx+dy*dy)
'				Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
				If d<coldist
					Local mx#=(coldist-d)*(c.x-cc.x)/d
					Local my#=(coldist-d)*(c.y-cc.y)/d
					c.x=c.x+mx/2.0
					c.y=c.y+my/2.0
					cc.x=cc.x-mx/2.0
					cc.y=cc.y-my/2.0
				EndIf 
			EndIf
			cc=cc._next
		Wend
		c=c._next
	Wend
	Return
End Function

For Local i=1 To 500
	Local c:circle=New circle
	c.x=Rnd(640)
	c.y=Rnd(480)
	c.rad=Rnd(160)
	clist_add c_list,c
Next

t=MilliSecs()
circle_constraints
t=MilliSecs()-t

Print t
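
To look at the `^` cost on its own, away from the linked list and the collision logic, the two ways of squaring can be timed side by side. This is only a rough sketch: the loop count `N` and the variable names are arbitrary, and the gap it shows will depend on the BlitzMax version, since the operator may have been optimised after this thread.

```blitzmax
SuperStrict

Const N:Int = 1000000

Local t:Int
Local accPow:Float
Local accMul:Float

' Squaring with the ^ operator.
t = MilliSecs()
For Local i:Int = 1 To N
	Local v:Float = i * 0.001
	accPow = accPow + v ^ 2.0
Next
Print "v ^ 2.0 : " + (MilliSecs() - t) + " ms"

' Squaring with a plain multiply.
t = MilliSecs()
For Local i:Int = 1 To N
	Local v:Float = i * 0.001
	accMul = accMul + v * v
Next
Print "v * v   : " + (MilliSecs() - t) + " ms"

' Print the sums so the loops have an observable result.
Print "sums: " + accPow + " / " + accMul
```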



Who was John Galt?(Posted 2004) [#11]
Sweet as a nut - and yes that's pretty much how I'm using it. Thanks to everyone for the input. Nothing like a 'Max is slower than B+' post to get Mark's attention ; )

Merry xmas all.


Floyd(Posted 2004) [#12]
Here's another oddity.

The ^2.0 test code is extremely slow. You can speed it up a little by turning Debug ON!


Who was John Galt?(Posted 2004) [#13]
Woah, that one needs investigating, Floyd


Tom(Posted 2004) [#14]
Kanati said the debug version of my shader test was faster than release:

"I find it very odd that I get 1580fps average with a debug build. And 1200-1300 variable average on a non-debug build."

http://www.blitzbasic.com/Community/posts.php?topic=41957

Tom


Kanati(Posted 2004) [#15]
Yeah, I didn't actually put it in the bugs forum because I didn't know if it was just me or not... but it looks like I might not be the only one.