Not the speed increase I'd hoped for....
| ||
I've recoded some old BlitzPlus stuff into Bmax. Didn't use bbconv, just did it by hand. My program's a rigid body engine, but it runs a lot slower under Bmax! I haven't tried to do anything clever with the rewrite, so there's little difference in the code. I've narrowed the time difference down to one main routine:

    Function circle_constraints()
        'Attempts to move all circles out of collision
        Local c:circle  ' circle
        Local cc:circle ' circle tested for col with
        c=c_list._head
        While c
            cc=c_list._head
            While cc
                If (c<>cc)
                    Local coldist#=c.rad+cc.rad
                    Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
                    If d<coldist
                        Local mx#=(coldist-d)*(c.x-cc.x)/d
                        Local my#=(coldist-d)*(c.y-cc.y)/d
                        c.x=c.x+mx/2.0
                        c.y=c.y+my/2.0
                        cc.x=cc.x-mx/2.0
                        cc.y=cc.y-my/2.0
                    EndIf
                EndIf
                cc=cc._next
            Wend
            c=c._next
        Wend
        Return
    End Function

18 ms for 40 circles in bmx.

    Function circle_constraints()
        ;Attempts to move all circles out of collision
        Local c.circle  ; circle
        Local cc.circle ; circle tested for col with
        For c=Each circle
            For cc=Each circle
                If (c<>cc)
                    Local coldist#=c\rad+cc\rad
                    Local d#=Sqr((cc\x-c\x)^2+(cc\y-c\y)^2)
                    If d<coldist
                        Local mx#=(coldist-d)*(c\x-cc\x)/d
                        Local my#=(coldist-d)*(c\y-cc\y)/d
                        c\x=c\x+mx/2.
                        c\y=c\y+my/2.
                        cc\x=cc\x-mx/2.
                        cc\y=cc\y-my/2.
                    EndIf
                EndIf
            Next
        Next
        Return
    End Function

2 ms for 40 circles in B+.

Any help/suggestions appreciated.
| ||
Sqr is a double precision function now, so in your first loop the floats are being cast to double, the sqrt is found, and this is then cast back to float. (I believe) you could speed it up considerably by changing these:

    Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
    If d<coldist

To this:

    Local xdist# = ( cc.x - c.x )
    Local ydist# = ( cc.y - c.y )
    Local d# = ( xdist * xdist ) + ( ydist * ydist )
    If d < ( coldist * coldist )
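One thing to watch with the squared-distance test: the lines inside the If still divide by the real distance, so a square root is still needed for pairs that actually overlap. A minimal sketch of how the two could fit together, assuming a hypothetical helper called separate() (not from the original program) and skipping coincident centres so the division cannot hit zero:

```blitzmax
Strict

Type circle
    Field x:Float, y:Float, rad:Float
End Type

' Hypothetical helper (an assumption, not the poster's code):
' pushes two overlapping circles apart by half the overlap each.
Function separate(c:circle, cc:circle)
    Local coldist:Float = c.rad + cc.rad
    Local dx:Float = cc.x - c.x
    Local dy:Float = cc.y - c.y
    Local d2:Float = dx * dx + dy * dy      ' squared distance, no Sqr yet

    ' Compare squared values; ignore coincident centres (d2 = 0).
    If (d2 < coldist * coldist) And (d2 > 0)
        Local d:Float = Sqr(d2)             ' real distance, colliding pairs only
        Local push:Float = (coldist - d) / d / 2.0
        c.x :- dx * push
        c.y :- dy * push
        cc.x :+ dx * push
        cc.y :+ dy * push
    EndIf
End Function

' Usage: two circles of radius 10 whose centres are 12 apart.
Local a:circle = New circle
a.rad = 10
Local b:circle = New circle
b.x = 12
b.rad = 10
separate a, b
Print a.x + " " + b.x   ' centres should now be roughly 20 apart
```

This keeps Sqr out of the common non-colliding case while leaving the push-out arithmetic the same as in the original routine.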
| ||
Thanks Michael - I never said it was optimised yet - it's only the speed difference for identical code that concerns me. I'll sort out the issue with the Sqr function (well spotted!!) and see if it makes the difference.
| ||
Make sure you're using FlushMem too!
| ||
Well, I only posted it because it meant you weren't calling the BlitzMax Sqr function. If BlitzMax had a float Sqrt, the BlitzMax code would execute significantly faster than the Blitz3D code, because the compiler generates much more sensible assembler. I think you'll find that with the optimisation I posted implemented in both versions, the BlitzMax version will totally thrash the BlitzPlus one. This raises an interesting issue though - Max needs double math functions, but they probably shouldn't be the default ones since they're not nearly as fast.
| ||
Agreed - there are relatively few occasions where double accuracy is absolutely needed.
| ||
What happens if you use doubles throughout? I think the casting is a pretty large bottleneck, but every CPU on the market has support for fast double-precision math in some form or other (whether Max has the code to support it, I don't know).
| ||
Hi,

This is odd... There is no casting required when a double is returned from a function, as it's returned in an FP register anyway. The overhead of passing a double to a function is having to 'push' 8 bytes instead of 4. In other words, I don't think it's a float/double thing but some other weird issue. The '^2.0' is likely to be slow - perhaps I've optimized this in BP to a multiply but not in Max...

And both are of course running with debug disabled?
| ||
^2 was already a speed problem in Blitz3D, where powers were only useful from 3 or 4 upwards... for ^2 you were better off writing the multiplication out. I think the problem is that the double sqrt takes far longer to calculate than a float sqrt... perhaps there is a float sqrt function? (Will check that and edit this post.)
| ||
Ok, I've found it: the ^ operator is absolutely killing performance! Replacing...

    Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)

...with...

    Local dx#=cc.x-c.x
    Local dy#=cc.y-c.y
    Local d#=Sqr( dx*dx+dy*dy )

...takes execution time from 1860 down to 7! Further good news: the BP original takes 138, and the 'tuned' version 46, so Max *is* faster than BP. Will definitely be looking at this, but in the meantime avoid ^!

Here's the mockup program I did - does this look a bit like how it might be used in the 'real world'?

    Type circle
        Field _next:circle
        Field x#,y#,rad#
    End Type

    Type clist
        Field _head:circle
    End Type

    Function clist_add( cl:clist,c:circle )
        c._next=cl._head
        cl._head=c
    End Function

    Global c_list:clist=New clist

    Function circle_constraints()
        'Attempts to move all circles out of collision
        Local c:circle  ' circle
        Local cc:circle ' circle tested for col with
        c=c_list._head
        While c
            cc=c_list._head
            While cc
                If (c<>cc)
                    Local coldist#=c.rad+cc.rad
                    Local dx#=cc.x-c.x
                    Local dy#=cc.y-c.y
                    Local d#=Sqr(dx*dx+dy*dy)
    '               Local d#=Sqr((cc.x-c.x)^2.0+(cc.y-c.y)^2.0)
                    If d<coldist
                        Local mx#=(coldist-d)*(c.x-cc.x)/d
                        Local my#=(coldist-d)*(c.y-cc.y)/d
                        c.x=c.x+mx/2.0
                        c.y=c.y+my/2.0
                        cc.x=cc.x-mx/2.0
                        cc.y=cc.y-my/2.0
                    EndIf
                EndIf
                cc=cc._next
            Wend
            c=c._next
        Wend
        Return
    End Function

    For Local i=1 To 500
        Local c:circle=New circle
        c.x=Rnd(640)
        c.y=Rnd(480)
        c.rad=Rnd(160)
        clist_add c_list,c
    Next

    t=MilliSecs()
    circle_constraints
    t=MilliSecs()-t
    Print t
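To check the cost of ^ on its own, away from the linked list and the collision maths, a small timing loop is enough. This is only a sketch under assumptions: the iteration count and the test value are arbitrary, and the timings will differ between machines, BlitzMax versions, and debug/release builds.

```blitzmax
Strict

Const ITERATIONS:Int = 1000000   ' arbitrary; raise it if both loops finish in 0 ms

Local x:Float = 1.5              ' arbitrary test value
Local sum:Double = 0

' Time squaring with the power operator.
Local t:Int = MilliSecs()
For Local i:Int = 1 To ITERATIONS
    sum :+ x ^ 2.0
Next
Print "x ^ 2.0 : " + (MilliSecs() - t) + " ms"

' Time squaring with a plain multiply.
t = MilliSecs()
For Local j:Int = 1 To ITERATIONS
    sum :+ x * x
Next
Print "x * x   : " + (MilliSecs() - t) + " ms"

' Print the accumulated value so the loops clearly have an observable result.
Print "checksum: " + sum
```

Running it once with debug on and once with debug off should also show whether the debug oddity mentioned later in the thread reproduces on a given machine.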
| ||
Sweet as a nut - and yes, that's pretty much how I'm using it. Thanks to everyone for the input. Nothing like a 'Max is slower than B+' post to get Mark's attention ;) Merry xmas all.
| ||
Here's another oddity. The ^2.0 test code is extremely slow. You can speed it up a little by turning Debug ON!
| ||
Woah, that one needs investigating, Floyd
| ||
Kanati said the debug version of my shader test was faster than release:

"I find it very odd that I get 1580fps average with a debug build. And 1200-1300 variable average on a non-debug build."

http://www.blitzbasic.com/Community/posts.php?topic=41957

Tom
| ||
Yeah, I didn't actually put it in the bugs forum because I didn't know if it was just me or not... but it looks like I might not be the only one.