bottlenecks and optimization
BlitzMax Forums/BlitzMax Programming/bottlenecks and optimization
| ||
Hi, I was doing some tests of my physics engine on different machines and I just noticed something a bit odd. It seems as though my verlet engine has a bottle neck. Here is how I found out.(Look at statistics below for details) This is a few frames in a row with 900 objects and who knows how many constraints. But the constraints dont matter because the bottle neck is not in the constraint loop. Most of the time it is a steady 18 milliseconds on the update loop but every 50ish frames, I run into a bottleneck like this one where it jumps to the 20's for a few frames. This unfortunately, happens for all numbers of objects. I have posted my updateverlets function below. But since I am relatively new to blitz max, I have no idea what can cause bottle necks like this. Can anyone see a bottleneck? Perhaps I am blind but I dont see anything that could cause a potential bottleneck. UpdateLoop 12 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 22 ConstraintLoop 2 UpdateLoop 25 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 19 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 19 ConstraintLoop 1 UpdateLoop 17 ConstraintLoop 2 UpdateLoop 19 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 19 ConstraintLoop 1 UpdateLoop 19 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 17 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 24 'bottle neck ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 17 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 20 'bottle neck ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 2 UpdateLoop 24 'bottle neck ConstraintLoop 2 UpdateLoop 23 'bottle neck ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 17 ConstraintLoop 2 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 UpdateLoop 18 ConstraintLoop 1 Here is the updateverlets() function Function UpdateVerlets() For Local vcnt:Int = 0 To verlcount-1 'perhaps arrays create bottle necks? Local V:verlet = Verletarray:Verlet[vcnt] If V.active = True Then checkverletbounds v:verlet If v.x < activerightbound And v.x > activeleftbound And v.y > activeupbound And v.y < activelowbound Then V.done = True 'prevents checking against self V.collided = False 'Resets collision variable v.vx = (v.x - v.ox) * (Friction) 'Gets velocities of verlets v.vy = (v.y - v.oy) * (Friction) v.vx:*v.fricap v.vx:*v.fricap V.fricap = 1 v.vx :+ Gravityx 'Keeps track of velocity and v.vy :+ v.vy + Gravityy 'Applies Gravity Local tmpox1:Float = v.x ' this is necessary for a feature I am experimenting with... just ignore for now. Local tmpoy1:Float = v.y v.x = v.x + (v.vx) v.y = v.y + (v.vy) v.ox = tmpox1 v.oy = tmpoy1 If Not v.ghost Then 'this block is the main suspect! For Local vcnt2:Int = 0 To verlcount-1 Local VV:verlet = Verletarray:Verlet[vcnt2] If vv.done = False And vv.ghost = False Then 'keeps from causing 'ghost' verlets to collide with other verlets Local dx# = v.x - vv.x 'a little optimization Local dy# = v.y - vv.y If Abs(dx) < maxverlsiz And Abs(dy) < maxverlsiz Then 'are > and < slow? IF they are then im in trouble If v.id <> vv.id ' If Not the same verlet Or group Local dist# = Sqr( dx#*dx# + dy#*dy#) 'distance formula Local totalr# = v.siz# + vv.siz# 'total radius If dist# < totalr# Then v.fricap = v.fricap - vv.friccr 'friction equations vv.fricap = vv.fricap - v.friccr Local totmass# = v.mass + vv.mass 'mass equations Local v1ofs# = (v.mass/totmass) * 2 Local v2ofs# = (vv.mass/totmass) * 2 Local dmtr# = dist#-totalr# 'dmtr = distance minus total radius Local Diffx# = ( dmtr# ) * ( dx# / dist# ) 'more optimizatoin stuff. Local Diffy# = ( dmtr# ) * ( dy# / dist# ) 'diffx = difference x If v.active = True Then 'if verlet is active then move it around v.x = v.x - Diffx# *v2ofs 'positioning the verlets v.y = v.y - Diffy# *v2ofs EndIf If vv.active = True Then 'if verlet is active then move it around. vv.x = vv.x + Diffx# *v1ofs vv.y = vv.y + Diffy# *v1ofs EndIf EndIf EndIf EndIf EndIf Next EndIf 'end of suspect block EndIf EndIf Next For Local v2cnt:Int = 0 To verlcount - 1 Local vtmp:verlet = verletarray:verlet[v2cnt] vtmp.done = False Next End Function sorry if its a lot to look through but I heavily commented the suspect blocks. Also notice, I removed the time step. I did this because I am trying to thread it and fix some timestep issues I had with threading. My main questions are are < and > slow in bmax? that would explain a lot if it is slow. does accessing arrays thousands of times a frame cause bottlenecks? |
| ||
its just my guess: try to eliminate all local varibles within the loop. Outsource them out of the loop. For explanation: if you have a local varable outside the loop the memory is allocated once. Inside the loop the memory is allocated everytime the loop passes the declaracion. In the result the gc has to collect a few hundreds instead of 1 local varibles if it the gc does its work. And exactly this may cause the delay in this code. |
| ||
ok thanks klepto2! I tried that and now it is much faster... There is still a bottle neck but it happens much less often. So am I missing something else or is blitz max just touching its boundaries on number of collision checks? |
| ||
after a little more research and experimenting, I found it was the little demo code and the drawing functions that were slowing things down. :) I took out the drawing functions and I could get 1100 objects no problem. If anyone else sees an optimization in the code above though, feel free to mention it because this is going to be a free physics engine (perhaps I will have a place for donations?) and every contribution you make, will benifit the community. |
| ||
I didn't know that about locals, klepto, thanks. |
| ||
hey nate, what would you say were the top optimizations for code? |
| ||
Jeremy - Heres a small list. Use arrays instead of Tlists take locals out of nested loops! a lot of math workarounds Bounding Box Checks More than one collision array 'activity' boxes Random little optimizations I figured out through trial and error that I dont remember and If I did, they would fill up pages |
| ||
kthanks, I will have to convert to arrays if things get slow |
| ||
you are working on a physics engine too? Interesting... Do you have a name picked out? or a release date? |
| ||
You are testing on low spec pc's too right? |
| ||
no not a physics engine, just something that uses loads of lists and nested locals, kind of like an RPG |
| ||
You are testing on low spec pc's too right? Yeah. Its not as fast but I was still surprised because about a month ago I used to only be able to have 26 verlets on screen at a time till it slowed to a crawl on a 2.6 ghz dual core thinkpad :) Does anyone have a pretty low spec pc? I cant seem to find one thats not too new but not too old. All of my machines are either faster than 2.4 ghz or from the windows 95 era.. I dont even know if they dreamed of what a gigahertz was like back then. Unfortunately I dont have any computers inbetween so I am having a hard time testing this on what most people consider low spec but not too old machines. |
| ||
Do you find it strange that my PC could only handle 6 but then it CAUGHT UP with the other PC's after a few optimizations? I thought it would lag behind.(not the dual core laptop of course) |
| ||
Can test it in a few days on my EeePC 901 (Atom 1.6Ghz on XP SP3) Bought that thing for exactly this kind of things, as it uses little space when not needed and is cheap |
| ||
Ok thanks dreamora! @ Jeremy after a few optimizations? "a few" is an understatement! But yes I am surprised that your PC did so well... actually 6 boxes = 6*8 = 48 (I used 8 verlets per box in the first test but soon realized it was too inneficient) :) your pc was doing better than you thought but still that was a massive improvement. edit: just tested it on an old xp machine of mine a little less than 4 years old (I think) and it did quite well ~850 verlets but I dont know how many ghz it is |
| ||
all depends if its a celeron or pentium really, Cant wait for the release of this system! |
| ||
Just had a look at the code above and noticed one or two things. Nothing to do with the bottleneck issue as you seem to have that sorted, but you might still like to take a closer look.v.vx = (v.x - v.ox) * (Friction) 'Gets velocities of verlets v.vy = (v.y - v.oy) * (Friction)Now I'm not sure how you are calculating the value of friction, but from these two lines it appears that you are applying friction to your 'verlets' every timestep. Friction should only be applied to objects that are in contact with other object. Object that are free from contact are also free from friction. Also you are multiplying velocity by friction. Surely once you have calculated the force of friction you should be subtracting it from the objects velocity? v.vx:*v.fricap v.vx:*v.fricap V.fricap = 1I have no idea what this piece of code does so I apologize if it is supposed to be like it is, but you appear to be multiplying vx by fricap then immediatly repeating the same equation. But like I say I don't know what it does so it might be correct. v.vx :+ Gravityx 'Keeps track of velocity and v.vy :+ v.vy + Gravityy 'Applies GravityAt this point you appear to be doubling the 'verlets' y velocity as you also apply gravity. I think it's meant to be: v.vy :+ GravityYAnyway hope these few pointers help out. |
| ||
Thanks Oddball. Did I mention I was thinking of making a Verlet Max for free. Then I can release a really nice version of it with tons of features and editors and I will sell that for around $5. Now I'm not sure how you are calculating the value of friction, but from these two lines it appears that you are applying friction to your 'verlets' every timestep. Friction should only be applied to objects that are in contact with other object. Object that are free from contact are also free from friction. Also you are multiplying velocity by friction. Surely once you have calculated the force of friction you should be subtracting it from the objects velocity? I multiply by friction because friction is a float between 0 and 1 1 being no friction and 0 being 100% friction. This simulates air friction so it is usually .995 or something smilar but it defaults to 1 so it has no effect unless you change it. I have no idea what this piece of code does so I apologize if it is supposed to be like it is, but you appear to be multiplying vx by fricap then immediatly repeating the same equation. But like I say I don't know what it does so it might be correct. oops.. you caught one there. Well the second one should be v.vy :) I stripped out a lot of unnecessary stuff in this function and changed some things to make it so people could understand and read it and I guess I messed some stuff up as well. Thanks for catching that one though At this point you appear to be doubling the 'verlets' y velocity as you also apply gravity. I think it's meant to be: again this is like the one above. I guess I messed it up upon taking stuff out because it is right in the actual source. thanks Also, I have been studying the way your engine works carefuly via the tutorials pack and I am making sure that we use different systems for special collisions and events etc. I dont want my engine to be a rip-off of yours. and speaking of pointers, how do you pass function pointers through to another function and then call them? I cant figure this one out. Thanks for all the help. If your engine is still evolving the site below is a must-read. I still haven't implemented or tested these optimizations but they are all in progress. http://www.andrewnoske.com/student/downloads/jcu_hons/Thesis.doc |
| ||
Nate the Great wrote: I multiply by friction because friction is a float between 0 and 1 1 being no friction and 0 being 100% friction. This simulates air friction so it is usually .995 or something smilar but it defaults to 1 so it has no effect unless you change it. I'd be careful using similar terminology for both drag(fluid resistance) and surface friction. The two forces are calculated in very different ways. |
| ||
I'd be careful using similar terminology for both drag(fluid resistance) and surface friction. The two forces are calculated in very different ways. hmmm... I have been calculating friction/drag similarly and it looks very realistic and accurate so far. |
| ||
Nate the Great wrote: hmmm... I have been calculating friction/drag similarly and it looks very realistic and accurate so far. Drag as a force is directly opposite to the direction of motion and is based primarily on velocity and the drag coefficient. Surface friction is opposite the direction of tangential motion when compared to the contact surface, and is base primarily on the perpendicular force and the friction coefficient. Also, drag coefficient is usually a result of an objects shape and friction coefficient is mainly based on an objects material.The easiest way to see this effect in real life is to get two objects of identical shape and surface material but that weigh totally different amounts. Dropping both object will result in them both hitting the ground at the same time as they both produce identical drag. This is because their shape and velocities are identical at release. However, trying to push the two objects along the ground using the same force on each will result in the heavier of the two moving slower than the lighter object. This is because the heavier object has a higher perpendicular force resulting in a higher amount of friction force. I seem to have started rambling a bit, but I enjoy all this physics-y stuff too much. Anyway hope this helps try checking out some of the formulas for friction that litter the net. Coulomb's equation is usually the most popular. |
| ||
all this friction stuff makes my head hurt. I think friction in verlet max is prety much fine.. after all realtime game physics is about a balance of efficiency/speed and realism.. no realtime engine is 100% accurate at everything unless there is some advancement I am unaware of. |
| ||
Well it's your call, but you're gonna have some real troubles later on if you don't get it right now. |
| ||
Well it's your call, but you're gonna have some real troubles later on if you don't get it right now. thats why im trying to modify all of my formulas to be more accurate now. |