bottlenecks and optimization

BlitzMax Forums/BlitzMax Programming/bottlenecks and optimization

Nate the Great(Posted 2009) [#1]
Hi,

I was doing some tests of my physics engine on different machines and I just noticed something a bit odd. It seems as though my verlet engine has a bottle neck. Here is how I found out.(Look at statistics below for details) This is a few frames in a row with 900 objects and who knows how many constraints. But the constraints dont matter because the bottle neck is not in the constraint loop. Most of the time it is a steady 18 milliseconds on the update loop but every 50ish frames, I run into a bottleneck like this one where it jumps to the 20's for a few frames. This unfortunately, happens for all numbers of objects. I have posted my updateverlets function below. But since I am relatively new to blitz max, I have no idea what can cause bottle necks like this. Can anyone see a bottleneck? Perhaps I am blind but I dont see anything that could cause a potential bottleneck.

UpdateLoop 12
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 22
ConstraintLoop 2
UpdateLoop 25
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 19
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 19
ConstraintLoop 1
UpdateLoop 17
ConstraintLoop 2
UpdateLoop 19
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 19
ConstraintLoop 1
UpdateLoop 19
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 17
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 24      'bottle neck
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 17
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 20     'bottle neck
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 2
UpdateLoop 24     'bottle neck
ConstraintLoop 2
UpdateLoop 23     'bottle neck
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 17
ConstraintLoop 2
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1
UpdateLoop 18
ConstraintLoop 1



Here is the updateverlets() function

Function UpdateVerlets()
	
	
	For Local vcnt:Int = 0 To verlcount-1			'perhaps arrays create bottle necks?
		Local V:verlet = Verletarray:Verlet[vcnt]
		If V.active = True Then
			
			checkverletbounds v:verlet
			
			If v.x < activerightbound And v.x > activeleftbound And v.y > activeupbound And v.y < activelowbound Then
				
				
				V.done = True	'prevents checking against self
				
				
				V.collided = False		'Resets collision variable
				
				
				
				v.vx = (v.x - v.ox) * (Friction) 'Gets velocities of verlets
				v.vy = (v.y - v.oy) * (Friction)
				
				v.vx:*v.fricap
				v.vx:*v.fricap
				V.fricap = 1
				
				
				
				v.vx :+ Gravityx	'Keeps track of velocity and
				v.vy :+ v.vy + Gravityy	'Applies Gravity
				
				
				Local tmpox1:Float = v.x	' this is necessary for a feature I am experimenting with... just ignore for now.
				Local tmpoy1:Float = v.y
			
				v.x = v.x + (v.vx)
				v.y = v.y + (v.vy)
								
				v.ox = tmpox1
				v.oy = tmpoy1
				
				
				If Not v.ghost Then														'this block is the main suspect!
					For Local vcnt2:Int = 0 To verlcount-1
						Local VV:verlet = Verletarray:Verlet[vcnt2]
						If vv.done = False And vv.ghost = False Then								'keeps from causing 'ghost' verlets to collide with other verlets
							Local dx# = v.x - vv.x			'a little optimization
							Local dy# = v.y - vv.y
							If Abs(dx) < maxverlsiz And Abs(dy) < maxverlsiz Then					'are > and < slow?  IF they are then im in trouble
								If v.id <> vv.id   ' If Not the same verlet Or group
									
									Local dist# = Sqr( dx#*dx# + dy#*dy#)	'distance formula	
									Local totalr# = v.siz# + vv.siz#	'total radius
									If dist# < totalr# Then
										v.fricap = v.fricap - vv.friccr		'friction equations
										vv.fricap = vv.fricap - v.friccr
										
										Local totmass# = v.mass + vv.mass	'mass equations
										Local v1ofs# = (v.mass/totmass) * 2
										Local v2ofs# = (vv.mass/totmass) * 2
										
										Local dmtr# = dist#-totalr#		'dmtr = distance minus total radius
										Local Diffx# = ( dmtr# ) * ( dx# / dist# )	'more optimizatoin stuff.
										Local Diffy# = ( dmtr# ) * ( dy# / dist# )	'diffx = difference x
										
										If v.active = True Then	'if verlet is active then move it around
											v.x = v.x - Diffx# *v2ofs		'positioning the verlets
											v.y = v.y - Diffy# *v2ofs
										EndIf
										
										If vv.active = True Then	'if verlet is active then move it around.
											vv.x = vv.x + Diffx# *v1ofs
											vv.y = vv.y + Diffy# *v1ofs
										EndIf
										
									EndIf
								EndIf
							EndIf
						EndIf
					Next
				EndIf																	'end of suspect block
				
				
			EndIf
		EndIf
	Next
	
	For Local v2cnt:Int = 0 To verlcount - 1
		Local vtmp:verlet = verletarray:verlet[v2cnt]
		vtmp.done = False
	Next
End Function


sorry if its a lot to look through but I heavily commented the suspect blocks. Also notice, I removed the time step. I did this because I am trying to thread it and fix some timestep issues I had with threading.

My main questions are

are < and > slow in bmax? that would explain a lot if it is slow.
does accessing arrays thousands of times a frame cause bottlenecks?


klepto2(Posted 2009) [#2]
its just my guess:
try to eliminate all local varibles within the loop. Outsource them out of the loop.
For explanation:
if you have a local varable outside the loop the memory is allocated once.
Inside the loop the memory is allocated everytime the loop passes the declaracion. In the result the gc has to collect a few hundreds instead of 1 local varibles if it the gc does its work. And exactly this may cause the delay in this code.


Nate the Great(Posted 2009) [#3]
ok thanks klepto2!

I tried that and now it is much faster... There is still a bottle neck but it happens much less often. So am I missing something else or is blitz max just touching its boundaries on number of collision checks?


Nate the Great(Posted 2009) [#4]
after a little more research and experimenting, I found it was the little demo code and the drawing functions that were slowing things down. :) I took out the drawing functions and I could get 1100 objects no problem. If anyone else sees an optimization in the code above though, feel free to mention it because this is going to be a free physics engine (perhaps I will have a place for donations?) and every contribution you make, will benifit the community.


ImaginaryHuman(Posted 2009) [#5]
I didn't know that about locals, klepto, thanks.


slenkar(Posted 2009) [#6]
hey nate, what would you say were the top optimizations for code?


Nate the Great(Posted 2009) [#7]
Jeremy - Heres a small list.


Use arrays instead of Tlists

take locals out of nested loops!

a lot of math workarounds

Bounding Box Checks

More than one collision array

'activity' boxes

Random little optimizations I figured out through trial and error that I dont remember and If I did, they would fill up pages


slenkar(Posted 2009) [#8]
kthanks, I will have to convert to arrays if things get slow


Nate the Great(Posted 2009) [#9]
you are working on a physics engine too? Interesting... Do you have a name picked out? or a release date?


MGE(Posted 2009) [#10]
You are testing on low spec pc's too right?


slenkar(Posted 2009) [#11]
no not a physics engine, just something that uses loads of lists and nested locals, kind of like an RPG


Nate the Great(Posted 2009) [#12]
You are testing on low spec pc's too right?


Yeah. Its not as fast but I was still surprised because about a month ago I used to only be able to have 26 verlets on screen at a time till it slowed to a crawl on a 2.6 ghz dual core thinkpad :)

Does anyone have a pretty low spec pc? I cant seem to find one thats not too new but not too old. All of my machines are either faster than 2.4 ghz or from the windows 95 era.. I dont even know if they dreamed of what a gigahertz was like back then. Unfortunately I dont have any computers inbetween so I am having a hard time testing this on what most people consider low spec but not too old machines.


slenkar(Posted 2009) [#13]
Do you find it strange that my PC could only handle 6 but then it CAUGHT UP with the other PC's after a few optimizations? I thought it would lag behind.(not the dual core laptop of course)


Dreamora(Posted 2009) [#14]
Can test it in a few days on my EeePC 901 (Atom 1.6Ghz on XP SP3)
Bought that thing for exactly this kind of things, as it uses little space when not needed and is cheap


Nate the Great(Posted 2009) [#15]
Ok thanks dreamora!

@ Jeremy

after a few optimizations?


"a few" is an understatement! But yes I am surprised that your PC did so well... actually 6 boxes = 6*8 = 48 (I used 8 verlets per box in the first test but soon realized it was too inneficient) :) your pc was doing better than you thought but still that was a massive improvement.

edit: just tested it on an old xp machine of mine a little less than 4 years old (I think) and it did quite well ~850 verlets but I dont know how many ghz it is


slenkar(Posted 2009) [#16]
all depends if its a celeron or pentium really,
Cant wait for the release of this system!


Oddball(Posted 2009) [#17]
Just had a look at the code above and noticed one or two things. Nothing to do with the bottleneck issue as you seem to have that sorted, but you might still like to take a closer look.


v.vx = (v.x - v.ox) * (Friction) 'Gets velocities of verlets
v.vy = (v.y - v.oy) * (Friction)
Now I'm not sure how you are calculating the value of friction, but from these two lines it appears that you are applying friction to your 'verlets' every timestep. Friction should only be applied to objects that are in contact with other object. Object that are free from contact are also free from friction. Also you are multiplying velocity by friction. Surely once you have calculated the force of friction you should be subtracting it from the objects velocity?


v.vx:*v.fricap
v.vx:*v.fricap
V.fricap = 1
I have no idea what this piece of code does so I apologize if it is supposed to be like it is, but you appear to be multiplying vx by fricap then immediatly repeating the same equation. But like I say I don't know what it does so it might be correct.


v.vx :+ Gravityx	'Keeps track of velocity and
v.vy :+ v.vy + Gravityy	'Applies Gravity
At this point you appear to be doubling the 'verlets' y velocity as you also apply gravity. I think it's meant to be:
v.vy :+ GravityY
Anyway hope these few pointers help out.


Nate the Great(Posted 2009) [#18]
Thanks Oddball.

Did I mention I was thinking of making a Verlet Max for free. Then I can release a really nice version of it with tons of features and editors and I will sell that for around $5.

Now I'm not sure how you are calculating the value of friction, but from these two lines it appears that you are applying friction to your 'verlets' every timestep. Friction should only be applied to objects that are in contact with other object. Object that are free from contact are also free from friction. Also you are multiplying velocity by friction. Surely once you have calculated the force of friction you should be subtracting it from the objects velocity?



I multiply by friction because friction is a float between 0 and 1 1 being no friction and 0 being 100% friction. This simulates air friction so it is usually .995 or something smilar but it defaults to 1 so it has no effect unless you change it.

I have no idea what this piece of code does so I apologize if it is supposed to be like it is, but you appear to be multiplying vx by fricap then immediatly repeating the same equation. But like I say I don't know what it does so it might be correct.



oops.. you caught one there. Well the second one should be v.vy :) I stripped out a lot of unnecessary stuff in this function and changed some things to make it so people could understand and read it and I guess I messed some stuff up as well. Thanks for catching that one though

At this point you appear to be doubling the 'verlets' y velocity as you also apply gravity. I think it's meant to be:



again this is like the one above. I guess I messed it up upon taking stuff out because it is right in the actual source.

thanks

Also, I have been studying the way your engine works carefuly via the tutorials pack and I am making sure that we use different systems for special collisions and events etc. I dont want my engine to be a rip-off of yours.

and speaking of pointers, how do you pass function pointers through to another function and then call them? I cant figure this one out.

Thanks for all the help. If your engine is still evolving the site below is a must-read. I still haven't implemented or tested these optimizations but they are all in progress.

http://www.andrewnoske.com/student/downloads/jcu_hons/Thesis.doc


Oddball(Posted 2009) [#19]
Nate the Great wrote:
I multiply by friction because friction is a float between 0 and 1 1 being no friction and 0 being 100% friction. This simulates air friction so it is usually .995 or something smilar but it defaults to 1 so it has no effect unless you change it.
I'd be careful using similar terminology for both drag(fluid resistance) and surface friction. The two forces are calculated in very different ways.


Nate the Great(Posted 2009) [#20]
I'd be careful using similar terminology for both drag(fluid resistance) and surface friction. The two forces are calculated in very different ways.



hmmm... I have been calculating friction/drag similarly and it looks very realistic and accurate so far.


Oddball(Posted 2009) [#21]
Nate the Great wrote:
hmmm... I have been calculating friction/drag similarly and it looks very realistic and accurate so far.
Drag as a force is directly opposite to the direction of motion and is based primarily on velocity and the drag coefficient. Surface friction is opposite the direction of tangential motion when compared to the contact surface, and is base primarily on the perpendicular force and the friction coefficient. Also, drag coefficient is usually a result of an objects shape and friction coefficient is mainly based on an objects material.

The easiest way to see this effect in real life is to get two objects of identical shape and surface material but that weigh totally different amounts. Dropping both object will result in them both hitting the ground at the same time as they both produce identical drag. This is because their shape and velocities are identical at release. However, trying to push the two objects along the ground using the same force on each will result in the heavier of the two moving slower than the lighter object. This is because the heavier object has a higher perpendicular force resulting in a higher amount of friction force. I seem to have started rambling a bit, but I enjoy all this physics-y stuff too much. Anyway hope this helps try checking out some of the formulas for friction that litter the net. Coulomb's equation is usually the most popular.


Nate the Great(Posted 2009) [#22]
all this friction stuff makes my head hurt. I think friction in verlet max is prety much fine.. after all realtime game physics is about a balance of efficiency/speed and realism.. no realtime engine is 100% accurate at everything unless there is some advancement I am unaware of.


Oddball(Posted 2009) [#23]
Well it's your call, but you're gonna have some real troubles later on if you don't get it right now.


Nate the Great(Posted 2009) [#24]
Well it's your call, but you're gonna have some real troubles later on if you don't get it right now.



thats why im trying to modify all of my formulas to be more accurate now.