Help me optimise this!

BlitzMax Forums/BlitzMax Programming/Help me optimise this!

jkrankie(Posted 2012) [#1]
Hi,

I've been working on making a fast camera-aligned billboard sprite thing with OpenGL and minib3d. The target is to get as close as I can to 100,000 scaled, rotated, alpha-blended quads rendering at 60fps.

So far I feel I'm getting pretty close to the target (100,000 at 27fps), but I think my maths brain is letting me down a little, so I'm asking for some help!

I've got some relatively expensive stuff going on in the loop that generates the coords for each quad, and ideally I'd like to do away with some of it. The four points of the quad are calculated by getting the current modelview matrix and extracting an up and a right vector. I think this bit is fairly quick, but to rotate the points I've been rotating the matrix, which is pretty slow because of the sin, cos and atan2 calls, especially when it's done 100,000 times per update.

What I *think* I should be able to do is rotate the points using the up and right vectors, or possibly the vectors for the quad points, but I've had no luck doing this so far. Rotating the vectors is probably possible with minimal sin/cos calls, but I can't work out how to do it. It would also be cool to be able to offset the point of rotation with an x and y handle, similar to Max2D.
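One way to rotate the points without rotating the matrix is to rotate the right and up vectors within their own plane. For a sprite angle a, right' = cos(a)*right + sin(a)*up and up' = cos(a)*up - sin(a)*right, which costs one Sin and one Cos per sprite and no ATan2. A minimal sketch, assuming right and up have already been pulled out of the modelview matrix as described above; RotateBasis, BuildCorners and the array layout are hypothetical names, not minib3d API:

```blitzmax
SuperStrict

' Rotate the camera-aligned basis (rightV, upV) by angle degrees about the
' view axis. BlitzMax Sin/Cos take degrees. One Sin and one Cos per sprite.
Function RotateBasis(rightV:Float[], upV:Float[], angle:Float, outRight:Float[], outUp:Float[])
	Local c:Float = Cos(angle)
	Local s:Float = Sin(angle)
	For Local k:Int = 0 Until 3
		outRight[k] = c * rightV[k] + s * upV[k]
		outUp[k] = c * upV[k] - s * rightV[k]
	Next
End Function

' Corners of a quad with half-width hw and half-height hh centred on pos,
' written as 12 floats in the order (-,-), (-,+), (+,+), (+,-).
Function BuildCorners(pos:Float[], r:Float[], u:Float[], hw:Float, hh:Float, corners:Float[])
	For Local k:Int = 0 Until 3
		corners[k] = pos[k] - r[k] * hw - u[k] * hh
		corners[3 + k] = pos[k] - r[k] * hw + u[k] * hh
		corners[6 + k] = pos[k] + r[k] * hw + u[k] * hh
		corners[9 + k] = pos[k] + r[k] * hw - u[k] * hh
	Next
End Function

' Usage: assumed camera basis with right = +x and up = +y.
Local rightV:Float[] = [1.0, 0.0, 0.0]
Local upV:Float[] = [0.0, 1.0, 0.0]
Local pos:Float[] = [0.0, 0.0, 0.0]
Local r:Float[3]
Local u:Float[3]
Local corners:Float[12]
RotateBasis(rightV, upV, 90.0, r, u)        ' r -> roughly (0,1,0), u -> roughly (-1,0,0)
BuildCorners(pos, r, u, 2.0, 1.0, corners)  ' a 4x2 quad turned 90 degrees
```

A positive angle turns right towards up. The Sin/Cos pair could also come from a lookup table if the angle is quantised.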

I'm also not using vertex arrays, which I imagine would speed things up a little too, though likely not as much as getting rid of the matrix rotation.

So my questions are:
• How can I rotate the points without rotating the matrix?
• How can I offset the point of rotation with an x/y handle, similar to Max2D?
• How do these vertex array things work anyway???

I'll post the code below. You'll need a version of minib3d to run it, as I'm using its camera class (I use Warner's version, which has proper rotations; you can grab it from the zip link towards the bottom of this page: http://www.blitzbasic.com/codearcs/codearcs.php?code=2498 ).

Here's the code; you'll need to supply your own texture. When it's running, you can use the arrow keys to spin around and the A and Z keys to move back and forward. I've commented it so it's a bit more understandable!



Any help would be much appreciated! Even as it stands, it's waaaaaaaaay faster than rendering sprites with minib3d. Feel free to use it in your own minib3d projects too!

Cheers
Charlie

Last edited 2012


AdamRedwoods(Posted 2012) [#2]
Did you see this yet?
http://blitzmax.com/Community/posts.php?topic=97172#1127059

You'd have to convert it over from Monkey yourself. I intend to do this soon, but haven't yet.

It uses minib3d surfaces: each quad is added to a single surface but keeps independent positioning, parenting, etc. Therefore it's only one glDrawElements() call for however many sprites you have.
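Drawing every quad from one surface with a single glDrawElements() call needs an index list that stitches each 4-vertex quad into two triangles. A minimal sketch, assuming 4 vertices per quad in corner order p0..p3; BuildQuadIndices is a hypothetical helper, not part of minib3d. Int indices are used because 100,000 quads need 400,000 vertices, which is more than 16-bit indices can address:

```blitzmax
SuperStrict

' Build triangle indices for quadCount quads, 4 vertices per quad.
' Quad (v0,v1,v2,v3) becomes triangles (v0,v1,v2) and (v0,v2,v3).
Function BuildQuadIndices:Int[](quadCount:Int)
	Local idx:Int[quadCount * 6]
	For Local q:Int = 0 Until quadCount
		Local v:Int = q * 4   ' first vertex of this quad
		Local o:Int = q * 6   ' first index slot of this quad
		idx[o] = v
		idx[o + 1] = v + 1
		idx[o + 2] = v + 2
		idx[o + 3] = v
		idx[o + 4] = v + 2
		idx[o + 5] = v + 3
	Next
	Return idx
End Function

Local indices:Int[] = BuildQuadIndices(2)
' Per frame (not run here), something like:
' glDrawElements(GL_TRIANGLES, indices.Length, GL_UNSIGNED_INT, Varptr indices[0])
```

The index list never changes for a fixed sprite count, so it only needs building once.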


As for your code above: since BlitzMax does not inline functions, you would benefit most from inlining vecSubtract, vecAdd, etc. by hand. Try to get the equation for the quad vectors into one compact line.

Your questions:
1. Rotating sprites: it'd be better with the matrix, since rotation is easier to manage if you are using a 3D camera. The way I approached the problem was to have a single point of origin, then draw the quad from that single point's matrix.

2. Offsetting the sprite rotation origin: if sprites are treated as a single point, the quad is worked out by multiplying the corner points by the transform matrix, so you just move your corner points. Example from the code at the link above:
			'p0 = mat_sp.TransformPoint(-1.0,-1.0,0.0)
			p0 = [-m00 + -m10 + o[0] , -m01 + -m11 + o[1], m02 + m12 - o[2]]		
			'p1 = mat_sp.TransformPoint(-1.0,1.0,0.0)		
			p1 = [-m00 + m10 + o[0] , -m01 + m11 + o[1], m02 - m12 - o[2]]	
			'p2 = mat_sp.TransformPoint(1.0,1.0,0.0)
			p2 = [m00 + m10 + o[0] , m01 + m11 + o[1], -m02 - m12 - o[2]]			
			'p3 = mat_sp.TransformPoint(1.0,-1.0,0.0)
			p3 = [m00 - m10 + o[0] , m01 - m11 + o[1], -m02 + m12 - o[2]]

I kept the comments in there to show how I optimized a matrix call into an inline calculation. Anyway, to offset the rotation origin: the quad's corners are -1,-1 and 1,1, so the quad is centered. To offset the origin to the top left, the corners would be 0,0 and 2,2; to offset it further top left, 2,2 and 4,4, and so on.
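The p0..p3 lines above are TransformPoint unrolled for the corners (±1, ±1). A minimal sketch of the same inline transform for an arbitrary corner with a Max2D-style handle subtracted first, so the handle point lands on the origin o and the quad rotates around it. CornerWithHandle and its parameter order are hypothetical; the negated z simply copies the convention of the lines above:

```blitzmax
SuperStrict

' Inline equivalent of mat_sp.TransformPoint(lx - hx, ly - hy, 0.0), written in
' the same form as the p0..p3 lines (including their negated z).
' (hx, hy) is the handle in the quad's local -1..1 space.
Function CornerWithHandle(m00:Float, m01:Float, m02:Float, m10:Float, m11:Float, m12:Float, o:Float[], lx:Float, ly:Float, hx:Float, hy:Float, p:Float[])
	Local x:Float = lx - hx
	Local y:Float = ly - hy
	p[0] = m00 * x + m10 * y + o[0]
	p[1] = m01 * x + m11 * y + o[1]
	p[2] = -(m02 * x + m12 * y + o[2])
End Function

' Usage with an unrotated matrix and the handle on corner (-1,-1):
Local o:Float[] = [5.0, 5.0, 0.0]
Local p:Float[3]
Local q:Float[3]
CornerWithHandle(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, o, -1.0, -1.0, -1.0, -1.0, p)  ' p = (5,5,0): handle corner sits on o
CornerWithHandle(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, o, 1.0, 1.0, -1.0, -1.0, q)    ' q = (7,7,0): local corner 2,2
```

With hx = hy = 0 this reduces to the centered p0..p3 lines; with hx = hy = -1 the local corners become 0,0 and 2,2, as in the post.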

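3. Vertex arrays (assuming you're not using VBOs) are rather easy:
		' vertices:Float[] holds x,y,z for every vertex (6 vertices per quad for GL_TRIANGLES)
		glEnableClientState(GL_VERTEX_ARRAY)
		glVertexPointer(3, GL_FLOAT, 0, Varptr vertices[0])   ' 3 floats per vertex, tightly packed
		
		glDrawArrays(GL_TRIANGLES, 0, vertices.Length / 3)     ' vertex count = float count / 3
		glDisableClientState(GL_VERTEX_ARRAY)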

The trick is setting up the vertex and texture coordinate arrays correctly. With quads drawn as GL_TRIANGLES, each quad is 6 vertices * xyz ([x,y,z, x,y,z, x,y,z, ..etc]) with z=0, and then you multiply your points through the matrix (optimized).
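Filling that array from the four corner points of each quad could look like the following minimal sketch. It assumes the corners arrive as 12 floats in the p0..p3 order used earlier in the thread; WriteQuad and QUAD_ORDER are hypothetical names:

```blitzmax
SuperStrict

' Two triangles per quad: (p0,p1,p2) and (p0,p2,p3).
' Kept as a Global so the table is not reallocated on every call.
Global QUAD_ORDER:Int[] = [0, 1, 2, 0, 2, 3]

' Copy one quad's corners (12 floats) into a flat GL_TRIANGLES vertex array.
' Each quad takes 6 vertices * 3 floats = 18 floats, starting at quad * 18.
Function WriteQuad(verts:Float[], quad:Int, corners:Float[])
	Local base:Int = quad * 18
	For Local v:Int = 0 Until 6
		Local c:Int = QUAD_ORDER[v] * 3
		verts[base + v * 3] = corners[c]
		verts[base + v * 3 + 1] = corners[c + 1]
		verts[base + v * 3 + 2] = corners[c + 2]
	Next
End Function

' Usage: write a unit square as the second quad of a two-quad array.
Local verts:Float[2 * 18]
Local corners:Float[] = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
WriteQuad(verts, 1, corners)
' Then per frame: glDrawArrays(GL_TRIANGLES, 0, verts.Length / 3)
```

Texture coordinates are the same for every quad under this layout, so the matching texcoord array can be filled once at startup rather than every frame.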


AdamRedwoods(Posted 2012) [#3]
More on vertex arrays and VBOs:
http://www.songho.ca/opengl/gl_vertexarray.html


ImaginaryHuman(Posted 2012) [#4]
Why aren't you using glRotate() and glTranslate() instead of trying to make your own matrix for rotation?

Also what is the fill rate for your graphics card and are you getting close to it?


jkrankie(Posted 2012) [#5]
@adam, thanks. I've seen your code, but I've not ported it over to Max. My experiments with single-surface stuff in minib3d have not been anywhere near as quick as what I've got above. I'd be interested to know how fast yours is. Also, I guess I should say this isn't intended to be a replacement for, or optimisation of, minib3d sprites.

I should probably qualify that: I don't think what I've written is slow per se, just that I can't *quite* get 100,000 quads with matrix rotation. If I turn the rotations off it runs 100,000 at 64fps, which is close. If I optimise with vertex and texcoord arrays I should get that number higher too, leaving more room for the rotation calculation.

I'll definitely try generating the quad corner points via your method above; that should save a bit of time. I'm only using a single point to store the position, so if I can offset things there too, that's even better!

What kind of a speed boost would i likely get from using vertex arrays?


@ImaginaryHuman, I did a version using the built-in matrix commands, but I couldn't get it quick enough, and I had to generate and store the world positions of each quad's corners so I could draw them all at once.

I don't know what the fill rate of my gfx card is (how would I find out?), but I'm getting close: as I mentioned above, I can get 64fps with 100,000 scaled but unrotated quads.
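As for finding the fill rate, tools such as GPU-Z list a theoretical pixel fill rate for the card, and the rate you actually need can be estimated by multiplying out the pixels written. A worked example, assuming (hypothetically) that every quad covers 16 x 16 pixels on screen and that overlapping pixels are all counted, since each blended quad writes its pixels even where it overlaps others:

```latex
\text{pixels/s} = N_{\text{quads}} \times A_{\text{px}} \times \text{fps}
= 100\,000 \times (16 \times 16) \times 60
= 1.536 \times 10^{9}\ \text{px/s}
```

Alpha blending also reads back each destination pixel, so the real cost sits somewhat above this raw figure.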

Cheers
Charlie

Last edited 2012


AdamRedwoods(Posted 2012) [#6]
Using vertex arrays and inlining functions:


EDIT: made a mistake, need to *4

19 fps with an Intel G41 IGP for TOTAL=100000.
*** not in DEBUG mode

Last edited 2012


AdamRedwoods(Posted 2012) [#7]
You could even draw only the odd or the even sprites on alternate frames, if your sprites are in the background and not essential for smooth updates.
Global alter:Int = 0   ' declare once, outside the main loop, so the value survives between frames

' once per frame, before drawing:
alter = 1 - alter      ' alternates 0,1,0,1... so odd and even sprites take turns
	'begin drawing quads
	'glBegin(GL_QUADS)
		'start looping! change the number here if you want more or less, don't worry about resizing the array above.
		For Local i:Int = alter To TOTAL-1 Step 2


Do you need more optimizing? If so, I'd go with OpenGL 2.0 shaders.

P.S. Interleaving the vertex arrays had no noticeable effect on my end.

Last edited 2012


jkrankie(Posted 2012) [#8]
Oooh, very nice! thanks very much :)

It's quite a bit quicker actually! Nearly double the speed here, up from 27fps to 51fps.

If I comment out the matrix rotation bit it gets up to 105fps, so rotating the matrix is definitely the slow bit. If I comment out the atan2 call that gets the current matrix rotation at the top of the loop, it goes over 60fps, i.e. the goal! I wonder, is there a cheap way of resetting the z rotation to 0 so that I can skip the atan2 call?

Which bit would I need to add the x and y handle values to in order to offset the rotation? I still can't work it out! I assume it's the points as they're being calculated, but I've not had much luck there.

Thanks again :)

Cheers
Charlie

Last edited 2012


AdamRedwoods(Posted 2012) [#9]
Offset handles would be subtracted from the center:
center[0] = quadpositions[i].x - handlex
center[1] = quadpositions[i].y - handley
center[2] = quadpositions[i].z


As for speeding up atan2, cos and sin, I'd try lookup tables.
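A minimal sketch of the lookup-table idea for Sin and Cos, trading a little precision for speed. The table size (3600 entries, i.e. 0.1 degree steps) and all names are assumptions, and whether it actually beats the built-in Sin/Cos on a given machine is worth measuring:

```blitzmax
SuperStrict

' Precomputed tables with STEPS entries per 360 degrees (0.1 degree steps).
Const STEPS:Int = 3600
Global SinTable:Float[STEPS]
Global CosTable:Float[STEPS]

Function InitTrigTables()
	For Local i:Int = 0 Until STEPS
		Local a:Double = i * 360.0 / STEPS   ' BlitzMax Sin/Cos take degrees
		SinTable[i] = Sin(a)
		CosTable[i] = Cos(a)
	Next
End Function

' Map any angle in degrees (negative or above 360) to the nearest table slot.
' Floor rather than Int() so negative angles round the same way as positive ones.
Function TrigIndex:Int(angle:Float)
	Local i:Int = Int(Floor(angle * STEPS / 360.0 + 0.5)) Mod STEPS
	If i < 0 Then i :+ STEPS
	Return i
End Function

' Usage: build the tables once at startup, then index them per sprite.
InitTrigTables()
Local idx:Int = TrigIndex(30.0)
Local s30:Float = SinTable[idx]   ' roughly 0.5
Local c30:Float = CosTable[idx]   ' roughly 0.866
```

If sprite angles are stored as table indices in the first place, TrigIndex drops out of the inner loop entirely.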


jkrankie(Posted 2012) [#10]
Ah, OK. I think I should have said offset the quad's center of rotation; does that make more sense? The code above moves the center point, rather than offsetting the point around which the corners of the quad rotate.

Cheers
Charlie


jkrankie(Posted 2012) [#11]
Here's an example from an earlier test that had the x/y handles working.



Cheers
Charlie


jkrankie(Posted 2012) [#12]
I worked out how I could eliminate the atan2 call in the end. I made a copy of the modelview matrix and used the copy to do the z-axis rotation, then reset the rotation by copying the appropriate bits of the original matrix into the copy.

100,000 billboards at 66fps!
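A sketch of that copy-and-reset idea, under assumptions: the matrix is held as 16 floats in OpenGL column-major order, and the per-sprite z rotation is applied like glRotatef (post-multiplied). Such a rotation only changes the first two columns, so those are the "appropriate bits" rebuilt from the original for each sprite. RotateCopyZ is a hypothetical name, not the poster's actual code:

```blitzmax
SuperStrict

' orig: modelview read once per frame (column 0 = elements 0..2,
' column 1 = elements 4..6). work: scratch copy used for drawing.
' Each call starts from orig, so there is no previous rotation to
' measure with ATan2 and undo.
Function RotateCopyZ(orig:Float[], work:Float[], angle:Float)
	Local c:Float = Cos(angle)
	Local s:Float = Sin(angle)
	For Local k:Int = 0 Until 3
		work[k] = c * orig[k] + s * orig[4 + k]       ' new column 0
		work[4 + k] = c * orig[4 + k] - s * orig[k]   ' new column 1
	Next
End Function

' Usage with an identity matrix standing in for the real modelview
' (per frame, e.g. glGetFloatv(GL_MODELVIEW_MATRIX, Varptr orig[0])).
Local orig:Float[16]
orig[0] = 1.0
orig[5] = 1.0
orig[10] = 1.0
orig[15] = 1.0
Local work:Float[16]
For Local k:Int = 0 Until 16
	work[k] = orig[k]   ' columns 2 and 3 are copied once and never touched
Next

RotateCopyZ(orig, work, 90.0)
Local w1:Float = work[1]   ' roughly 1
Local w4:Float = work[4]   ' roughly -1
RotateCopyZ(orig, work, 0.0)   ' next sprite: back to unrotated without undoing anything
```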

Cheers
Charlie