Optimizing MiniB3D's rendering pipeline

BlitzMax Forums/MiniB3D Module/Optimizing MiniB3D's rendering pipeline

jtfrench(Posted 2011) [#1]
Hey guys,

I'm looking for a way to squeeze some extra performance out of MiniB3D. The OpenGL SuperBible talks about how if you use glMultiDrawElements instead of glDrawElements, you can render large amounts of triangles with fewer calls/OpenGL state changes. This sounded like a typical scalar-to-vector code optimization and I was excited to tweak MiniB3D to see if this would help. However, I got stumped when trying to support the case of VBOs being used with glMultiDrawElements.

The original code reads as follows (in TMesh.bmx):

	
			' draw tris
			
			glMatrixMode(GL_MODELVIEW)

			glPushMatrix()
	
			If TSprite(Self)=Null
				glMultMatrixf(mat.grid)
			Else
				glMultMatrixf(mat_sp.grid)
			EndIf
					
			If vbo
				glDrawElements(GL_TRIANGLES,surf.no_tris*3,GL_UNSIGNED_SHORT,Null)
			Else
				glDrawElements(GL_TRIANGLES,surf.no_tris*3,GL_UNSIGNED_SHORT,surf.tris)
			EndIf

			glPopMatrix()


I'm not sure how to make the VBO-equivalent for glMultiDrawElements. I then read this on Apple's site ( http://lists.apple.com/archives/mac-opengl/2005/Dec/msg00039.html ) and began wondering if this was even worth it to begin with.

Does anyone have any experience with this?

Thanks,
Jason

Last edited 2011


jtfrench(Posted 2011) [#2]
Just to get the juices flowing, here was my idea of how you could handle this:

1) For all rendering passes that would use VBOs, combine the buffer data into a single VBO, then make several calls to glDrawElements; instead of just passing NULL / 0 for the last parameter, you would pass the byte offset of each sub-buffer.

2) For all rendering passes that wouldn't use VBO, you simply add each array that would have gone in the last parameter of glDrawElements to a new array, and then pass that to glMultiDrawElements.
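To make idea 1 concrete, here's a rough sketch of the arithmetic (in Python just for illustration - pack_index_buffers is a made-up name, not MiniB3D): compute the byte offset of each surface's indices inside the combined element buffer, which is what you'd pass as that last glDrawElements parameter.

```python
# Sketch of idea 1: pack several index buffers into one element-array VBO
# and record the byte offset of each sub-buffer. Each offset replaces the
# NULL last parameter of glDrawElements for that surface.

INDEX_SIZE = 2  # GL_UNSIGNED_SHORT is 2 bytes per index

def pack_index_buffers(surfaces):
    """surfaces: list of index counts (no_tris * 3 per surface).
    Returns (total size in bytes, per-surface byte offsets)."""
    offsets = []
    cursor = 0
    for index_count in surfaces:
        offsets.append(cursor)
        cursor += index_count * INDEX_SIZE
    return cursor, offsets

# Three surfaces with 10, 4 and 6 triangles:
total, offs = pack_index_buffers([10 * 3, 4 * 3, 6 * 3])
```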

I'm not really sure at all about this though. Any suggestions?


Kryzon(Posted 2011) [#3]
I'm not sure if glMultiDrawElements does what you're looking for.

You see, if you add different meshes into the same VBO (say, several characters), all the polygons in that VBO will use the same texture even if they were originally textured with different ones.
Remember GL is a state machine, so you "activate" a texture, render whatever meshes you want to be textured by it (since it will always use the currently active texture), then activate the next texture and render the next meshes etc.

There is no way to render things with different textures without setting each texture up and rendering each mesh that uses it in separate passes.
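A tiny sketch of that ordering, in Python rather than GL calls (group_by_texture and the scene data are illustrative only): sort the scene by texture so each texture is "activated" once and every mesh using it is drawn before the next bind.

```python
from collections import defaultdict

def group_by_texture(meshes):
    """meshes: list of (mesh_name, texture_id) pairs. Returns an ordered
    render plan of (texture_id, [meshes]) so each texture is bound once."""
    groups = defaultdict(list)
    order = []
    for name, tex in meshes:
        if tex not in groups:
            order.append(tex)  # remember first-seen order of textures
        groups[tex].append(name)
    return [(tex, groups[tex]) for tex in order]

scene = [("orc1", "orc.png"), ("crate", "wood.png"), ("orc2", "orc.png")]
plan = group_by_texture(scene)
# Two texture binds instead of three: orc.png covers orc1 and orc2 together.
```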

The optimization you are looking for is in texture atlasing and single-surfacing whenever possible; MiniB3D will join everything into a single VBO internally. So a good effort for you would be to develop a robust single-surface system for particles, one for static meshes and one for skeleton-animated characters.
This would improve performance, since you'd be rendering what were all those different meshes with a single 'drawcall' - something you really want to cut down on.
If you use atlasing, things can even have "different" textures while on the same VBO.


jkrankie(Posted 2011) [#4]
I'd echo Kryzon here. Texture atlasing or batching meshes would give a much more immediate speed boost.

Cheers
Charlie


jtfrench(Posted 2011) [#5]
Thanks for the input guys. I'll look into this more. I'm not too familiar with how to pull these off, but maybe Google does. I'll have to read up on texture atlasing and how to enforce a single surface for meshes.


jtfrench(Posted 2011) [#6]
Also, just to clarify ---- are you saying each animated mesh should be a single surface, each static mesh a single surface, etc or should ALL animated meshes be in a single surface, ALL static meshes in a single surface, etc.

Just want to make sure I'm not interpreting this the wrong way.


Kryzon(Posted 2011) [#7]
Combine by class: the particles' single-surface object, the static meshes single-surface object etc.
You should keep as much stuff as you can in a single mesh. Thinking of it again, you shouldn't make animated meshes single-surface - it'll be too difficult.
Use it just for static meshes and particles.

The problem is, what if you try to store your entire game scene in a single mesh? The size of the atlas required for the entire scene would be enormous - no graphics card would be able to load it. So you have to compromise on the size of the collections. Keep atlas sizes down to something like 1024².

Vertex shaders will also help a great deal with single-surfacing, as you can store each object's matrix as vertex attributes that are read in the vertex-shader, and it's there where you compute each vertex's position with that matrix to restore the original object's transformation.
This might be of interest: http://blitzbasic.com/Community/posts.php?topic=95084


jtfrench(Posted 2011) [#8]
Texture atlasing is sounding really interesting but also introducing some challenges. I'm working on an MMO, so things like items in a scene, characters in a scene ---- we do not always know beforehand what will be there. There are definitely assumptions we can make (e.g. the environment/scene architecture doesn't change, just the items and people in it).

What I'd like to know is:

• Does it make sense to use *multiple* texture atlases ---> e.g. one for scenery (wall/ground/ceiling textures), one for items, and one for characters

• Kryzon, you suggested 1024 as an atlas limit. I think I remember MiniB3D pooping itself if I tried to load a texture image any bigger than 2048². So it sounds like we'd have to keep all atlases within this range, period?

• Does using multiple atlases *hurt* performance? Is it better to either do one single atlas (+ individual textures) or no atlases at all?


Just want to get a better understanding of what the theoretical constraints are here.

Thanks,
jason


Kryzon(Posted 2011) [#9]
1) It makes sense to use multiple atlases, because if you use a single one it would have to be huge. This also answers 2) with the following: the limit on texture size is given by the hardware, not MiniB3D.
Different cards support different "maximum texture sizes" (check THardwareInfo.BMX as it queries this constant among others); that's why I suggested a limit of 1024² because you can store a lot of stuff in it, doesn't cost that much memory and it's generally supported by a broad range of hardware (anything that doesn't support that size most likely wouldn't run your game anyway).
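As a sketch of that sizing logic (pure Python; pick_atlas_size is a made-up helper, not something in THardwareInfo): clamp the atlas to the smaller of the hardware limit and your preferred cap, rounded down to a power of two.

```python
def pick_atlas_size(hardware_max, preferred=1024):
    """Largest power-of-two texture size that fits both the hardware limit
    (the kind of value THardwareInfo queries) and a preferred cap."""
    size = 1
    while size * 2 <= min(hardware_max, preferred):
        size *= 2
    return size

best = pick_atlas_size(2048)  # a card reporting 2048 still gets a 1024 atlas
```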
Take a look at this atlas from PSX's Crash Bandicoot 2. Ignore the noise.

3) Using multiple atlases [properly] is better than using individual textures for everything. EDIT: Of course there will be some situations where you need individual textures for an object, but you shouldn't worry about that as you are optimizing what you can.

A single texture sheet can hold several types of particles.
Each environment in your game can also have things clustered into sheets (without breaking the size limit). Some ceilings in a certain sheet, others in a second. The amount of atlases/sheets should be defined by necessity. Know that atlases can't hold tiling textures - there's no way to tile something from an atlas without breaking geometry and duplicating vertices - so ground textures and other tiling textures you should keep individual.
Include in your art pipeline some sort of script or plugin that bakes sheets and handles the necessary UV corrections - avoid doing it manually as that's a pain.

Further reading:
http://www.scriptspot.com/3ds-max/scripts/texture-atlas-generator (I'm assuming you use 3DS Max)
http://blog.wolfire.com/2010/03/Using-texture-atlases
http://www.texturebaking.com/features-benefits/

Last edited 2011


jtfrench(Posted 2011) [#10]
Thanks for the clarifications, Kryzon. This makes sense (though I couldn't quite make out what was in that atlas you linked). I always forget about THardwareInfo.bmx but that sounds incredibly useful if it can poll the current graphics card and determine its limits.

It looks like NVidia also offers some Texture Atlasing tools here: http://developer.nvidia.com/legacy-texture-tools

As for my art pipeline, I'm currently using Maya + Ultimate Unwrap to get assets into the game.

Next I need to find out:

a) Assuming I have an appropriate texture atlas for the n models in my scene, where in the OpenGL rendering pipeline do I specify the sub-region coordinates for each texel?

b) Assuming I've done a) correctly, is there any additional stuff I must do to let OpenGL know that "yes, these n meshes all use the same texture, don't bother issuing a state change for these textures", or is that handled implicitly by the fact that the n meshes will be rendered within a single glMultiDrawElements call?


Getting there, little by little :)


Kryzon(Posted 2011) [#11]
Sure, let's keep it going.

A) These coordinates you can specify as offsets to the mesh's current UV values. Depending on the way your atlas is set up, you can handle this as if it were an array.

In an atlas of particle textures, comprised of 4 x 4 textures, all square and of size 64² pixels, say I want to pick the last texture - that's the bottom right one.
Whenever you create a new particle quad, you would add to the UV coordinates of this quad the appropriate offset so it will reference the right texture from the atlas:
Local U#,V#
Local Row# = 3		'0-based, so the bottom-right cell of a 4x4 atlas is row 3, column 3
Local Column# = 3
Local numRows# = 4	'declared as floats (#) so the divisions below aren't integer divisions
Local numColumns# = 4

For [all vertices in this quad] {

	U = VertexU(surf,index) / numColumns
	V = VertexV(surf,index) / numRows

	'Add offsets so we go from texture-space to atlas-space:
	VertexTexCoords( surf, index, U + (Column/numColumns), V + (Row/numRows) )
}
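A quick way to sanity-check that offset math (plain Python; atlas_uv is just an illustrative helper, and cells are taken as 0-based, so the bottom-right cell of a 4x4 atlas is column 3, row 3):

```python
def atlas_uv(u, v, column, row, num_columns, num_rows):
    """Map a texture-space UV (0..1) into the atlas cell at (column, row).
    Scales the UV down to one cell, then shifts it by the cell's offset."""
    return (u / num_columns + column / num_columns,
            v / num_rows + row / num_rows)

# The bottom-right cell of a 4x4 atlas spans (0.75, 0.75) to (1.0, 1.0):
corner = atlas_uv(1.0, 1.0, 3, 3, 4, 4)
```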

B) If all your geometry belongs to the same TMesh and this TMesh only has one surface, you can rest assured it'll be rendered with a single drawcall, without the need to add glMultiDrawElements or modify further code. You can look at the TMesh.Update() method's source: it goes through every surface of the TMesh, rendering it. If you have only 1 surface, that's just one drawcall.

Another element that can help you a lot is using shader programs. You can calculate a lot of custom stuff in a vertex shader for this and make it really shine.

Last edited 2011


jtfrench(Posted 2011) [#12]
Thanks for the breakdown. A few questions:

• To pull off what you described in A), does that mean you'd effectively have to write/change LoadAnimMesh so that the calls it makes to VertexTexCoords acknowledge the atlas coordinates, no?

• For B) what happens when the geometry is distributed amongst several different meshes, all of which reference the same texture atlas? Do you still get multiple draw calls? Is that when glMultiDraw would come into play?

Thanks


Kryzon(Posted 2011) [#13]
1) You don't need to change code. You can simply add a routine that processes a mesh object you send to it.

2) In that case you would get several draw calls, since every mesh would need to run their own .Update() method and that costs 1 drawcall or more per method (depending on the number of surfaces).
If you do split your geometry across different VBOs, then you're in trouble: you can't switch VBOs in the middle of a drawcall.
You'd have to merge all the vertex data into a single VBO and then proceed to render it (therefore you don't need glMultiDraw; it doesn't do what we were thinking it does).
Read more here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=277705

Since you're aiming for an MMO engine, maybe MiniB3D lacks the optimization you need for something of this scale (portals, Octrees, physics, streaming).
You should take a look at a MMO-capable engine such as Esenthel.
Also check GMan's port of Irrlicht Engine for BMax. Irrlicht is a tried-and-true engine with an active community.

Last edited 2011


jtfrench(Posted 2011) [#14]
Since you're aiming for an MMO engine, maybe MiniB3D lacks the optimization you need for something of this scale (portals, Octrees, physics, streaming).


I figured out-of-the-box it may not support everything I need, but that it would be possible (with enough direction) to at least modify it and optimize it to meet my needs, since at the end of the day it's OpenGL, no? (not saying this would be trivial though!)

Additionally, I've been modifying the design of the game to ease up on the "massive" part of MMORPG, so that we're not literally rendering endless streaming worlds. What we have is more of a 3D MUD --- you walk from map to map (each map triggering its own scene load), and then include the meshes for whatever other players are in that room. By design we try to limit the amount of avatars in a scene significantly (always under 20, usually under 10) ---- however I am considering adding OpenGL instancing calls for select scenarios where I want to show crowds of NPCs.


So just to clarify, in 2) does that mean that there's no point in texture atlasing if your scene geometry is distributed amongst several meshes/surfaces ---- or does this still somehow come out better than individualized texture maps?


Thanks


Kryzon(Posted 2011) [#15]
'Using the atlas' is half the requirement. The other half is having only 'a single-surface'. This way everything is really inside the same VBO, and everything will be rendered with a single drawcall.
Therefore, before rendering the frame, gather all the geometry data into the same surface or start working with just one surface to begin with.
If you're worried about the individual object control this could take away from you, look at single-surface systems in the code-archives because they show you how to handle objects individually even though their geometry is all mashed up into the same polygon-soup VBO.
Even though all your particles are in the same surface, you can still color, move and rotate each one separately.
For complex things such as static objects, you need vertex programs to store not only vertex information but also their entity matrix, so you can apply individual object transformations to collections of polygons within that single surface. This way you can use the 'Entity' commands with them so they behave like separate objects - you'd need to modify the source and use shaders for this.

About the engine. Well, if you have the resources to improve MiniB3D, I'd say go for it. But if you're dealing with a deadlined production, try to have an engine or two under your belt so you can fallback to something in case it doesn't work! you only have to gain if you make your life easier :)

EDIT: Oh, and my pseudo-code on post #11 was wrong, but I corrected it now.

Last edited 2011


jtfrench(Posted 2011) [#16]
Cool.

So here are two things from the archives that I'm finding interesting:

AddMeshToSurface by sswift:
http://blitzmax.com/codearcs/codearcs.php?code=575

Ultimate Single Surface Collection by Rob:
http://blitzmax.com/codearcs/codearcs.php?code=978
--->working link: http://www.box.net/shared/xlqiqyu4gg
(the code for the single surface entity system looks a lil sketch in some parts....)


I'm going to dig into these and see how I can push everything into a single surface but maintain ability to control, rotate, etc. Do you think I'm going in the right direction?

One blaring question that comes into mind is ---- what about 2D images (TImages) rendered with Max2D? I remember thinking before that I would have my 2D rendering optimized as long as I used a spritesheet (atlas) for my images, and then used DrawSubImageRect() to render the specific sub-region. However, now I'm thinking that even that will still result in a draw call per object (or per DrawSubImageRect), which (as you described above) seems like only half the battle. Am I correct in thinking this? It seems Rob's library above has a solution for 2D as well (it looks pretty good, a bit more solid than the 3D entity system, to be honest).

Also check GMan's port of Irrlicht Engine for BMax. Irrlicht is a tried-and-true engine with an active community.

Would the benefit be that these engines do single-surface rendering for you out of the box? Otherwise, wouldn't I still have to figure this out on that engine as well?

Lots to think about...

(and super thanks for continuing to go into the trenches with me :)

Last edited 2011



Kryzon(Posted 2011) [#17]
One blaring question that comes into mind is ---- what about 2D images (TImages) rendered with Max2D?

Every call to DrawImage or DrawSubImage in Max2D runs this:
glBegin()
[...]
glEnd()
And that's a draw-call. If you want to perform batching (the optimization we've been talking about in these posts - and a good keyword to use when researching material), you need to figure out a way to draw more than one image's geometry at once. It's actually not that difficult; you just need to invest time designing the system - you'd need to use a vertex array for this.
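A sketch of that 2D batching idea in Python (the interleaved x, y, u, v layout and append_quad are illustrative, not Max2D API): every DrawImage-equivalent just appends four vertices to one array, which you'd then submit in a single draw call.

```python
def append_quad(batch, x, y, w, h, u0, v0, u1, v1):
    """Append one sprite quad (4 corners, interleaved x, y, u, v) to the batch."""
    batch.extend([
        x,     y,     u0, v0,
        x + w, y,     u1, v0,
        x + w, y + h, u1, v1,
        x,     y + h, u0, v1,
    ])

batch = []
append_quad(batch, 0, 0, 32, 32, 0.0, 0.0, 0.25, 0.25)
append_quad(batch, 40, 0, 32, 32, 0.25, 0.0, 0.5, 0.25)
# 2 quads * 4 vertices * 4 floats in one array: one draw call instead of two.
```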

Would the benefit be that these engines do single-surface rendering for you out of the box? Otherwise, wouldn't I still have to figure this out on that engine as well?

They most likely have these optimizations and plenty of others. They've been in development for a long time by people who are focused on improving them, so they have a lot of accumulated value.

Last edited 2011


jtfrench(Posted 2011) [#18]
Irrlicht is a great engine that definitely has some hot stuff. Thing is, we already have a working game written completely in MiniB3D --- we're just trying to speed things up. I see that Gman provides a great BlitzMax mod for Irrlicht, and even a supposed B3D API mapping. In practice, do you know how successful this API port is? If it's the kind of thing that has been proven to easily map over and behave in the same way, then that would be pretty awesome. If not, switching engines/APIs at this point would set us back too much. I'll have to dig around to see what kind of success people have had with Irrlicht-B3D.


AdamRedwoods(Posted 2011) [#19]
There are a few ways I found to optimize miniB3D (it's been a while, since I've been working with Monkey more) but:

- texture atlases! awesome technique, and is a MUST in mobile for all games.
Plenty of texture packers to be found, here's a new one that is quite efficient:
http://spritesheetpacker.codeplex.com/
Popular one:
http://www.texturepacker.com/

- when it comes to creating isometric landscapes, I created one large object from many smaller, repeatable pieces. Make sure the landscape is static, no moving parts. Coding-wise, what I did was to load an individual mesh asset, then CopyMesh() and AddMesh(). AddMesh() combines duplicate surfaces automagically, reducing overhead. Now this single object is not subject to camera culling, but reduces loop overhead and combined with a texture atlas should be pretty swift in the rendering.

levelmap:TMesh = CreateMesh()
levelmap.positionentity(0,0,0)

''....some level loading stuff here....

If map[x]  = 121 Then meshToAdd=Copymesh(wall_nernie, levelmap)
PaintSurfaceMesh(meshToAdd, 1, brush )
meshtoAdd.PositionMesh( (x)*mapsize,0,(-r)*mapsize)
meshtoAdd.AddMesh(levelmap)
meshtoAdd.PositionMesh( -(x)*mapsize,0,(r)*mapsize) ''reset the position of the original mesh to be used again


- you speak of being able to move parts (separate entities) from a single mesh, which is what I demonstrate here:
http://blitzmax.com/Community/posts.php?topic=94160



- the other two big improvements for miniB3d would be batching sprites together (providing they're in the same layer or z-depth) and camera culling using octrees. Culling right now in EntityInFrustrum() (TCamera.bmx) does a lot of calculations per frame PER OBJECT. So yeah, some sort of octree would help.


Other optimizations:
Light and shade your objects using textures, and use OpenGL flat-lighting.

Use mipmaps for distant scenery/ objects.

I made a bunch of optimizations to my version of miniB3d, but mostly I converted FOR-EACHIN-NEXT loops to use arrays instead. I did this in the main TMesh.bmx Update() and Alpha() methods since those were hit every frame.

Another small optimization I did for the Method EntityInFrustum#(ent:TEntity) is to unroll the for-next loop within it.

And lastly, under TCamera.bmx, in the ExtractFrustrum() method, instead of multiplying those two matrices I let OpenGL do it for me:
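The code sample for this didn't survive the archive. For reference, here's a sketch of the plane-extraction step that usually follows: once you have the combined clip matrix (one common trick is to load the projection matrix onto the modelview stack, glMultMatrixf the modelview, and read the product back with glGetFloatv), the six frustum planes fall out of sums and differences of its rows (the Gribb/Hartmann method). Pure Python, row-major, purely illustrative:

```python
def frustum_planes(m):
    """m: 4x4 clip matrix (projection * modelview) as a list of 4 rows.
    Returns the six frustum planes as (a, b, c, d) coefficient lists."""
    def add(a, b): return [a[j] + b[j] for j in range(4)]
    def sub(a, b): return [a[j] - b[j] for j in range(4)]
    return [
        add(m[3], m[0]),  # left
        sub(m[3], m[0]),  # right
        add(m[3], m[1]),  # bottom
        sub(m[3], m[1]),  # top
        add(m[3], m[2]),  # near
        sub(m[3], m[2]),  # far
    ]

# With an identity clip matrix the planes bound the canonical [-1, 1] cube:
planes = frustum_planes([[1, 0, 0, 0],
                         [0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1]])
```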



I could go on and on, but honestly, miniB3D isn't too bad without mods. The best approach is to always try and "cheat the shot". In other words, how can the effect be accomplished by making it look right but is done using a cheap visual trick.

Last edited 2011


Kryzon(Posted 2011) [#20]
Awesome stuff Adam, thanks!

[About GMan's Irrlicht B3D framework] In practice, do you know how successful this API port is?

It's not complete, and it can't clone B3D down to command level because both engines work differently; even though both have an "EntityFX" command, each takes different parameters etc. but provides a similar result.

However, I'm talking about the Blitz3D-like framework. The entire Irrlicht Engine is wrapped and complete, so it works without a problem - you can even continue his work on the framework so you have something that is more familiar to work with, and you also get to know how Irrlicht works by looking at the source. The benefit would be that Irrlicht is very complete, having octrees, optimized code etc.


jtfrench(Posted 2011) [#21]
Thanks for the tips, Adam! I'm going to keep on digesting them.

Question:
AddMesh() combines duplicate surfaces automagically


By duplicate surfaces, does this mean anything that uses the same texture (atlas)? Like for example, if I have a static mesh for a couch in a scene, and I also have a static mesh of a chair in the scene, and both of their textures are on the same atlas. My guess is that their surfaces would still be considered distinct, right? Would AddMesh() automagically recognize that these are two peas from the same texture pod and work them into the same surface, or would there be additional work?

I guess I'm still uncertain about the surface-texture atlas relationship.


AdamRedwoods(Posted 2011) [#22]
It should be considered the same surface: since the UV values are stored in the VBO, minib3d should see identical surfaces.

When creating the chair and sofa, say you model and texture them independently using 128x128 textures, you would have to then create a larger image that would combine the two textures into a single 256x128 image (see programs above, but you could also do it by hand).

Then you would have to re-texture the models using the new 256x128 image and set your UVs in a modeling program before exporting the model for final minib3d importing.
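For the chair/sofa case that remap is tiny - each model keeps its V range and only U is scaled and shifted into its half of the 256x128 sheet. A sketch (Python; remap_u is an illustrative helper, and this is what the modeling tool bakes in before export, not something done at runtime):

```python
def remap_u(u, slot):
    """Two 128x128 textures packed side by side in a 256x128 atlas.
    slot 0 = left half (chair), slot 1 = right half (sofa)."""
    return u * 0.5 + slot * 0.5

# Chair UVs end up in [0, 0.5], sofa UVs in [0.5, 1]; V is unchanged.
```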


jtfrench(Posted 2011) [#23]
Then you would have to re-texture the models using the new 256x128 image and set your UVs in a modeling program before exporting the model for final minib3d importing.


Ahh, I think this was the missing link. Up to now, I've been thinking I'll have to write some tricky code to re-map the UVs at runtime. Having this stored at the model level makes a *lot* more sense and sounds a lot easier (at least for the programmer).

Back to sanity :)

Regarding http://blitzmax.com/Community/posts.php?topic=94160 (CreateSubMesh() ) isn't this only helpful if you have multiple surfaces within the combined mesh? And aren't we trying to have just one surface? I'm probably missing something there.

Last edited 2011


jtfrench(Posted 2011) [#24]
Another thing I'd like to clarify:

Then you would have to re-texture the models using the new 256x128 image and set your UVs in a modeling program before exporting the model for final minib3d importing.


When I re-do the models so that they are UV'd to the same map, can I still export them as individual meshes and then combine as described above in your post (#19), or would I have to export one single joined mesh?

(I'm assuming its the former, but just to be sure...)

Thanks


AdamRedwoods(Posted 2011) [#25]
Re: exporting individual meshes

Depends. Say you have a chair model and a sofa model that are in the 256x128 texture atlas. You can LoadBrush("texturefile") and use that brush to PaintMesh(chair) and PaintMesh(sofa).

If you're Loading models into BMax using a third-party loader, I am not sure if they will recognize the same texture, which is why to be safe, create your own brush and paint the meshes accordingly.



Re: CreateSubMesh()

Yes, good point, this only works if you have separate surfaces. It's more of a time saver for creating a whole bunch of assets and placing into a single model file. I used it for debris, which would disappear after a short while so the overhead was not permanent. I also used it to rotate and move various parts of robots, doors, but keep everything in the same OBJ file.

Last edited 2011


AdamRedwoods(Posted 2011) [#26]
---MORE NOTES ON MINIB3D OPTIMIZATION---

It's tough with miniB3D since it doesn't batch models with same surface textures together automatically. This is where AddMesh() comes into play. You can have one MASSIVE model with various surfaces, and it will do the batching for you. But you lose object culling, so it's sending all these VBOs to your gfx card, which is overhead (10 million triangles = 30 million points * 2 uvs, you get the idea). That's why I say keep it static, because the VBOs should be sent ONCE; then the graphics card will keep that data in its memory and you don't have to worry about it if you're only moving the camera.

If you're smart with AddMesh() you can break up massive terrain into chunks, so the frustum culling takes care of chunks we don't see.
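A sketch of that chunking (Python; chunk_key, bucket_tiles and the tile data are illustrative, not MiniB3D API): bucket tiles into chunk-sized groups, then AddMesh() each bucket into its own chunk mesh so the frustum test can reject whole chunks at once.

```python
CHUNK = 8  # tiles per chunk side

def chunk_key(tile_x, tile_z):
    """Which chunk mesh the tile at (tile_x, tile_z) should be merged into."""
    return (tile_x // CHUNK, tile_z // CHUNK)

def bucket_tiles(tiles):
    """Group (x, z) tiles by chunk; each bucket becomes one AddMesh() target."""
    buckets = {}
    for x, z in tiles:
        buckets.setdefault(chunk_key(x, z), []).append((x, z))
    return buckets

buckets = bucket_tiles([(0, 0), (7, 7), (8, 0), (15, 20)])
# (0,0) and (7,7) share chunk (0,0); (8,0) is chunk (1,0); (15,20) is chunk (1,2).
```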

Now when I say static, that means no animating bones. There is a way (but not currently implemented in miniB3D) to batch moving, scaling, rotating objects together that share the same surface texture but different positions. This is what Charlie (jkrankie) did with miniB3D sprite batching.
http://blitzmax.com/Community/posts.php?topic=90420#1028446

Now, to throw a wrench into all this: semi-transparency. miniB3D orders the drawing sequence for alpha-ed objects from back to front. So if you start using glass, semi-transparent surfaces or fading objects, things get complicated, since everything needs to be ordered.
Luckily, most of the ordered situations are taken care of, but once in a while you'll encounter weird problems, so you'd have to force objects into the ordering with the alpha-ed objects by setting EntityFX(32), the force-alpha-blending flag.

So therefore, keep semi-transparent objects and textures separate from the massive, static landscape.
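The back-to-front ordering itself is simple to picture - a sketch in Python (alpha_draw_order is illustrative, not MiniB3D's actual sorter): sort the alpha-ed objects by distance from the camera, farthest first, and draw them in that order after the opaque geometry.

```python
def alpha_draw_order(camera, objects):
    """objects: list of (name, (x, y, z)). Farthest from camera drawn first."""
    cx, cy, cz = camera
    def dist2(pos):  # squared distance is enough for sorting
        x, y, z = pos
        return (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
    return [name for name, pos in
            sorted(objects, key=lambda o: dist2(o[1]), reverse=True)]

order = alpha_draw_order((0, 0, 0),
                         [("glass_near", (0, 0, 2)),
                          ("glass_far", (0, 0, 10)),
                          ("smoke", (0, 0, 5))])
# → ["glass_far", "smoke", "glass_near"]
```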


Link on OpenGL culling:
http://www.sjbaker.org/steve/omniv/frustcull.html

Link on texture/object batching in Unity, but good explanation:
http://unity3d.com/support/documentation/Manual/iphone-DrawCall-Batching.html

Last edited 2011


jtfrench(Posted 2011) [#27]
Culling right now in EntityInFrustrum() (TCamera.bmx) does a lot of calculations per frame PER OBJECT. So yeah, some sort of octree would help.


Do you think the Octree code from MiniB3D v0.43 could be easily re-purposed for frustum culling (code pasted below), or did you find in practice that starting from scratch specifically for frustum culling was the way to go?

Type TOctree

	Field x#,y#,z# ' mid point of oct cube
	Field w#,h#,d#

	Field oct_child:TOctree[8]
	Field oct_polys:TPolygon[]
	
	' debug fields
	Global no_polys_mesh=0 ' no of polys before octree created
	Global no_polys_octree=0 ' no of polys after octree created
	Global no_polys_active=0 ' no of active polys
	Global no_cells=0 ' no of octree cells (leaf cells)
	Global no_cells_active=0 ' no of active octree cells (leaf cells)
	
	' MiniB3D functions
	
	' Internal - not recommended for general use
	
	' creates a mesh's octree
	Function CreateOctree(mesh:TMesh,max_polys,max_levels)
	
		' no minus figures
		If max_polys<0 Then max_polys=0
		If max_levels<0 Then max_levels=0
	
		' if max_polys>0 and max_levels>0 then octree is divided until each oct cube contains less than max_polys, or has been divided max_levels times, whatever comes first
		' if max_polys>0 and max_levels=0 then octree is divided until each oct cube contains less than max_polys
		' if max_polys=0 and max_levels>0 then octree is divided until each oct cube has been divided max_levels times
		' if max_polys=0 and max_levels=0 then error
		If max_polys=0 And max_levels=0 Then RuntimeError "Octree cannot be created - max_polys or max_levels must be greater than 0"
	
		Local polys:TPolygon[]=TPolygon.MeshToPolygons(mesh)
		
		mesh.octree:TOctree=New TOctree
		
		mesh.octree.w=mesh.MeshWidth()
		mesh.octree.h=mesh.MeshHeight()
		mesh.octree.d=mesh.MeshDepth()
		mesh.octree.x=mesh.min_x+mesh.octree.w/2.0
		mesh.octree.y=mesh.min_y+mesh.octree.h/2.0
		mesh.octree.z=mesh.min_z+mesh.octree.d/2.0

		mesh.octree.Create(polys,max_polys,max_levels)
		
		mesh.octree.no_polys_mesh=polys.length
	
	End Function
	
	' returns true if, from an array of polys, more than limit triangles are referenced
	Method TestTris(polys:TPolygon[],limit=50)

		' if limit=0 then return true immediately, we don't want to test tris
		If limit=0 Then Return True

		Local tris[limit]
		Local i=0
		
		For Local poly:TPolygon=EachIn polys
		
			Local tri=poly.tri
			
			Local ref_exists=False
			For Local check_ref=0 To i-1
				If tris[check_ref]=tri Then ref_exists=True
			Next
			
			If ref_exists=False
			
				tris[i]=tri
				i=i+1
				
			EndIf
			
			If i>=limit Then Exit
			
		Next

		If i>=limit Return True Else Return False

	End Method

	' returns, from an array of polys, no. of triangles referenced
	Method CountTris(polys:TPolygon[],limit=500)

		Local tris[limit]
		Local i=0
		
		For Local poly:TPolygon=EachIn polys
		
			Local tri=poly.tri
			
			Local ref_exists=False
			For Local check_ref=0 To i-1
				If tris[check_ref]=tri Then ref_exists=True
			Next
			
			If ref_exists=False
			
				tris[i]=tri
				i=i+1
				
			EndIf
			
			If i>=limit Then Exit
			
		Next

		Return i

	End Method

	Method Create(polys:TPolygon[],max_polys,max_levels,level=0)
	
		' if max_levels=0 then set max_levels to artificially high no. so that level<=max_levels (below) will always equal true
		If max_levels=0 Then max_levels=100000
	
		level=level+1

		If TestTris(polys,max_polys)=True And level<=max_levels

			Local child_polys:TPolygon[][8]
		
			Local polys1:TPolygon[]
			Local polys2:TPolygon[]
			Local polys11:TPolygon[]
			Local polys12:TPolygon[]
			Local polys21:TPolygon[]
			Local polys22:TPolygon[]
			Local origin:TVector=TVector.Create(x,y,z)
			Local normal_x:TVector=TVector.Create(-1,0,0)
			Local normal_y:TVector=TVector.Create(0,-1,0)
			Local normal_z:TVector=TVector.Create(0,0,-1)
			Local plane_x:TPlane=TPlane.CreatePlane(origin,normal_x)
			Local plane_y:TPlane=TPlane.CreatePlane(origin,normal_y)
			Local plane_z:TPlane=TPlane.CreatePlane(origin,normal_z)
			
			TPolygon.SplitPolygons2(polys,plane_x,polys1,polys2) ' left and right
			TPolygon.SplitPolygons2(polys1,plane_y,polys11,polys12) ' left bottom, left top
			TPolygon.SplitPolygons2(polys2,plane_y,polys21,polys22) ' right bottom, right top
		
			TPolygon.SplitPolygons2(polys11,plane_z,child_polys[0],child_polys[1]) ' left bottom front, left bottom back
			TPolygon.SplitPolygons2(polys12,plane_z,child_polys[2],child_polys[3]) ' left top front, left top back
			TPolygon.SplitPolygons2(polys21,plane_z,child_polys[4],child_polys[5]) ' right bottom front, right bottom back
			TPolygon.SplitPolygons2(polys22,plane_z,child_polys[6],child_polys[7]) ' right top front, right top back

			For Local i=0 To 7

				oct_child[i]=New TOctree
				
				oct_child[i].w=w/2.0
				oct_child[i].h=h/2.0
				oct_child[i].d=d/2.0
	
				Select i
				
					Case 0
					oct_child[i].x=x-(w/4.0)
					oct_child[i].y=y-(h/4.0)
					oct_child[i].z=z-(d/4.0)
					Case 1
					oct_child[i].x=x-(w/4.0)
					oct_child[i].y=y-(h/4.0)
					oct_child[i].z=z+(d/4.0)
					Case 2
					oct_child[i].x=x-(w/4.0)
					oct_child[i].y=y+(h/4.0)
					oct_child[i].z=z-(d/4.0)
					Case 3
					oct_child[i].x=x-(w/4.0)
					oct_child[i].y=y+(h/4.0)
					oct_child[i].z=z+(d/4.0)
					Case 4
					oct_child[i].x=x+(w/4.0)
					oct_child[i].y=y-(h/4.0)
					oct_child[i].z=z-(d/4.0)
					Case 5
					oct_child[i].x=x+(w/4.0)
					oct_child[i].y=y-(h/4.0)
					oct_child[i].z=z+(d/4.0)
					Case 6
					oct_child[i].x=x+(w/4.0)
					oct_child[i].y=y+(h/4.0)
					oct_child[i].z=z-(d/4.0)
					Case 7
					oct_child[i].x=x+(w/4.0)
					oct_child[i].y=y+(h/4.0)
					oct_child[i].z=z+(d/4.0)
				
				End Select
				
				oct_child[i].Create(child_polys[i],max_polys,max_levels,level)
					
			Next
				
		Else
		
			oct_polys=polys
			no_cells=no_cells+1
			no_polys_octree=no_polys_octree+polys.length
	
		EndIf
	
	End Method

	Method OctPick(a:TVector,b:TVector,mesh:TMesh,t_min# Var,t_min2# Var)
	
		Local xx#=x-w/2.0
		Local yy#=y-h/2.0
		Local zz#=z-d/2.0
		Local ww#=w
		Local hh#=h
		Local dd#=d
		
		Local t#
		Local p:TVector
		
		Local pick=TPick.IntersectSegmentAABB(a,b,xx#,yy#,zz#,ww#,hh#,dd#,t#,p)
			
		If pick
		
			If t#<t_min2#

				If oct_polys<>Null
				
					Local pick=TPick.PickPolys(a,b,oct_polys,mesh,t_min#)
					
					If pick=True
						t_min2#=t#
					EndIf
				
				Else
	
					For Local i=0 To 7
						If oct_child[i]<>Null
							oct_child[i].OctPick(a,b,mesh,t_min#,t_min2#)
						EndIf
					Next
					
				EndIf

			EndIf
			
		EndIf
			
	End Method
	
	Method OctCollision(collisionPackage:TCollisionPacket)

		'If no_cells_active>0 Then Return

		Local a:TBox=collisionPackage.bb
		
		Local b:TBox=New TBox
		b.mini[0]=x-w/2.0
		b.maxi[0]=x+w/2.0
		b.mini[1]=y-h/2.0
		b.maxi[1]=y+h/2.0
		b.mini[2]=z-d/2.0
		b.maxi[2]=z+d/2.0

		Local test=TBox.TBoxTBoxIntersection(a,b)
		
		If test

			If oct_polys<>Null
			
				no_polys_active=no_polys_active+oct_polys.length
				no_cells_active=no_cells_active+1
				'HighlightTris(oct_polys)
				TCollision.PolysCheckCollision(collisionPackage,oct_polys)
				
			Else

				For Local i=0 To 7
					If oct_child[i]<>Null
						oct_child[i].OctCollision(collisionPackage)
					EndIf
				Next
				
			EndIf
			
		EndIf
			
	End Method
	
	Method HighlightTris(polys:TPolygon[])
	
		For Local poly:TPolygon=EachIn polys
	
			Local surf:TSurface=poly.surf
			Local tri=poly.tri
	
			Local v0=surf.TriangleVertex(tri,0)
			Local v1=surf.TriangleVertex(tri,1)
			Local v2=surf.TriangleVertex(tri,2)
	
			surf.VertexColor(v0,255,0,0)
			surf.VertexColor(v1,255,0,0)
			surf.VertexColor(v2,255,0,0)
	
		Next
	
	End Method
	
End Type



AdamRedwoods(Posted 2011) [#28]
You could use that, but you would have to change it to use an object's min/max xyz points instead of dividing it up by the mesh's vertices.

http://stackoverflow.com/questions/431841/mesh-rendering-using-octree-algorithm
http://www.flipcode.com/archives/Octrees_For_Visibility.shtml


jtfrench(Posted 2011) [#29]
In looking through the options, implementing occlusion culling from scratch doesn't seem worthwhile; using the OpenGL extension that handles it on the GPU (GL_ARB_occlusion_query) looks much more promising. It lets you find out how many pixels on screen would be affected by rendering a given primitive. I have yet to find any good tutorials on it, but a GPU solution still seems like the best bet. Yes, there will be some concurrency issues in that the GPU and CPU operate asynchronously (e.g. all the occlusion queries will be based on the previous, not the current, frame), but as long as the framerate stays high enough, these artifacts shouldn't be too big a deal.

Has anyone checked this out yet?
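
For anyone following along, here is a rough sketch of the query flow I mean, in the same style as MiniB3D's direct gl* calls. This assumes the GL_ARB_occlusion_query constants and functions are exposed by your GL binding (e.g. Pub.Glew); the call sequence follows the ARB spec, not anything already in MiniB3D:

```
' Sketch only: render a cheap proxy (bounding box) under a query in one
' frame, then read the sample count back the next frame to decide whether
' to draw the real object.
Local query:Int
glGenQueriesARB(1, Varptr query)

' Frame N: draw the occludee's bounding box with colour/depth writes off,
' counting how many samples would pass the depth test.
glColorMask(False, False, False, False)
glDepthMask(False)
glBeginQueryARB(GL_SAMPLES_PASSED_ARB, query)
' ...render the object's bounding box here...
glEndQueryARB(GL_SAMPLES_PASSED_ARB)
glColorMask(True, True, True, True)
glDepthMask(True)

' Frame N+1: read the result back only once it's available, so the
' CPU never stalls waiting on the GPU.
Local available:Int
glGetQueryObjectivARB(query, GL_QUERY_RESULT_AVAILABLE_ARB, Varptr available)
If available
	Local samples:Int
	glGetQueryObjectivARB(query, GL_QUERY_RESULT_ARB, Varptr samples)
	If samples > 0
		' object was at least partly visible last frame - draw it fully
	EndIf
EndIf
```

Since the result is polled a frame late, a conservative engine would draw the object anyway whenever the result isn't available yet, which is exactly the previous-frame artifact mentioned above.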


Kryzon(Posted 2011) [#30]
In looking through the options, implementing occlusion culling from scratch doesn't seem worthwhile; using the OpenGL extension that handles it on the GPU (GL_ARB_occlusion_query) looks much more promising. It lets you find out how many pixels on screen would be affected by rendering a given primitive.

You should worry more about this when you have very expensive shaders - in this case, every pixel counts.
This occlusion query shouldn't be implemented "universally" in your engine as something that would run every single frame.
You should implement it per specific area. Say, a forest and other areas with lots of occluders that can benefit from these tests. Most open areas don't need this.

Also know that many graphics cards come with a feature called "Early-Z": this does an occlusion test and discards fragments that are covered by others. This feature is not something you turn on, it's just 'there'.
If you manually output fragment depth in your pixel shader, it will be disabled (since the hardware needs to run the pixel shader to know what the depth will be).

If you're having doubts about what to implement, there is a better way.
Start developing your game concurrently with the game engine, letting 'necessity' dictate what gets implemented.
That way you are always working on the actual bottlenecks and on whatever is needed to keep the target framerate, without risking bloating the engine with a list of optimizations you thought of beforehand.