Optimization idea

BlitzMax Forums/BlitzMax Programming/Optimization idea

_JIM(Posted 2009) [#1]
I just thought about this:

If switching render states is expensive, can't we minimize the cost by ordering the draw calls so that there are as few switches as possible?

Well, there's no easy way to do it: if items overlap, they have to be drawn in a specific order.

However, isn't BMax actually orthographic 3D? If it is, couldn't we use the z-order (the depth buffer) to keep items layered correctly while sorting them by blend mode?
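Below is a minimal sketch of the sorting idea. It is written in Python only as a language-neutral simulation and makes no Max2D or OpenGL calls. Each draw call gets a depth from its position in the original back-to-front order. The list is then sorted by render state so that identical states end up next to each other. The `DrawCall` type, the string state keys and the "lower depth = nearer" convention are assumptions made for illustration. The sorted order only gives the same picture when depth testing is enabled and the sprites are fully opaque or masked.

```python
from dataclasses import dataclass


@dataclass
class DrawCall:
    name: str
    state: str      # hypothetical render-state key (colour, blend mode, ...)
    depth: int = 0  # assigned below; lower = nearer the viewer


def count_state_changes(calls):
    """Count state switches, including the initial state set."""
    changes, current = 0, None
    for call in calls:
        if call.state != current:
            changes += 1
            current = call.state
    return changes


def sort_by_state(calls):
    """Give each call a depth from painter's order, then group by state.

    The last call in painter's order must end up on top, so it receives
    the smallest depth. With a depth test on, the sorted list produces the
    same opaque image with fewer state switches.
    """
    n = len(calls)
    for i, call in enumerate(calls):
        call.depth = n - 1 - i
    return sorted(calls, key=lambda c: c.state)  # stable sort


scene = [DrawCall("square", "red"), DrawCall("circle", "green"), DrawCall("triangle", "red")]
print(count_state_changes(scene))    # 3
batched = sort_by_state(scene)
print(count_state_changes(batched))  # 2
print([(c.name, c.depth) for c in batched])  # [('circle', 1), ('square', 2), ('triangle', 0)]
```

The depths (square 2, circle 1, triangle 0) keep the intended stacking order, even though the circle is now submitted first.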

The downside is that this can't be built into the BMax engine itself, since it changes the way programs must be designed. However, I'm still interested in whether it can be done "manually".

As my game is supposed to both look good and run fast on Intel GMAs, I really want to squeeze anything I can out of it, and this kind of optimization might help others as well.

Cheers,
JIM


beanage(Posted 2009) [#2]
This is a really great idea. I could really have used this in my fullscreen UI window system (don't ask).

Maybe you could implement something like "drawing layers" or "z-order layers". The default layer would be zero, so normal BMax 2D functionality is maintained; then you would call, for example:
Type extRenderLayer
* extAddRenderLayer()
* extSetCurrentRenderLayer()
* extMoveRenderLayer()
* extRemoveRenderLayer()

Also, Flip() would only refer to the current layer. The other layers stay untouched, and the whole thing gets drawn in the order given by the layers' z-coordinates.
Hmm, maybe this could also serve as a simple implementation of FBOs, couldn't it?
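One possible reading of the layer proposal, sketched in Python as a simulation in which "drawing" just records commands. The method names loosely mirror the proposed extAddRenderLayer()/extSetCurrentRenderLayer()/extMoveRenderLayer()/extRemoveRenderLayer() functions, but they are hypothetical. The sketch also assumes that lower z is drawn first (further back) and that flip() clears only the current layer.

```python
class LayerStack:
    """Hypothetical layer manager: draw commands are recorded per layer and
    composited by z when flip() is called."""

    def __init__(self):
        # layer 0 always exists, so plain drawing keeps working as before
        self.layers = {0: {"z": 0, "commands": []}}
        self.current = 0

    def add(self, layer_id, z):
        self.layers[layer_id] = {"z": z, "commands": []}

    def set_current(self, layer_id):
        self.current = layer_id

    def move(self, layer_id, z):
        self.layers[layer_id]["z"] = z

    def remove(self, layer_id):
        if layer_id == 0:
            raise ValueError("layer 0 is the default layer and cannot be removed")
        del self.layers[layer_id]
        if self.current == layer_id:
            self.current = 0

    def draw(self, command):
        # record instead of drawing immediately
        self.layers[self.current]["commands"].append(command)

    def flip(self):
        """Return the composited frame (lowest z first), then clear only
        the current layer; the other layers keep their contents."""
        order = sorted(self.layers.values(), key=lambda layer: layer["z"])
        frame = [cmd for layer in order for cmd in layer["commands"]]
        self.layers[self.current]["commands"] = []
        return frame


stack = LayerStack()
stack.add(1, z=10)            # UI layer in front of layer 0
stack.draw("background")      # recorded into layer 0
stack.set_current(1)
stack.draw("button")
first = stack.flip()          # ['background', 'button']
stack.draw("button pressed")
second = stack.flip()         # ['background', 'button pressed']
```

The background is recorded once and survives both flips, which is the "other layers stay untouched" behaviour. A real version would keep each layer in a render target rather than in a command list.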

[edit]: wouldn't this be easy to add via a module?


_JIM(Posted 2009) [#3]
Hmm... I was thinking about something else :)

Like, let's say you have a RED square, then a GREEN circle over it, and a RED triangle over that.

The old way you'd have:

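SetColor(255,0,0)
DrawSquare()        ' bottom: red square (DrawSquare etc. are placeholders)
SetColor(0,255,0)
DrawCircle()        ' middle: green circle
SetColor(255,0,0)   ' switching back to red costs a third colour change
DrawTriangle()      ' top: red triangle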


The new way would be:

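SetColor(255,0,0)
DrawSquare(2)       ' hypothetical depth argument: 2 = furthest back
DrawTriangle(0)     ' 0 = nearest, so it still ends up on top
SetColor(0,255,0)
DrawCircle(1)       ' drawn last, but the depth test keeps it between the other two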


Your idea is similar, but not quite the same. Nevertheless, a good feature anyway.

What I'm afraid of, though, is that I'm going to have to wrap or rewrite most or all of the drawing functions. Also, I'm not sure whether the rendering is orthographic or not.


N(Posted 2009) [#4]
You'd really have better luck writing a new Max2D driver from the ground up.


_JIM(Posted 2009) [#5]
Waking up this morning, I realized that this wouldn't work for overlapping objects with 8-bit alpha, because they still need to be drawn in a specific order.
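A small worked example of why translucent sprites resist this trick. The z-buffer stores one depth per pixel, so it can reject hidden pixels, but it cannot change the order in which colours are blended. The sketch below uses the usual "source over" formula (src * alpha + dst * (1 - alpha)), which is what ordinary alpha blending computes. Colours are floats in 0..1 to keep the arithmetic exact.

```python
def blend_over(dst, src, alpha):
    """Per-channel alpha blending: src * alpha + dst * (1 - alpha)."""
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))


background = (0.0, 0.0, 0.0)
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Two 50%-transparent sprites overlapping on a black background.
red_then_green = blend_over(blend_over(background, red, 0.5), green, 0.5)
green_then_red = blend_over(blend_over(background, green, 0.5), red, 0.5)
print(red_then_green)  # (0.25, 0.5, 0.0)
print(green_then_red)  # (0.5, 0.25, 0.0)
```

The two orders give different pixels, so overlapping translucent sprites still have to be submitted back to front.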


N wrote: "You'd really have better luck writing a new Max2D driver from the ground up."

This is tempting, but I don't have the time right now. Also, I might need to run a few tests to see whether the performance improvement is worth the effort. If a new driver is the only way, then there's room for a whole lot of other changes that could speed up rendering.


Fetze(Posted 2009) [#6]
If you use z-ordering but don't sort your non-MASKBLEND draw calls by their z-value, you'll mess up blending and SetBlend will be useless.

If you only intend to draw MASKBLEND or SOLIDBLEND stuff, using the z-buffer is a great idea. But as soon as you add graphics that use any other blend mode (most particles, for example), you'll have to sort them yourself, as before.
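The split described above can be sketched as a two-pass plan, again as a Python simulation with no real drawing. The plan has an opaque pass that may be regrouped by state, followed by a translucent pass that keeps its original back-to-front order. The tuple layout `(name, blend, texture)` is hypothetical, and the blend names only mirror Max2D's blend-mode constants. Treating MASKBLEND as order-independent assumes that masked texels are discarded completely and therefore never write depth.

```python
# Blend modes whose draw order can be changed once depth testing is on.
ORDER_INDEPENDENT = {"SOLIDBLEND", "MASKBLEND"}


def plan_frame(calls):
    """calls: (name, blend, texture) tuples in painter's order (back to front).

    Returns the submission order: the opaque pass grouped by (blend, texture),
    with depth test and depth writes on, followed by the translucent pass
    in its original back-to-front order, with the depth test still on so
    that sprites hidden behind opaque ones are rejected.
    """
    opaque = [c for c in calls if c[1] in ORDER_INDEPENDENT]
    translucent = [c for c in calls if c[1] not in ORDER_INDEPENDENT]
    opaque.sort(key=lambda c: (c[1], c[2]))  # group identical state together
    return opaque + translucent


calls = [
    ("floor", "SOLIDBLEND", "tiles"),
    ("smoke", "ALPHABLEND", "particles"),
    ("crate", "MASKBLEND", "props"),
    ("wall", "SOLIDBLEND", "tiles"),
    ("glow", "LIGHTBLEND", "particles"),
]
plan = plan_frame(calls)
print([c[0] for c in plan])  # ['crate', 'floor', 'wall', 'smoke', 'glow']
```

Each call still needs its own depth value derived from painter's order. Only the opaque part benefits from the regrouping.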

Also, on modern graphics cards I believe the state-change overhead is negligible in terms of time. You'll most likely get a far bigger performance increase by properly optimizing your code for collision detection or culling.


beanage(Posted 2009) [#7]
@anotherjim: Aaah, now I get your point. OK, my idea (or my interpretation of your idea) is different, but it may end up with roughly the same features, plus a render-to-texture implementation.
Anyone here know BlitzPlus? I miss something like its SetBuffer() functionality in Max; this could be it.
Hmm, so the question is whether switching FBOs is less expensive than switching render states like color, rotation, scale, etc. Surely not. OK, in this context, my idea is useless :(.

Anyway, I think I'll code a prototype of it, since the features look too promising to simply forget about :\

@Fetze: ...an Intel GMA isn't much of a modern graphics card, is it? :)


beanage(Posted 2009) [#8]
[edit:] Hmm, thinking about it, why not finally implement an "objective graphics" (object-based) system that performs the render-state change minimization automatically?

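' hypothetical API: the second argument is the z-order (0 = front)
red = CreateColor( 255,0,0 )
green = CreateColor( 0,255,0 )
mytri = CreateTriangle( red,0 )
mycircle = CreateCircle( green,1 )
myrect = CreateRectangle( red,2 )

Repeat
	Cls
	DrawObjects()   ' draws every object, minimizing render-state changes
	Flip
Forever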

...mind-blowing. But it would be really hard work to program and develop, and I must admit I'm too much of an EXT_gl newbie to code this. Any guru here with the time and patience? ^^ Mark? o0 He doesn't hear me :(
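Here is a hedged Python sketch of what a retained-mode DrawObjects() could do internally. Objects are created once, and the draw order grouped by colour is cached and recomputed only when something has changed (a "dirty" flag). The `Scene` class and its methods are hypothetical and only echo the CreateColor/CreateTriangle/DrawObjects proposal. Real drawing is replaced by returning the shape names in submission order.

```python
class Scene:
    """Hypothetical retained-mode scene with a cached, colour-grouped draw order."""

    def __init__(self):
        self.objects = []
        self.draw_order = []
        self.dirty = False
        self.sort_count = 0  # how often we actually re-sorted (for illustration)

    def create(self, shape, color, z):
        obj = {"shape": shape, "color": color, "z": z}
        self.objects.append(obj)
        self.dirty = True
        return obj

    def set_color(self, obj, color):
        # changes must go through the scene so the cache is invalidated
        obj["color"] = color
        self.dirty = True

    def draw_objects(self):
        if self.dirty:
            # group by colour; z is kept for the depth test, not for ordering
            self.draw_order = sorted(self.objects, key=lambda o: o["color"])
            self.dirty = False
            self.sort_count += 1
        # a real renderer would call SetColor once per colour group here
        return [o["shape"] for o in self.draw_order]


RED, GREEN = (255, 0, 0), (0, 255, 0)
scene = Scene()
scene.create("triangle", RED, 0)
scene.create("circle", GREEN, 1)
scene.create("rect", RED, 2)
for _ in range(3):  # three "frames"
    frame = scene.draw_objects()
print(frame)             # ['circle', 'triangle', 'rect']
print(scene.sort_count)  # 1
```

The sort cost is paid once, not every frame. That matters more than the sort itself when most of the scene is static.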


_JIM(Posted 2009) [#9]
@BeAnAge

I was thinking about that when Nillium said I should write a new driver.

Since BMax has a 3D engine, why not treat it as a 3D engine? :)
But this would change the way BMax is used almost entirely, as it would move towards B3D:

UpdateWorld()
RenderWorld()

Also, treating it this way, lots of other things could be done: static geometry, culling, etc. All of this would be handled automatically by the renderer. That would be cool, but time-consuming :)
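As an example of the kind of work such a renderer could do automatically, here is a minimal culling sketch in Python. Sprites whose bounding rectangle does not touch the screen are skipped before any draw call is issued. The sprite dictionaries and the camera offset are hypothetical. The test is a plain axis-aligned rectangle overlap, so for rotated or scaled sprites the rectangle would have to cover the transformed image.

```python
def visible(x, y, w, h, screen_w, screen_h):
    """Axis-aligned overlap test between a rectangle and the screen."""
    return x + w > 0 and y + h > 0 and x < screen_w and y < screen_h


def cull(sprites, cam_x, cam_y, screen_w, screen_h):
    """Keep only sprites (world coordinates) that overlap the camera view."""
    return [s for s in sprites
            if visible(s["x"] - cam_x, s["y"] - cam_y, s["w"], s["h"], screen_w, screen_h)]


sprites = [
    {"name": "a", "x": 10, "y": 10, "w": 32, "h": 32},
    {"name": "b", "x": 900, "y": 10, "w": 32, "h": 32},
    {"name": "c", "x": -40, "y": 0, "w": 32, "h": 32},
]
print([s["name"] for s in cull(sprites, 0, 0, 800, 600)])    # ['a']
print([s["name"] for s in cull(sprites, 150, 0, 800, 600)])  # ['b']
```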

Also, I need to find an Intel GMA quickly and test things on it, because it's pretty hard to target low-end cards on this PC :D Here, changing render states has absolutely no impact on speed (not even 1 fps out of 600). But I need to see how the VIA Chrome 9 and the Intel GMA 950 handle it.


beanage(Posted 2009) [#10]
Oh, I see :D

glMax2DEngineDriver() .. hehe ^^
It wouldn't just change how it's used, it would be a COMPLETE REWRITE of all its graphics features!! Kind of BlitzMax 2.0! And it would save us tons and tons of work, since we've all had to implement such an object system in our apps ourselves, over and over again, haven't we?

Couldn't this be an extension of the existing (unused) Max3D engine?

Sorry, talking out of my a**.


_JIM(Posted 2009) [#11]
Well, I am tempted to start such a project. But as I said, I don't have the time now (exams :P).

Also, it would probably require more than one person's effort.

But my original question has partially been answered: render states are not that expensive (except perhaps for a particle system where every particle differs from the others), and this optimization could be done, with lots of effort and quite a few limitations :)


beanage(Posted 2009) [#12]
Hmm, the funny thing is: I'm doing my exams at the moment too. Here in Germany we have the option of doing a kind of "special learning performance" in place of an exam subject. I'm already working on a very experimental open-source engine project (maybe out in July) featuring IBR, FBP, multiprocessing, NN AI and some more interesting stuff; maybe I could include something like our glMax2DDriver2.
But the solutions would be restricted to use with that engine, and the engine is far in the future by industry timescales. I just wanted to point out that not everything is lost :)

BTW, I did some testing of GL state-switching performance myself when I began designing my IB renderer. The result was that you can draw approx. 10,000 quads while switching the color on every quad using plain Max2D. Maybe that helps you. Don't ask for the app's source; it's lost somewhere on my HD and I can't find it :| The results were taken on a laptop with a 1.6 GHz CPU and a 32 MB onboard GPU.


_JIM(Posted 2009) [#13]
10,000 quads on such a video card is very nice. My game will probably never have more than 200 quads on screen, so switching render states is definitely not the performance killer. :-)


ImaginaryHuman(Posted 2009) [#14]
I'm not sure that switching blend modes carries a significant performance penalty. It is more widely known that binding a different texture is one of the more time-consuming operations. If you can draw many images from a single texture, or otherwise minimize the number of times you switch to a different image, that will probably help more than worrying about blend modes.
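Drawing many images from one texture usually means packing them into an atlas and giving each image its own texture-coordinate rectangle. Below is a minimal Python sketch of that arithmetic. The 128x128 atlas, the 64x64 frames and their names are hypothetical. It assumes v grows in the same direction as the pixel rows. With filtering enabled, real atlases usually also need a pixel or two of padding between frames to avoid bleeding.

```python
def atlas_uv(atlas_w, atlas_h, x, y, w, h):
    """Texture coordinates (u0, v0, u1, v1) for the pixel rectangle
    (x, y, w, h) inside an atlas_w x atlas_h texture."""
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)


# Four 64x64 frames packed into one 128x128 texture: one bind serves all four.
layout = {"ship": (0, 0), "bullet": (1, 0), "rock": (0, 1), "star": (1, 1)}
frames = {name: atlas_uv(128, 128, col * 64, row * 64, 64, 64)
          for name, (col, row) in layout.items()}
print(frames["rock"])  # (0.0, 0.5, 0.5, 1.0)
```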


_JIM(Posted 2009) [#15]

ImaginaryHuman wrote: "I don't know that switching blend modes is a significant performance penalty, it is more widely known that switching to a differently bound texture is one of the more time consuming activities. If you can make sure you are drawing many images from a single texture, or minimizing the number of times you switch to a different image, that will help probably more than worrying about blend modes."

Useful piece of information there. But I doubt my rendering loop has more than 5-6 texture changes to begin with.

I'd appreciate more of those "not-so-obvious" (or rather "obvious once pointed out") ways to optimize rendering.

So far, I've found that drawing a 512x512 image scaled 2x is twice as fast as drawing a 1024x1024 image. However, the improvement becomes drastically less significant when going from 512x512 to 256x256 or lower.
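Some arithmetic that may help frame that measurement, with one caveat: the explanation is a guess, not a profiled result. A 512x512 image drawn at 2x covers exactly as many screen pixels as a 1024x1024 image drawn at 1x, so the fill work is the same. What differs is the texture being read, which is four times smaller. The sketch assumes 32-bit RGBA textures without mipmaps.

```python
def texture_bytes(size, bytes_per_texel=4):
    """Memory for a square size x size texture (RGBA8, no mipmaps assumed)."""
    return size * size * bytes_per_texel


def screen_pixels(size, scale):
    """On-screen pixels covered by a square image of `size` drawn at `scale`."""
    return int(size * scale) ** 2


for size in (256, 512, 1024):
    print(f"{size}x{size}: {texture_bytes(size) // 1024} KiB")
# 256x256: 256 KiB
# 512x512: 1024 KiB
# 1024x1024: 4096 KiB

print(screen_pixels(512, 2) == screen_pixels(1024, 1))  # True
```

One plausible reading is that the 1024x1024 case was limited by texture reads, and that at 512x512 and below the unchanged fill cost dominates. That would fit the diminishing returns, but the measurement alone doesn't prove it.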


slenkar(Posted 2009) [#16]
What about checking whether the next texture to be drawn is the same as the previous one?
Also, would instancing of geometry help with fill rate?
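The "same texture as last time?" check is a small state cache. Here is a Python sketch in which a counter stands in for the real bind call. The `TextureBinder` name is hypothetical, and the sketch does not assume anything about whether Max2D's drivers already do this internally. The cache only saves work when equal textures arrive consecutively, which is why it pairs well with sorting or atlases. It also goes stale if other code binds textures behind its back.

```python
class TextureBinder:
    """Skips redundant binds; bind_calls counts the binds that would
    actually reach the driver (no real glBindTexture call is made here)."""

    def __init__(self):
        self.current = None
        self.bind_calls = 0

    def bind(self, texture):
        if texture != self.current:
            self.bind_calls += 1  # a real driver call would go here
            self.current = texture


draw_list = ["ship", "ship", "rock", "rock", "rock", "ship"]
binder = TextureBinder()
for tex in draw_list:
    binder.bind(tex)
print(binder.bind_calls)  # 3 binds instead of 6
```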


GW(Posted 2009) [#17]
If someone converted Jim's SpriteMaster Pro lib to BMax, there would be a pretty massive speed-up in drawing.


_JIM(Posted 2009) [#18]
Uh... you can do instancing of geometry in BMax?
I'm not sure I've got a grip on the term, but I'm pretty sure it's not the same as drawing the same TImage at different positions (which is what I'm doing right now).


GW(Posted 2009) [#19]
The SMPro lib isn't instancing as I know it. All the images are added to a single mesh, and the vertices control the alpha, color and z-order of each image, so only one mesh is drawn to the screen. When I used it in Blitz3D I was blown away by the speed. Jim did a great job on the whole library.
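A generic Python sketch of the single-mesh technique described above. It is not SMPro's code, since only the description above is known about it. Every sprite becomes four vertices carrying position, depth and RGBA colour, plus six indices (two triangles), so the whole set can be submitted at once. In fixed-function OpenGL, arrays like these could be passed with glVertexPointer/glColorPointer and drawn with a single glDrawElements call, provided every sprite samples the same texture (for example an atlas). The sprite dictionaries are hypothetical, and texture coordinates are omitted for brevity.

```python
def build_batch(sprites):
    """Pack sprites into one vertex list of (x, y, z, r, g, b, a) tuples and
    one triangle index list, so the whole set is a single mesh."""
    vertices, indices = [], []
    for s in sprites:
        base = len(vertices)
        x, y, w, h = s["x"], s["y"], s["w"], s["h"]
        r, g, b, a = s["rgba"]
        for vx, vy in ((x, y), (x + w, y), (x + w, y + h), (x, y + h)):
            vertices.append((vx, vy, s["z"], r, g, b, a))
        # two triangles per quad, sharing the diagonal base -> base + 2
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


sprites = [
    {"x": 0, "y": 0, "w": 16, "h": 16, "z": 1, "rgba": (255, 0, 0, 255)},
    {"x": 8, "y": 8, "w": 16, "h": 16, "z": 0, "rgba": (0, 255, 0, 128)},
]
verts, idx = build_batch(sprites)
print(len(verts), len(idx))  # 8 12
print(idx[6:])               # [4, 5, 6, 4, 6, 7]
```

Per-vertex colour replaces the SetColor/SetAlpha calls between sprites. Translucent entries, like the half-transparent green one here, still need to be appended back to front inside the batch.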


slenkar(Posted 2009) [#20]
Instancing is an OpenGL thing.


ImaginaryHuman(Posted 2009) [#21]
This "instancing" sounds like the same thing as keeping many images on one texture to avoid texture swaps. Instancing of geometry (a quad is geometry) just means drawing multiple instances of the same thing as quickly as possible. I presume that in OpenGL this would mean recording the geometry into a display list and then calling the list many times. For quads that isn't going to speed things up much, because the geometry is so simple; it's better for more complex 3D objects.

You can get about a 200% speed increase by using vertex arrays, however.

Also, in order to switch to a new texture you usually have to call glEnd to end the current geometry and then glBegin to start a new quad. So if you're doing that for every image drawn, that's not only a lot of texture switches but also a lot of glBegin/glEnd pairs.