Benchmark of state changes 2d vs 3d hw

flying willy(Posted 2004) [#1]
Hi,

The following code illustrates an important optimisation: drawing your images in whatever order they come, treating BlitzMax like a plain 2D engine, will result in a *severe* slowdown.

This applies particularly to tilemaps and to games using lots of different images.

Grouping your drawing by image type should speed things up dramatically.

Can someone verify my findings?


'benchmarking state changes

graphics 640,480,0

'two *different* 512x512 pngs
i1=loadimage("i1.png")
i2=loadimage("i2.png")

'test 1 - drawing to the screen as if you were doing 2D
t1=millisecs()
for reps=1 to 1000
	drawimage i1,0,0
	drawimage i2,0,0
next
t2=millisecs()
print "default:"+(t2-t1)

	
'test 2 - drawing to the screen bearing in mind it's using 3D hardware...
t1=millisecs()
for reps=1 to 1000
	drawimage i1,0,0
next
for reps=1 to 1000
	drawimage i2,0,0
next
t2=millisecs()
print "grouped: "+(t2-t1)



TeaVirus(Posted 2004) [#2]
Looks like grouped is around 10 ms faster than default on each run. Interesting!


BlitzSupport(Posted 2004) [#3]
That looks to be true enough, and makes sense I guess. Good tip!


flying willy(Posted 2004) [#4]
I expect the problem to be a lot worse than 10ms, because that is the "best case". Imagine if you had 30 different images and had to draw them in any order, for example alien attack waves or tile map displays.

I think it's definitely something to bear in mind...
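
A minimal sketch of the 30-image case, written in Python so that it runs without a graphics context. It only counts how often the "current image" would change while walking a draw queue, before and after grouping by image. The `Draw` record, the image names and both helper functions are hypothetical and are not part of BlitzMax.

```python
from collections import namedtuple

# Hypothetical draw request: which image to draw and where.
Draw = namedtuple("Draw", ["image", "x", "y"])

def count_texture_switches(draws):
    """Count how often the bound image changes while walking the list in order."""
    switches = 0
    bound = None
    for d in draws:
        if d.image != bound:
            switches += 1
            bound = d.image
    return switches

def group_by_image(draws):
    """Stable sort by image so draws sharing an image become adjacent."""
    return sorted(draws, key=lambda d: d.image)

# 30 images drawn 10 times each, interleaved like a mixed attack wave.
queue = [Draw("img%02d" % (i % 30), i, 0) for i in range(300)]
interleaved = count_texture_switches(queue)
grouped = count_texture_switches(group_by_image(queue))
print(interleaved, grouped)  # prints "300 30"
```

Grouping changes the order in which things are drawn, so it is only safe where overlap order does not matter, for example within a single layer of sprites.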


FlameDuck(Posted 2004) [#5]
Incidentally, this isn't the case on my computer. Default gets anywhere between 300 and 2000; Grouped gets anywhere between 4 and 1700.

Default is usually faster.


flying willy(Posted 2004) [#6]
Higher number = slower, which means your PC is finding grouped faster as well.


Beaker(Posted 2004) [#7]
This is important. One of the biggest gripes that people have with Blitz3D is the way state changes are handled. It would be a shame if that mistake was made twice.


flying willy(Posted 2004) [#8]
Can someone confirm whether LoadAnimImage splits each frame into a separate texture? This would be disastrous in a real game situation...


Warren(Posted 2004) [#9]
Yeah, that's a real bummer. I was hoping that BlitzMax would be more intelligent than the old stuff in terms of grouping state changes and texture switches. *grumble*


kraft(Posted 2004) [#10]
Does this mean BlitzMax has an abnormal problem with speed in this type of situation, or are these results to be expected?

If it's a bug / issue, will it be remedied in future updates or is it too ingrained in BM's design to fix?

*NO FLAME RESPONSES PLS, THIS IS A CIVILIZED QUERY*


flying willy(Posted 2004) [#11]
This is not a bug, and I personally expected these results. I am highlighting the issue so that we can work together to make improvements and/or a module for faster drawing.


Dreamora(Posted 2004) [#12]
removed by myself


flying willy(Posted 2004) [#13]
Dreamora: please do not troll, I do not welcome your unhelpful comments.


kraft(Posted 2004) [#14]
Why would we be using BlitzMax if we were capable of writing our own "hyper intelligent"-anything?

Hmmm, this thread just made me instantly think that the problem was with BlitzMax and the way it "communicates" with OpenGL? If, as you say Dreamora, the issue is cast in stone because it's an OpenGL (and even DirectX) problem then surely the only solution is to write your own brand new graphics library which controls the gfx hardware in some brand new revolutionary way - a way OpenGL and Microsoft's crack R&D teams have yet to discover?

In which case, this is probably not the best forum to discuss it, because I'm sure there are not many BM users with the level of technical knowledge needed to offer any kind of solution, and they, like me, will only end up (mistakenly) believing it's some problem with BM's routines.


Robert(Posted 2004) [#15]
I don't think this is too much of a problem - all you have to remember is to render images in groups where possible.

Your code would probably do this in most cases anyway. For example, if you were designing an asteroids game, it would be normal practice to render the ship, the stars, and the asteroids in separate groups.


Beaker(Posted 2004) [#16]
Robert - no offence, but that's an oversimplification of the issue. Your example works fine if the asteroids, stars and spaceships only have one frame of animation. And maybe that is fine if an animimage isn't split into separate images on load (this has still to be answered).

But, also there are implications for this when it comes to the future 3D module.

It is definitely worth raising these concerns now rather than later.
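
Whether LoadAnimImage splits frames into separate textures is not answered in this thread. As an illustration of the alternative, here is a minimal Python sketch in which all frames of an animation live in one texture (an atlas) and a frame is selected purely by texture coordinates, so changing frame needs no texture switch. The function name and the row-by-row layout are assumptions made for this example, not BlitzMax behaviour.

```python
def frame_uv(frame, frame_w, frame_h, atlas_w, atlas_h):
    """Return (u0, v0, u1, v1) for a frame stored row by row in one atlas texture."""
    cols = atlas_w // frame_w
    rows = atlas_h // frame_h
    if not 0 <= frame < cols * rows:
        raise IndexError("frame outside atlas")
    col, row = frame % cols, frame // cols
    u0 = col * frame_w / atlas_w
    v0 = row * frame_h / atlas_h
    return (u0, v0, u0 + frame_w / atlas_w, v0 + frame_h / atlas_h)

# Frame 5 of a 4x4 grid of 64x64 frames in a 256x256 texture.
print(frame_uv(5, 64, 64, 256, 256))  # prints "(0.25, 0.25, 0.5, 0.5)"
```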


{cYan|de}(Posted 2004) [#17]
This is basic OpenGL stuff: it's a state machine. Try to cut down on the number of command calls etc.


FlameDuck(Posted 2004) [#18]
"Higher number = slower, which means your pc is finding grouped faster as well."
No it isn't. I'm perfectly capable of understanding that if something takes more milliseconds it's executing slower, thank you very much.

Did I not just explain it to you properly? Yes, Grouped will occasionally get the lower score, but default is *consistently* faster. Out of 30 tests, the grouped mode was faster (a lower score) 4 times; the other 26 times the default rendering was faster, generally by at least a factor of 3.

Doesn't the fact that it takes somewhere between 4 and 2000 milliseconds tell you something?


BODYPRINT(Posted 2004) [#19]
OK, there is a flaw in your test.

If you perform the default test before the grouped it always returns a lower number than grouped.

So the first test is always disadvantaged.
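
One way to take run order out of the comparison is sketched below in Python: run every test once untimed, then time several rounds while rotating which test goes first, and report the median per test rather than a single figure. The `benchmark` function, its parameters and the placeholder workloads are hypothetical; in a real test the callables would issue the draw calls and the buffer swap.

```python
import statistics
import time

def benchmark(tests, rounds=10, warmup=1, clock=time.perf_counter):
    """Time each named callable over several rounds, rotating the run order.

    tests: dict mapping a label to a zero-argument callable.
    Returns a dict mapping each label to its median time in clock units.
    """
    names = list(tests)
    for _ in range(warmup):
        for name in names:
            tests[name]()          # untimed, absorbs one-off setup costs
    samples = {name: [] for name in names}
    for r in range(rounds):
        shift = r % len(names)
        order = names[shift:] + names[:shift]   # a different test goes first each round
        for name in order:
            start = clock()
            tests[name]()
            samples[name].append(clock() - start)
    return {name: statistics.median(times) for name, times in samples.items()}

# Placeholder workloads standing in for the two drawing strategies.
results = benchmark({
    "default": lambda: sum(range(20000)),
    "grouped": lambda: sum(range(20000)),
})
print(sorted(results))  # prints "['default', 'grouped']"
```

The median is used because a spread as wide as the one reported earlier in the thread makes any single run close to meaningless.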


flying willy(Posted 2004) [#20]
Strange bug. Can you help fix it?

I have split it into two files and it still reports grouped being faster.

Do you have any idea why it is bugged? Possibly a BlitzMax bug.


Dreamora(Posted 2004) [#21]
It could just as well be a bad OpenGL driver, which is common for cards not made by NVIDIA or ATI, or one of the drivers from those two companies that had problems with their OpenGL implementation.


Nobody(Posted 2004) [#22]
This is a far better benchmark. It should take under a minute to run (my GF2 MX runs it in about 70 seconds, but most of you will have a more powerful card).

'benchmarking state changes

graphics 640,480
bglSetSwapInterval( 0 )

SetColor 255, 255, 255

'two *different* 512x512 pngs
i1:TImage=loadimage("i1.png")
i2:TImage=loadimage("i2.png")

DrawImage i1, 0, 0
DrawImage i2, 0, 0

Flip
Cls

t1=millisecs()
	
For loop = 1 To 100
'test 2 - drawing to the screen bearing in mind it's using 3D hardware...


for reps=1 to 100
	drawimage i1,0,0
Next

for reps=1 to 100
	drawimage i2,0,0
Next

Flip
Cls	

Next

t2=millisecs()
print "grouped: "+(t2-t1)


t1=millisecs()
	
For loop = 1 To 100


'test 1 - drawing to the screen as if you were doing 2D

for reps=1 to 100
	drawimage i1,0,0
	drawimage i2,0,0
Next

Flip
Cls

Next

t2=millisecs()

print "default:"+(t2-t1)

print

End


flying willy(Posted 2004) [#23]
Actually that benchmark makes absolutely no sense.


Michael Reitzenstein(Posted 2004) [#24]
Why?


flying willy(Posted 2004) [#25]
My mistake, you explained it to me and I now understand that flip actually causes the card to begin work.

Your revised test also proves my point still. Thank you for the contribution.


Dreamora(Posted 2004) [#26]
flip doesn't do anything else than glswapbuffer -> swap front and backbuffer. (You might check the sources yourself. I crawled them last night in the hope of finding out how drawimage works ... nothing so far, but I created a C function that gives me the hwnd for some other WinAPI stuff like waitevent etc ;) )

the draw is the actual work where the whole thing is drawn into the backbuffer


flying willy(Posted 2004) [#27]
As usual, you're talking nonsense and I must again request that you keep your opinion to yourself.

Flip causes the graphics card to begin rendering on a separate thread. This was explained to me in depth. As you clearly do not know what you're talking about I suggest you refrain from posting at all, and get on with your game.

This isn't a personal attack, but you need to take a reality check.


Jim Teeuwen(Posted 2004) [#28]
this should be interesting.
Hunter's test shows the following:

grouped: 16996
default:16913

So it shows that grouped drawing is actually slower, whereas the first benchmark shows the other way around.
I think we need a proper benchmark first.

Interestingly, doing Hunter's test without debug mode shows this:
grouped: 16863
default:16913

Grouped is faster here.

---
next edit:
I tried Hunter's code with 4 images. This time they were all different images and different sizes (3 of them not even a power of 2).
the results:
grouped: 14749
default:14657

This turns out way faster than the 2 image test.


Dreamora(Posted 2004) [#29]
skunk:
glmax2d.bmx method flip:
sync interval check + bglswapbuffers

bglswapbuffers:
comes from blitzgl.bmx which simply does a glswapbuffers with the context hdc

So next time you attack me, you had better check first. No idea what drugs you take, but take less, you overaggressive .....


flying willy(Posted 2004) [#30]
The new test shows:

grouped: 1998
default:2017

Grouped still has the edge but the slowdown is now much less. How about a more varied test, anyone? Something more real-worldy?


Michael Reitzenstein(Posted 2004) [#31]
Dreamora - DrawImage doesn't block program execution. The graphics card begins rendering, but DOES NOT necessarily finish. On Flip, if the scene isn't finished rendering, the program stalls until it is. The exact details are driver dependent, but this is basically how it happens.

We're discussing an important issue here. If you want to try and sound smarter than you actually are, there's a conveniently placed beginners forum at the start of the active topics list.


ImaginaryHuman(Posted 2004) [#32]
I thought that most of what BlitzMax does in OpenGL is in immediate mode, where commands are more or less executed right away, not when you do a Flip.


Dreamora(Posted 2004) [#33]
Michael: erm, where did I actually say something like that? Skunk wrote: "My mistake, you explained it to me and I now understand that flip actually causes the card to begin work", not me ...

I just wrote that flip doesn't do more than a glSwapBuffers and nothing more than that.


Michael Reitzenstein(Posted 2004) [#34]
Generally (some drivers may do wacky stuff) rendering begins after the first glEnd, but the card renders asynchronously with your program. Nothing is guaranteed to be rendered until glFinish (or glFlush) is called, and this is called implicitly by functions such as glReadPixels or (drum roll) your buffer swapping function.

To quote Dreamora,

"flip doesn't do anything else than glswapbuffer"

Ignoring that there isn't actually such a thing as 'glswapbuffers', this is exactly what flip is doing - it's calling the buffer swapping function, and glFinish is called.
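
The consequence for benchmarking can be shown with a toy model in Python. This is not OpenGL: `ToyGPU`, `submit` and `finish` are invented for the illustration, and the model is deliberately extreme in that no work at all happens before the synchronisation point, whereas a real driver may start rendering earlier. It only shows why a timer stopped before the buffer swap can miss most of the cost.

```python
class ToyGPU:
    """Toy model: submit() only queues work; finish() actually performs it."""

    def __init__(self):
        self.pending = []
        self.work_done = 0

    def submit(self, cost):
        self.pending.append(cost)      # returns immediately, like a draw call

    def finish(self):
        self.work_done += sum(self.pending)   # stand-in for the sync at buffer swap
        self.pending.clear()

gpu = ToyGPU()
for _ in range(1000):
    gpu.submit(cost=3)
before = gpu.work_done    # what a timer stopped here would account for
gpu.finish()
after = gpu.work_done
print(before, after)  # prints "0 3000"
```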


Michael Reitzenstein(Posted 2004) [#35]
Dreamora - what does this mean?

"the draw is the actual work where the whole thing is drawn into the backbuffer"



Dreamora(Posted 2004) [#36]
I only meant that the actual work for the GPU is the draw and not the flip, but I might have written it in a way that is hard to understand, as I am not a native English speaker and my sentence constructions might lead to the wrong "message".

I won't post any more on this kind of topic, for the best of all.