Hey Mark!

Blitz3D Forums/Blitz3D Programming/Hey Mark!

sswift(Posted 2008) [#1]
Can you explain these results?

http://blitzmax.com/Community/posts.php?topic=76157


Dreamora(Posted 2008) [#2]
With small textures, GPU bandwidth doesn't become a bottleneck, so you can do it in "realtime".
But the larger the texture, the more critical bandwidth becomes. Get someone with a GMA900, an NVIDIA 6200 or a joke card like that to test it; that should show whether it really is bandwidth-related (those cards are on a 64-bit bus and the like).

One thing worth pointing out is that copyrect does NOT work with the 256 flag on some hardware, so while it works for you, others might get only black textures.


Ross C(Posted 2008) [#3]
Well, I don't know if that would be the same with render to texture, since that texture requires the 256 flag as well.


Dreamora(Posted 2008) [#4]
R2T textures are hacked in from outside anyway. Blitz does not offer render to texture itself, so the texture has to be modified externally.

But draw commands directed at texture buffers do not work at all on some hardware: no drawimage / drawrect / drawtext, only writepixelfast / readpixelfast.


sswift(Posted 2008) [#5]
I SAID MARK, NOT DREAMORA!


with low texture sizes, the GPU bandwidth doesn't become a bottleneck so you can do it in "realtime".



Wrong!

While it may be true that one needs to worry about GPU bandwidth, that is not what is causing the results we're seeing. Copying one texture shouldn't cause an instant drop from 1000fps to 80 when copying 50 drops the framerate by maybe only another 2fps.
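Frame rates hide how lopsided that is, so it helps to convert them to frame times. Below is a small Python sketch; the 1000, 80 and 78 fps figures are illustrative readings of the numbers quoted above (78 stands in for "maybe another 2fps"), not measured results, and the split into a fixed hitch plus a per-copy cost is an assumed model.

```python
# Hypothetical cost model: frame time = base + fixed hitch + n * per-copy cost.
# The fps figures are illustrative readings of the numbers quoted in the thread.

def frame_ms(fps):
    """Milliseconds spent on one frame at the given frame rate."""
    return 1000.0 / fps

baseline_fps = 1000.0  # no texture copies
one_copy_fps = 80.0    # framerate claimed after a single copy
fifty_copy_fps = 78.0  # assumed: "maybe another 2fps" lower with 50 copies

# Extra time the first copy adds on top of the baseline frame.
fixed_cost_ms = frame_ms(one_copy_fps) - frame_ms(baseline_fps)

# Average extra time for each of the remaining 49 copies.
per_copy_ms = (frame_ms(fifty_copy_fps) - frame_ms(one_copy_fps)) / 49

print(f"first copy adds  {fixed_cost_ms:.2f} ms per frame")       # 11.50
print(f"each extra copy  {per_copy_ms * 1000:.1f} us per frame")  # 6.5
```

Under those assumptions the first copy costs about 11.5 ms per frame while each further copy costs only a few microseconds, which is the one-off hitch being described rather than a steady bandwidth limit.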


This is why I asked Mark to explain it. I'm pretty sure nobody here knows enough about how the cards work at a low level to explain why this is happening and suggest a possible solution.


Dreamora(Posted 2008) [#6]
OK, the initial drop is a different thing, and that one is actually simple: to copyrect, the buffer must be locked so it gets downloaded to system RAM. GPUs were never meant for that and are dead slow at it.

If you do it more than once, that isn't needed anymore: the backbuffer is already present in system RAM, and DX knows that it is there and where.
You cannot grab anything from VRAM directly; that's why pixel shaders were invented, to work on the GPU directly without the ultra-slow download.


sswift(Posted 2008) [#7]
But this happens with both the copyrect and the render to texture code, and the latter should be doing no transfers to system RAM.

And one would think that calling renderworld 50 times would trigger this slowdown 50 times.

But hey, Ross, try rotating that cube a bit inside that loop to prove whether it's reusing the same backbuffer that's already in system RAM over and over. I really doubt it is, though.


Ross C(Posted 2008) [#8]

Copying one texture shouldn't cause an instant drop from 1000fps to 80, when copying 50 only drops the framerate by maybe another 2fps.



Hmm, I don't think you have read the results properly.

My tests are ALL based on 50 copies. I don't do a test for just the one copy, so I have no idea what the framerate is for that. Pretty damn fast, I'd imagine.

And you don't need to lock the buffer to copyrect. Locking it doesn't give me much of an improvement.


sswift(Posted 2008) [#9]
Ross:
I read the results right.

The results say that 50 256x256 textures take the same amount of time as 50 32x32 textures. But 50 256x256 textures are equivalent in pixel count to 3200 32x32 textures.

So if rendering 64x as many 32x32 textures didn't result in a drop in speed, then the meager change from 50 32x32 textures to 1 32x32 texture isn't going to affect the framerate either.

Clearly, something other than bandwidth is causing a hitch when the textures first start rendering that results in an initial precipitous drop in framerate.
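The pixel arithmetic in this post can be checked with a few lines of Python. The 4-bytes-per-texel figure is an assumption (32-bit ARGB textures); the posts do not state the texture format.

```python
# Texel counts for the copies compared in the benchmark results.
BYTES_PER_TEXEL = 4  # assumption: 32-bit ARGB textures

def texels(size, count=1):
    """Total texels in `count` square textures of side `size`."""
    return size * size * count

fifty_large = texels(256, 50)  # 50 copies of 256x256
fifty_small = texels(32, 50)   # 50 copies of 32x32

print(fifty_large // texels(32))   # 3200 (32x32 textures' worth)
print(fifty_large // fifty_small)  # 64 (times the data of 50 small copies)
print(fifty_large * BYTES_PER_TEXEL / 2**20, "MiB per frame")  # 12.5 MiB per frame
```

If transfer size were the bottleneck, a 64-fold increase in data (about 12.5 MiB per frame under the 32-bit assumption) should show up in the framerate, which is why the flat result points to something other than bandwidth.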


Tab(Posted 2008) [#10]
With one copy it's extremely fast on my GF7600GS.


Dreamora(Posted 2008) [#11]
Ross: Yes, you don't use lockbuffer. But the hardware does it automatically to download the backbuffer contents from VRAM to system RAM.
That's why you don't gain or lose anything if you do it as well; it's done anyway.

But it puzzled me that both should run at the same speed, and checking the sources, the reason was obvious: it's a broken benchmark. The r2t version has the setbuffer calls inside the loops; they must be outside for a fair comparison (moving them out resulted in a performance gain of 8%).

For me it's hard to test, as the visual output is not the same on my 8800GTS with the 168.29 drivers:
The render to texture version shows the spinning cube.
The copyrect version shows only the backdrop.


Ross C(Posted 2008) [#12]
I don't gain any performance when I move the setbuffer calls out. It must be something that's happening in the background upon calling copyrect / render to texture.

Sounds like you have a bad driver version? Mind you, they did say to use the 256 flag with caution. I remember Lee Page's Tattoo had problems with the 256 flag, and he was using writepixelfast on his texture.


Dreamora(Posted 2008) [#13]
I have the most current drivers for the 8800GTS, so I would guess this is one of those things I pointed out: forget copyrect and drawing to buffers if you intend to have it played by real people, not theoretical ones...
As this is a real machine with real drivers, there is little you can do about it. It's just DX7, period.

Without the alpha flag, copyrect works as expected at the same performance, so Mark definitely should look into that. Abrexxes pointed that one out months ago and it still persists.