Render to texture - Speed test

Ross C(Posted 2008) [#1]
Ok, I have conducted a fairly, erm, fair test of the speed difference between copyrect + 256 (the force texture to VRAM flag) and the Render To Texture function found within the fastlibs library.

My results are pretty conclusive across a number of different counts of copyrect and rendertotexture calls per loop.

I found that render to texture is faster by up to 50%. This was with a 512 x 512 texture. The cameraviewport was resized to the texture size in both tests, to ensure the same number of pixels was being copied.

However, when switching to smaller textures (256, 128, 64), the difference was practically not noticeable. I'm talking 1 fps over 50 copyrects.

So, for the results. My scene is 20 spheres, with a segment level of 3, a cube to texture my screen renders on, and a texture for the spheres. My rig handles this no problem, at 1700 fps+.



Code used for this test.



I've hardcoded the resolution. This won't run unless you have the fast lib.dll and include file, which I can't give you, because they are copyrighted :o)

If there is anything I have missed, I will try to address it.

To sum up, copyrect, on my rig anyway, only really holds back performance if the texture being rendered to is over a certain size, in this case anything over 256x256. Below that, it must be that the behind-the-scenes work of copyrect and rendertotexture takes longer than the actual copy itself.


Beaker(Posted 2008) [#2]
Weirdly, render2texture is slower on my laptop than copyrect!


Ross C(Posted 2008) [#3]
What sizes of texture are you using? Those are some pretty strange results. After all the harping and moaning, it really isn't THAT much faster than copyrect with the 256 flag.

With the lib I use, fastlib, you MUST use the 256 flag with the texture, so copyrect must be damn fast, that's all I can say :o)

On heavy scenes, it's the two renders that really kill the speed anyhow, I think.


sswift(Posted 2008) [#4]
The shadow system would only cause the characters casting the shadows to be rendered twice, not the whole scene.


Ross C(Posted 2008) [#5]
What resolution of texture do you use, swift, for your render of the screen?


sswift(Posted 2008) [#6]
Each object's shadow is rendered separately, and the user is allowed to specify the resolution on a per-object basis. I don't recommend going over 256 for shadow texture resolution.

Something about your results strikes me as odd though: you get 1700 fps with no shadow texturing, and then everything drops to around 80 fps the minute you do any, no matter which method you use to render them. That smacks of something causing a graphics card hitch.

Have you tried looking at SetCameraViewport to see if that's what's doing it? You set it every time you render the texture. Have you tried setting it once? Also, have you tried setting it to the texture size, and then back to normal in the loop, so you can be sure it is being triggered even when it doesn't really need to change?

Also, why reset rendering to the backbuffer after every texture in the second example? You would only need to do that after you're done rendering textures. I don't know if setting the buffer causes a hitch, but that's something else to investigate.


Ross C(Posted 2008) [#7]
That's a good point, sswift. Don't know why I left that in there... Mind, I'm doing 50 of these per frame too :o) I'll quickly rerun those tests.
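
For a rough sense of scale, the drop discussed above (about 1700 fps down to around 80 fps) can be converted to frame time and spread over the 50 passes per frame. The following is a minimal Python sketch of that arithmetic only. It assumes each render-and-copy pass costs the same and that nothing else in the frame changes, and the function names are made up for illustration.

```python
# Figures quoted in the thread; both FPS values are approximate.
BASELINE_FPS = 1700.0    # scene with no texture passes
LOADED_FPS = 80.0        # scene once the texture passes are enabled
PASSES_PER_FRAME = 50    # render-and-copy passes done each frame


def frame_ms(fps):
    """Convert a frame rate to milliseconds per frame."""
    return 1000.0 / fps


def per_pass_ms(baseline_fps, loaded_fps, passes):
    """Extra frame time attributable to each pass, assuming equal cost per pass."""
    return (frame_ms(loaded_fps) - frame_ms(baseline_fps)) / passes


if __name__ == "__main__":
    print(f"baseline frame: {frame_ms(BASELINE_FPS):.2f} ms")
    print(f"loaded frame:   {frame_ms(LOADED_FPS):.2f} ms")
    cost = per_pass_ms(BASELINE_FPS, LOADED_FPS, PASSES_PER_FRAME)
    print(f"per pass:       {cost:.3f} ms")
```

Under those assumptions each pass adds roughly a quarter of a millisecond, so a drop that looks dramatic in fps is a fairly small cost per pass. FPS differences are easier to compare once they are expressed as frame time.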


Ross C(Posted 2008) [#8]
Ok, just quickly reran the tests. Not really any difference in speeds. Part of the speed hit must come from whatever setup work needs to be done when calling copyrect and the rendertotexture function within fastlib.


Dreamora(Posted 2008) [#9]
Same speed on both, but the visual output is broken on the copyrect.

The r2t shows the spinning cube (150 FPS after fixing the points mentioned below).
The copyrect doesn't show it (135 FPS).

Core 2 Duo E6600, 2GB RAM, GF 8800GTS 640MB on 169.29 drivers.

Both share the basic fact that the spheres are plain white.

Notes:
1. The r2t render function is broken. It sets the buffer within the loop instead of outside. As locking is a very costly operation, it costs a fair percentage of the true r2t performance and actually degrades it to copyrect, as it does the same operations -> defeats its point :)
2. r2t does not need the camera viewport. It already sets the texture as the render target, which automatically defines the area. It doesn't make a difference on my card though.

EDIT: At ts = 1024 I at least get parts of the cube on copyrect (but it has rect-shaped holes). The performance on the r2t is still the same; the one on copyrect degrades to 80, same for 2048.


Ross C(Posted 2008) [#10]
Over here, the render to texture takes the screen and squashes it to a square texture, whereas the copyrect gives what I'd say is the correct appearance. That's why I'm setting the cameraviewport on that.

My mistake on setting the buffers inside the loop. I took them out earlier and it makes no difference here.

I'm using the fastlib rendertotexture function. Which one are you using?


Dreamora(Posted 2008) [#11]
FastExtends, most current release.
Perhaps your card just isn't that powerful at shader 2 tech level stuff ;-) (given that I have 3 times your performance, that would be possible, especially as my driver settings are on quality to high quality, not performance)


puki(Posted 2008) [#12]
My 8800GTX would probably eat this.


Dreamora(Posted 2008) [#13]
Not really. The 8000 is crap at fixed pipeline, and DX7 is pure fixed ... a 7900GT would kick both of us ;-)


Dreamora(Posted 2008) [#14]
Just tested it on my tablet (x41t with Intel GMA 900).
I moved the setviewport and setbuffer outside both loops.

On the Intel GMA 900 both work.

512:
r2t: 32 FPS
copyrect: 8-10 FPS

256:
r2t: 65 FPS
copyrect: 15-17 FPS

This shows that copyrect only seems to be a replacement on highly optimized cards. On less optimized and onboard hardware, the numbers come out the way we would assume, with r2t clearly ahead.
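
To make the gap in those GMA 900 figures explicit, the ratios can be worked out directly. This is a minimal Python sketch using only the numbers reported above. The ranges are treated as (low, high) bounds, and the names are illustrative only.

```python
# FPS figures reported above for the Intel GMA 900, as (low, high) ranges.
RESULTS = {
    512: {"r2t": (32, 32), "copyrect": (8, 10)},
    256: {"r2t": (65, 65), "copyrect": (15, 17)},
}


def speedup_range(r2t, copyrect):
    """Return (min, max) of r2t FPS divided by copyrect FPS."""
    lowest = r2t[0] / copyrect[1]    # slowest r2t against fastest copyrect
    highest = r2t[1] / copyrect[0]   # fastest r2t against slowest copyrect
    return lowest, highest


if __name__ == "__main__":
    for size, fps in RESULTS.items():
        lowest, highest = speedup_range(fps["r2t"], fps["copyrect"])
        print(f"{size}x{size}: r2t is {lowest:.1f}x to {highest:.1f}x faster")
```

On these figures r2t comes out somewhere between about 3x and 4x faster than copyrect at both texture sizes.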


Ross C(Posted 2008) [#15]
So render to texture is far better on lower spec machines? I might test this on my laptop too.


Dreamora(Posted 2008) [#16]
Yes. On new (pure shader) hardware it makes no difference anyway.
Anything you do is translated to shaders, so the copyrect could internally be translated to the same GPU code as r2t, or at least something very similar.