uniform sampler2d 512by512image[ 100 ]

BlitzMax Forums/OpenGL Module/uniform sampler2d 512by512image[ 100 ]

beanage(Posted 2009) [#1]
uniform sampler2D RGBA8888image[ 100 ]

would this kill the performance of my fragment shader due to bandwidth limitations?


_JIM(Posted 2009) [#2]
Doesn't that mean 100 textures? Depending on their size, it might indeed kill your performance. If we're talking about 128x128 textures, that's:

32 bits x 128 x 128 x 100 = 52 428 800 bits. That's about 6MB per frame. Not that much. If you're talking about 512x512 textures, then...

32 bits x 512 x 512 x 100 = 838 860 800 bits. That's about... 100MB per frame. This will definitely kill your performance.

This is strictly bandwidth though.
I bet lookups on 100 textures will kill your performance as well.

Try to see if there's any workaround to use fewer textures.
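
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes uncompressed 32-bit RGBA texels, no mipmaps, and the worst case where every texel of every texture is read once per frame. The function name `texture_set_bytes` is made up for illustration.

```python
MIB = 1024 * 1024


def texture_set_bytes(width, height, count, bits_per_texel=32):
    """Total size in bytes of `count` uncompressed textures without mipmaps."""
    total_bits = bits_per_texel * width * height * count
    return total_bits // 8


for size in (128, 512):
    total = texture_set_bytes(size, size, 100)
    print(f"{size}x{size} x 100 textures: {total} bytes = {total / MIB:.2f} MiB")
```

In practice a shader rarely touches every texel, so these figures are an upper bound rather than a prediction.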


beanage(Posted 2009) [#3]
thanks for the hint about it..

i need the information for an experimental raycast terrain renderer (in fact its sorta parallax mapping used in a very unusual way), where these textures are part of a heightmap quadtree..

i'll try reducing them in number and size, thanks again!!

p.s.: i really didn't forget about our little project, just stuck in tons of work for my exams... :/

(edit:) what bandwidth figure should i calculate with? (i now calc with 32 mb) .. please consider that this shader is supposed to be executed on a fullscreen quad one time only (hope this plays a role), and since it's virtually the only render routine, it may take about 10ms ..


_JIM(Posted 2009) [#4]
Well, it really depends on your target specs. I remember my card has something around 80GB/s bandwidth (which is not really enough if you fill its 1GB of VRAM and try to run above 80FPS :P )

If you have 6MB per frame for example, at 100 fps that would be 600MB/s. It's rather small. I think even Intel GMA has a lot more. Not sure about the numbers but it should be in the range of 3-5GB/s.

If it's 100MB we're talking about per frame, at 100 fps, that's 10GB/s. It will kill Intel GMA for sure (that is, if it can actually do ANYTHING with 100 textures and still have 100 fps).
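
To put numbers on this, here is a rough sketch of the per-second estimate. It uses the rounded figures from this post (6MB and 100MB per frame, 100 fps, about 80GB/s for a fast card, and a guessed 3-5GB/s for Intel GMA). The helper names are hypothetical, and the bandwidth figures are recollections from the post, not measured values.

```python
MB = 10**6
GB = 10**9


def required_bandwidth(bytes_per_frame, fps):
    """Bytes per second needed to read `bytes_per_frame` every frame."""
    return bytes_per_frame * fps


def fits(bytes_per_frame, fps, available_bytes_per_s):
    """True if the estimated traffic stays within the available bandwidth."""
    return required_bandwidth(bytes_per_frame, fps) <= available_bytes_per_s


for per_frame in (6 * MB, 100 * MB):
    need = required_bandwidth(per_frame, 100)
    print(f"{per_frame // MB}MB/frame at 100 fps: {need / GB:.1f}GB/s, "
          f"fits 5GB/s: {fits(per_frame, 100, 5 * GB)}, "
          f"fits 80GB/s: {fits(per_frame, 100, 80 * GB)}")
```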

I would really worry about the bandwidth less and try to think of a smart way (sorry I can't provide any idea right now, my head is still spinning after lots of vector math at work :D ) to reduce the number of textures (either by having fewer textures with higher resolution, or by simply culling away some of them based on frustum, distance, LOD, etc.).

100 textures really sounds a lot to me. I remember making a FPS game that had about 50 textures (used for 2 levels, effects, character, and 5 weapons).

Let's see... 100 textures means a terrain of 10x10 textures. Either you have a huge terrain, or you have a very detailed terrain. It seems to me you could definitely cull them to a 3x3 area (or 5x5, depending on your purpose) around the camera and use fog to fade the rest. Also, anything behind the camera (100% sure to be out of the frustum) could be culled. That'll leave you with about 6 textures (in the 3x3 case) or 15 (in the 5x5 case). Much less than 100 :)

Plus, the raycast processing for 6 textures should be faster than 100 textures :D
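
A small sketch of that culling idea, assuming the 10x10 tile grid mentioned above and a camera looking straight along one grid axis. It uses a simple "behind the camera" half-plane test, not a full frustum test. `visible_tiles` and its parameters are made-up names for illustration.

```python
def visible_tiles(cam_tile, forward, radius, grid_size=10):
    """Tiles within `radius` of the camera tile that are not behind the camera.

    cam_tile: (x, y) index of the tile the camera stands on.
    forward:  (x, y) view direction in the terrain plane.
    """
    cx, cy = cam_tile
    fx, fy = forward
    tiles = []
    for ty in range(cy - radius, cy + radius + 1):
        for tx in range(cx - radius, cx + radius + 1):
            if not (0 <= tx < grid_size and 0 <= ty < grid_size):
                continue  # outside the terrain
            dx, dy = tx - cx, ty - cy
            if dx * fx + dy * fy < 0:
                continue  # strictly behind the camera
            tiles.append((tx, ty))
    return tiles


print(len(visible_tiles((5, 5), (0, 1), radius=1)))  # 3x3 neighbourhood
print(len(visible_tiles((5, 5), (0, 1), radius=2)))  # 5x5 neighbourhood
```

With the camera in the middle of the grid this keeps 6 tiles in the 3x3 case and 15 in the 5x5 case. A diagonal view direction or a camera near the terrain edge gives different counts.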

I hope this wall of text is of any help :)


beanage(Posted 2009) [#5]
Oh ok.. you really made me go a little deeper into the field of bandwidth, and thanks to you now i know all about it :)

yes, the "wall of text" was of some help indeed ..

one last q.. how, in particular, do texture lookups slow the frag.shader down? oh i am such a noob concerning shader programming, but i really need to learn all this .. :/


_JIM(Posted 2009) [#6]
Well, I'm not a shader guru, nor am I totally experienced. However, I wrote a bunch of shaders that were above average complexity.

As far as I've read, texture lookups (the process of actually fetching a pixel from a texture) slow down the shader considerably if the shader is simple. However, they become less significant once your shader goes past the 64-instruction limit of SM 2.0.

I've never actually benchmarked stuff like this, nor was I really interested as my shaders were already fast enough for me :)


beanage(Posted 2009) [#7]
Thanks for the information...

My shader will break the instr. limit for sure!

Mmmh, the raymarch-checking for ray/terrain intersection sure will need at least 100 texture samples... At the moment i'm afraid, this shader wont run on any card older than gf6800 -- pfff

(edit:) I want to introduce you my ideas a little bit more..
as i said, the basic idea is rendering terrain as a relief map on the frag.shader. As the fragment shader can hardly interpolate between heightmap pixels (could it?),
the nearest heightmap of the qtree needs to provide screen resolution.
Now, a raymarch is performed for each pixel. the marching stepsize increases as the ray gets more distant from the camera. if the ray's y-coord at a specific point is smaller than the heightmap value at this point, the fragment will be that point... quite easy, uh?
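
As a rough CPU-side illustration of the march described above (not the actual shader), here is a Python sketch. It assumes a single heightmap stored as a list of rows, nearest-texel lookups without interpolation, and a step size that grows by a constant factor per step. The names `raymarch_heightmap`, `step` and `growth` are invented for the example.

```python
import math


def raymarch_heightmap(heightmap, origin, direction,
                       step=0.5, growth=1.05, max_steps=200):
    """March along a ray until it drops below the terrain.

    heightmap: list of rows, indexed heightmap[z][x].
    origin, direction: (x, y, z) tuples, y is up.
    Returns the first sampled point below the terrain, or None.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    t = 0.0
    for _ in range(max_steps):
        px, py, pz = ox + dx * t, oy + dy * t, oz + dz * t
        ix, iz = math.floor(px), math.floor(pz)
        if not (0 <= iz < len(heightmap) and 0 <= ix < len(heightmap[0])):
            return None  # the ray left the heightmap
        if py < heightmap[iz][ix]:
            return (px, py, pz)  # ray is below the terrain here
        t += step
        step *= growth  # larger steps further from the camera
    return None


flat = [[1.0] * 16 for _ in range(16)]
print(raymarch_heightmap(flat, (0.5, 5.0, 0.5), (1.0, -1.0, 1.0)))
```

Each loop iteration corresponds to one heightmap sample, which is where the estimate of around 100 texture samples per pixel comes from.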


_JIM(Posted 2009) [#8]
I need to read that again slowly to make sure I understand everything.

However, wouldn't it be a lot easier to just render a square plane of [width] x [height] segments, then use a [width] x [height] texture in the vertex shader to adjust the terrain? As far as I know, this is the way most games do it lately because it's very fast and quite cheap. However, it requires texture lookups in the vertex shader, which as far as I know is a Shader Model 3.0 feature. It probably wouldn't bother you much since you already said you're going past the 2.0 limit.
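
For illustration only, this is what the displacement amounts to, done on the CPU in Python instead of in a vertex shader. It assumes one heightmap texel per grid vertex. `displaced_grid`, `cell_size` and `height_scale` are hypothetical names.

```python
def displaced_grid(heightmap, cell_size=1.0, height_scale=1.0):
    """Build (x, y, z) vertices of a flat grid lifted by the heightmap.

    heightmap: list of rows, indexed heightmap[z][x]; y is up.
    """
    vertices = []
    for iz, row in enumerate(heightmap):
        for ix, h in enumerate(row):
            # In a vertex shader this would be one texture lookup per vertex.
            vertices.append((ix * cell_size, h * height_scale, iz * cell_size))
    return vertices


print(displaced_grid([[0.0, 1.0], [2.0, 3.0]]))
```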

I've never tried implementing those, but as far as I've read here and there this is an excellent choice of technique for rendering terrain.

Also, instead of stressing out the pixel shader to render the displacement, you could use it to compute the detail of the terrain: splatting, specular highlights, normalmapping, etc. (even mapping based on surface angle)

Also, the vertex shader could compute tri-planar mapping for the terrain. This alters the UVs, so you don't have stretched textures.

If you're doing this purely as an experiment, then give me a bit of time to understand what you're aiming for and do a bit of research :-)

Even so, I'm pretty sure it's going to be slower than the technique I described.

One last thing: If you are targeting new hardware, bear in mind that it has unified shaders. Which means vertex and pixel shaders are just as fast, which in turn means you have to balance the computation stress over the 2 parts. I did my best to move everything I could into the vertex shader, and even so, it takes 2% of the frame time, while the pixel shader takes 98%. There's lots to talk about shaders but probably in another thread :)


beanage(Posted 2009) [#9]
you are right. concerning classic triangular geometry 3d, drawing terrain with help of the vert. shader and a vertexbuffered plane is the best choice by far..

however, this shader has been designed to figure the architecture of my very w.i.p. ib renderer, so a fragshader solution had to be developed.. i am excited to see if/how this idea works!!!!
also, using raymarching to render terrain is just one side of the coin: when the ray intersects an object from an object list passed to the shader as uniforms, the pixel will be rasterized as part of that object..
to cut a long story short; this is an experimental image based rasterization shader.. lets see if this works!

ok, another q :) .. classical main loop architecture is:
- UpdateWorld() \\cpu work
- RenderWorld() \\gpu work
- Flip() \\synchronize
Why not
- RenderWorld() \\get the gpu working..
- UpdateWorld() \\now cpu and gpu work parallel
- Flip() \\wait, until both are finished
?

at least, using my "rasterization shader" this might be possible, as the cpu now doesn't have to do anything with rendering (there's no [for-each-visible-object] loop the cpu has to perform..)
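
The reordered loop can be mimicked in Python, with a worker thread standing in for the GPU. This is only an analogy under that assumption: `run_frame`, `fake_render` and `fake_update` are invented names, and a real OpenGL driver queues draw calls in its own way.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_frame(executor, render_world, update_world):
    """Submit rendering, update on the calling thread, then wait for both."""
    log = []
    pending = executor.submit(render_world)  # RenderWorld(): returns at once
    log.append("render submitted")
    update_world()                           # UpdateWorld(): runs meanwhile
    log.append("update done")
    pending.result()                         # Flip(): block until render is done
    log.append("flip")
    return log


def fake_render():
    time.sleep(0.01)  # stands in for GPU work


def fake_update():
    time.sleep(0.01)  # stands in for CPU work


with ThreadPoolExecutor(max_workers=1) as gpu:
    print(run_frame(gpu, fake_render, fake_update))
```

One consequence of this order is that the frame being rendered shows the world state from before the current update.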


_JIM(Posted 2009) [#10]
That should work. Hopefully there's nothing holding the CPU back while the GPU is working.

Though, this might be of no effect to you as you said CPU is doing almost nothing. This is very useful when CPU has a lot to process (heavy AI, pathfinding, etc.)

I'm sorry, but this whole rasterization thingy is slightly over my head and for the time being I don't really have enough time to spend on research. I really wish I could help you more :)


JoshK(Posted 2009) [#11]
The most your GPU will support is 32 texture units, so 100 is impossible.
Also, variable texture lookups like this will only work on GeForce 8+ hardware.


_JIM(Posted 2009) [#12]
Well, my graphics card (4870x2) seems to be able to take up to 80 texture units at once. Even so, more than 32 would probably be way too much, which is why I suggested ways to optimize the process and reduce it to something lower than 8.


beanage(Posted 2009) [#13]
Haha, found a solution.. :D well thanks for your hint, Leadwerks. Probably prevented me from running into a bug costing me hours/days of precious dev time..

for finding the ray intersection, a binary search will be used; that might reduce tex lookups to something below 50..
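
A minimal sketch of such a binary search, assuming a coarse march has already found one ray parameter above the terrain and one below it. `refine_hit` and `height_at` are made-up names, and the terrain lookup is passed in as a plain function.

```python
def refine_hit(height_at, origin, direction, t_above, t_below, iterations=8):
    """Bisect between a point above the terrain and a point below it.

    height_at(x, z) returns the terrain height; y is up.
    Each iteration costs exactly one height lookup.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    for _ in range(iterations):
        t_mid = 0.5 * (t_above + t_below)
        x, y, z = ox + dx * t_mid, oy + dy * t_mid, oz + dz * t_mid
        if y < height_at(x, z):
            t_below = t_mid  # midpoint is under the terrain
        else:
            t_above = t_mid  # midpoint is still above the terrain
    return 0.5 * (t_above + t_below)


# Flat terrain at height 1; the exact intersection is at t = 4.
t_hit = refine_hit(lambda x, z: 1.0, (0.0, 5.0, 0.0), (1.0, -1.0, 0.0), 0.0, 8.0)
print(t_hit)
```

The interval halves with every lookup, so a handful of samples after a coarse march can replace many small linear steps. On rough terrain the coarse march can still step over thin peaks.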

also, using frustum clipping of the heighttree reduces height textures to 27..

hard at coding,

BeAnAge