What's so slow here? (Triangle fill routine)

bytecode77(Posted 2007) [#1]
hi!

i made a triangle fill routine here. i've seen so many fill routines which were so fast... why is mine so slow? is there any visible mistake?

i mean, i've seen this sample below running at 100 fps, and now it only gets 20 fps. where is the giant speed leak?
i know, everything i'm programming is slow :(...

thank you for any help!

texture:



_33(Posted 2007) [#2]
Well, if you loaded the texture, you could load it into VRAM instead.


bytecode77(Posted 2007) [#3]
vram?
this is a software rendered polygon.

without textures it looks like that



without textures, the performance is excellent.
the only difference with textures is that i have to interpolate UV coordinates as well as XY coordinates...


_33(Posted 2007) [#4]
Well, I would use the VRAM to do these operations, simply because VRAM is usually about 10X faster than regular RAM.


bytecode77(Posted 2007) [#5]
what? vram? but i don't have a Graphics3D mode on! this is 2d. how would <you> use vram in this example?


big10p(Posted 2007) [#6]
The code doesn't even run here. I get an "array out of bounds" error in debug mode. Also, I advise you to specify the destination buffer with WritePixelFast, otherwise I don't think it'll work on my machine.
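
To illustrate the destination-buffer advice, here is a minimal sketch. It is not taken from the routine under discussion: the windowed graphics mode, the 100x100 block and the orange colour are all made up for the example. WritePixelFast accepts the buffer as an optional fourth parameter and expects that buffer to be locked first.

```blitzbasic
; Sketch only: lock the target buffer once, pass it explicitly to
; WritePixelFast, and unlock it again before flipping.
Graphics 640, 480, 0, 2          ; windowed 2D mode (assumption for the demo)
SetBuffer BackBuffer()

buf = BackBuffer()               ; keep the buffer handle in a plain variable

LockBuffer buf
For y = 0 To 99
	For x = 0 To 99
		WritePixelFast x, y, $FFFF8000, buf   ; opaque orange, explicit destination
	Next
Next
UnlockBuffer buf

Flip
WaitKey
End
```

Holding the handle in buf also avoids calling BackBuffer() once per pixel inside the loop.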


_33(Posted 2007) [#7]
Instead of using a Dim, work with a bank. Try to avoid code that looks like "reference(reference(reference))" in big loops. If possible, store the result in a simple integer variable once, for example the ImageBuffer(img) handle, and use that variable inside the loop. That alone should yield some speed boost.

This piece of code will take a nice chunk of CPU:
For x = p1x To p2x
	u = vgliIntp(x, p1x, p2x, p1u#, p2u#)
	v = vgliIntp(x, p1x, p2x, p1v#, p2v#)
	WritePixelFast x, y, Tex(u, v)
Next

If you use a bank for your u v map, it will be faster. Use poke to fill it once at the beginning, and peek to read the texel at u, v. Access the position of the u v data using a displacement value, calculated with (v * 16 + u) * 4. Multiply by 4 because each entry occupies 4 bytes.

disp% = (v * 16 + u) * 4
texture_RGBA% = PeekInt (uv_table%, disp%)


or, in your case, a more fancy code:
disp% = (vgliIntp(x, p1x, p2x, p1v#, p2v#) * 16 + vgliIntp(x, p1x, p2x, p1u#, p2u#) )  * 4
texture_RGBA% = PeekInt (uv_table%, disp%)


Note that multiplying by 16 is what jumps to the next line of your texture. If the texture is 32 pixels wide, then this value should be 32, or a variable containing the texture width.

Try to have a version of vgliIntp that returns an Int value. Avoid divisions, as they are slower than multiplications. To gain speed, it is preferable to multiply by 0.1 rather than divide by 10.
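
As a rough sketch of the bank layout described above: the names TEX_W, TEX_H and texBank are hypothetical, and the checkerboard texture is generated only so that the example is self-contained. It is not the texture from the original post.

```blitzbasic
; Sketch only: store a small ARGB texture in a bank and read one texel back.
Const TEX_W = 16                 ; texture width, used as the row stride
Const TEX_H = 16

texBank = CreateBank(TEX_W * TEX_H * 4)      ; 4 bytes per texel

; Fill the bank once, up front, with a checkerboard
For v = 0 To TEX_H - 1
	For u = 0 To TEX_W - 1
		If ((u Xor v) And 1) Then argb = $FFFFFFFF Else argb = $FF000000
		PokeInt texBank, (v * TEX_W + u) * 4, argb
	Next
Next

; Read the texel at u = 3, v = 5 using the same displacement formula
u = 3 : v = 5
disp = (v * TEX_W + u) * 4
Print Hex$(PeekInt(texBank, disp))

FreeBank texBank
WaitKey
End
```

Using TEX_W instead of a literal 16 keeps the displacement correct if the texture size changes.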

*** NOTE: And, while I'm thinking about your case, why not just peek at the imagebuffer?

Cheers.


bytecode77(Posted 2007) [#8]
hi!

thanks for the fast reply!
i replaced it with banks, but it isn't any faster. it is as slow as before.
:(


_33(Posted 2007) [#9]
But usually these types of functions are best done in assembly language.


ShadowTurtle(Posted 2007) [#10]
I think you will have to write a DLL for this, Devils Child. *kotz*
I learned this from the last slowdown in my own project :/


bytecode77(Posted 2007) [#11]


thank you for your help, i used shr and shl instead of floats and inlined the speed-critical functions.

thanks for the help :)
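
The updated code is not preserved in this thread, so the following is only a guess at what "shr and shl instead of floats" can look like: a 16.16 fixed-point sketch for a single scanline. Every name and value here (TEX_W, texBank, fu, fv, the endpoints, the gradient texture) is an assumption made for illustration.

```blitzbasic
; Sketch only: step U and V along one scanline in 16.16 fixed point.
Const TEX_W = 16
Const TEX_H = 16

Graphics 640, 480, 0, 2
SetBuffer BackBuffer()

; Made-up gradient texture stored in a bank, 4 bytes per texel
texBank = CreateBank(TEX_W * TEX_H * 4)
For v = 0 To TEX_H - 1
	For u = 0 To TEX_W - 1
		PokeInt texBank, (v * TEX_W + u) * 4, $FF000000 Or ((u * 16) Shl 16) Or ((v * 16) Shl 8)
	Next
Next

; Made-up scanline endpoints and texture coordinates
p1x = 100 : p2x = 400 : y = 200
p1u = 0 : p2u = 15
p1v = 0 : p2v = 15

span = p2x - p1x
If span > 0
	iu = ((p2u - p1u) Shl 16) / span     ; per-pixel U step, one division per scanline
	iv = ((p2v - p1v) Shl 16) / span     ; per-pixel V step
	fu = p1u Shl 16                      ; current U in 16.16 fixed point
	fv = p1v Shl 16                      ; current V in 16.16 fixed point

	buf = BackBuffer()
	LockBuffer buf
	For x = p1x To p2x
		disp = ((fv Shr 16) * TEX_W + (fu Shr 16)) * 4
		WritePixelFast x, y, PeekInt(texBank, disp), buf
		fu = fu + iu                     ; additions only inside the loop
		fv = fv + iv
	Next
	UnlockBuffer buf
EndIf

Flip
WaitKey
End
```

The inner loop contains no function calls, floats or divisions. Shr is a logical shift, so if a fixed-point coordinate could ever go negative, Sar would be the safer choice.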


_33(Posted 2007) [#12]
So, is it faster? ;P

EDIT: tested on my gear. You went from 20-30 FPS to 85-130 FPS when fully extended. Good!

It's my belief that good, tight, efficient code in any language can beat poor, inefficient code in assembly / machine language.

Oh, and you could also do this, but it becomes quite unreadable, for a gain of about 8 fps (on my system):
	WritePixelFast x, y, PeekFloat(Tex, (((p1v + (x - p1x) * iv) Shr 16) * 16 + (p1u + (x - p1x) * iu) Shr 16) * 4)


Cheers.


DJWoodgate(Posted 2007) [#13]
I was playing around with the array-based approach and trying to minimise work in the WritePixel loop. The indexing is not really working properly yet, so I have increased the array size to stop it throwing hissy fits in debug mode. It gives a few extra fps anyway.




bytecode77(Posted 2007) [#14]
hey nice speedup, thanks a lot :)


_33(Posted 2007) [#15]
Yep, this bumps the speed up by a solid 20% over the previous version (using the same coordinates).