Less multiplications per quad possible?

nori(Posted 2014) [#1]
For fun, I converted Monkey's Obj-C code to Swift, and while doing that I was able (not because of Swift) to reduce the number of float multiplications for the tformed==true case from 18 to 12.

The code snippet is sadly in Swift, isn't very readable, omits the if tformed check (it always runs the tformed block, which may be what makes it more compact), moves a few more premultiplied values into the Surface class, and might even break the transform if my testing wasn't good enough. But I believe I didn't actually leave anything out.

Anyway, it looks like this, and I thought maybe the mojo routines still have optimization potential:

was useless swift code


PS: I realize I might have thrown away uscale and vscale code in some way, dunno.


nori(Posted 2014) [#2]
ok, turns out all that's possible is changing the tformed block to do 2 more stores and 4 fewer multiplications :I. and it probably won't even matter with current hardware. everything else was due to me using npot textures and whatever

	if( tformed ){
		float xw = x1;
		float yh = y3;

		float xix  = x  * ix;
		float xwix = xw * ix;

		x0 = xix  + y  * jx + tx;
		x1 = xwix + y  * jx + tx;
		x2 = xwix + yh * jx + tx;
		x3 = xix  + yh * jx + tx;

		float xiy  = x  * iy;
		float xwiy = xw * iy;

		y0 = xiy  + y  * jy + ty;
		y1 = xwiy + y  * jy + ty;
		y2 = xwiy + yh * jy + ty;
		y3 = xiy  + yh * jy + ty;
	}
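
For comparison, here is a minimal C++ sketch of the same idea taken one step further: every distinct product (x*ix, xw*ix, y*jx, yh*jx and the iy/jy counterparts) is computed once, which gives 8 multiplications instead of the 16 in a corner-by-corner version. Matrix2D, Quad, transformNaive, transformShared and maxAbsDiff are hypothetical names for illustration, not part of mojo, and an optimizing compiler may already share these products on its own.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the matrix and corner values used in the tformed block.
struct Matrix2D { float ix, iy, jx, jy, tx, ty; };
struct Quad { float x0, y0, x1, y1, x2, y2, x3, y3; };

// Corner-by-corner form: each corner is transformed independently (16 multiplications).
Quad transformNaive(const Matrix2D& m, float x, float y, float xw, float yh) {
    Quad q;
    q.x0 = x  * m.ix + y  * m.jx + m.tx;  q.y0 = x  * m.iy + y  * m.jy + m.ty;
    q.x1 = xw * m.ix + y  * m.jx + m.tx;  q.y1 = xw * m.iy + y  * m.jy + m.ty;
    q.x2 = xw * m.ix + yh * m.jx + m.tx;  q.y2 = xw * m.iy + yh * m.jy + m.ty;
    q.x3 = x  * m.ix + yh * m.jx + m.tx;  q.y3 = x  * m.iy + yh * m.jy + m.ty;
    return q;
}

// Shared-product form: each distinct product is computed once (8 multiplications).
Quad transformShared(const Matrix2D& m, float x, float y, float xw, float yh) {
    const float xix = x * m.ix, xwix = xw * m.ix;
    const float xiy = x * m.iy, xwiy = xw * m.iy;
    const float yjx = y * m.jx, yhjx = yh * m.jx;
    const float yjy = y * m.jy, yhjy = yh * m.jy;
    Quad q;
    q.x0 = xix  + yjx  + m.tx;  q.y0 = xiy  + yjy  + m.ty;
    q.x1 = xwix + yjx  + m.tx;  q.y1 = xwiy + yjy  + m.ty;
    q.x2 = xwix + yhjx + m.tx;  q.y2 = xwiy + yhjy + m.ty;
    q.x3 = xix  + yhjx + m.tx;  q.y3 = xiy  + yhjy + m.ty;
    return q;
}

// Largest absolute difference between corresponding coordinates of two quads.
float maxAbsDiff(const Quad& a, const Quad& b) {
    const float d[8] = {a.x0 - b.x0, a.y0 - b.y0, a.x1 - b.x1, a.y1 - b.y1,
                        a.x2 - b.x2, a.y2 - b.y2, a.x3 - b.x3, a.y3 - b.y3};
    float worst = 0.0f;
    for (float v : d) worst = std::fmax(worst, std::fabs(v));
    return worst;
}

// Usage sketch: both forms agree on inputs that are exactly representable as floats.
inline void checkFormsAgree() {
    const Matrix2D m{2.0f, 0.5f, -0.5f, 2.0f, 10.0f, 20.0f};
    assert(maxAbsDiff(transformNaive(m, 1, 2, 5, 6), transformShared(m, 1, 2, 5, 6)) == 0.0f);
}
```

Whether the saved multiplications are measurable depends on the target; the sketch only shows that the results match.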



nori(Posted 2014) [#3]
some faster functions on the monkey side, since trans apparently doesn't optimize away the * 0 and * 1 multiplications, and the target compilers probably won't either

(untested)

Function Translate( x#,y# )
	context.tx += x * context.ix + y * context.jx
	context.ty += x * context.iy + y * context.jy
End

Function Scale( x#,y# )
	context.ix *= x
	context.iy *= x
	context.jx *= y
	context.jy *= y
End

Function Rotate( angle# )
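	' locals are typed as Float (#) so the values are not treated as Int
	Local c# = Cos(angle)
	Local s# = Sin(angle)

	' multiply the rotation into the current matrix, skipping the zero translation terms
	Local ix# = c * context.ix + -s * context.jx
	Local iy# = c * context.iy + -s * context.jy
	Local jx# = s * context.ix +  c * context.jx
	Local jy# = s * context.iy +  c * context.jy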

	context.ix = ix
	context.iy = iy
	context.jx = jx
	context.jy = jy
End
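
A minimal C++ sketch of why these shortcuts should give the same matrix as a general multiply: each one is the general case with the * 0 and * 1 terms dropped. The general transform() below is an assumption inferred from the pattern of the three functions above, not copied from mojo, and all names (Matrix2D, transform, translateFast, scaleFast, rotateFast, maxAbsDiff) are hypothetical. rotateFast takes the cosine and sine directly so the sketch does not depend on degrees versus radians.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the context matrix fields used above.
struct Matrix2D { float ix, iy, jx, jy, tx, ty; };

// General multiply of a new matrix (ix..ty) into the current matrix c (assumed form).
Matrix2D transform(const Matrix2D& c, float ix, float iy, float jx, float jy, float tx, float ty) {
    Matrix2D r;
    r.ix = ix * c.ix + iy * c.jx;
    r.iy = ix * c.iy + iy * c.jy;
    r.jx = jx * c.ix + jy * c.jx;
    r.jy = jx * c.iy + jy * c.jy;
    r.tx = tx * c.ix + ty * c.jx + c.tx;
    r.ty = tx * c.iy + ty * c.jy + c.ty;
    return r;
}

// Translation only touches tx/ty: transform(c, 1, 0, 0, 1, x, y).
Matrix2D translateFast(Matrix2D c, float x, float y) {
    const float dx = x * c.ix + y * c.jx;
    const float dy = x * c.iy + y * c.jy;
    c.tx += dx;
    c.ty += dy;
    return c;
}

// Scaling only rescales the basis vectors: transform(c, x, 0, 0, y, 0, 0).
Matrix2D scaleFast(Matrix2D c, float x, float y) {
    c.ix *= x; c.iy *= x;
    c.jx *= y; c.jy *= y;
    return c;
}

// Rotation mixes the basis vectors: transform(c, cosA, -sinA, sinA, cosA, 0, 0).
Matrix2D rotateFast(const Matrix2D& c, float cosA, float sinA) {
    Matrix2D r = c;
    r.ix = cosA * c.ix - sinA * c.jx;
    r.iy = cosA * c.iy - sinA * c.jy;
    r.jx = sinA * c.ix + cosA * c.jx;
    r.jy = sinA * c.iy + cosA * c.jy;
    return r;
}

// Largest absolute difference between corresponding matrix entries.
float maxAbsDiff(const Matrix2D& a, const Matrix2D& b) {
    const float d[6] = {a.ix - b.ix, a.iy - b.iy, a.jx - b.jx,
                        a.jy - b.jy, a.tx - b.tx, a.ty - b.ty};
    float worst = 0.0f;
    for (float v : d) worst = std::fmax(worst, std::fabs(v));
    return worst;
}

// Usage sketch: the shortcuts match the general multiply on exactly representable inputs.
inline void checkShortcutsMatchGeneral() {
    const Matrix2D c{2.0f, 0.5f, -0.5f, 2.0f, 10.0f, 20.0f};
    assert(maxAbsDiff(translateFast(c, 3, 4), transform(c, 1, 0, 0, 1, 3, 4)) == 0.0f);
    assert(maxAbsDiff(scaleFast(c, 2, 4), transform(c, 2, 0, 0, 4, 0, 0)) == 0.0f);
    assert(maxAbsDiff(rotateFast(c, 0, 1), transform(c, 0, -1, 1, 0, 0, 0)) == 0.0f);
}
```

The sketch only covers the matrix arithmetic; any extra bookkeeping the real context does when its matrix changes is outside its scope.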



Samah(Posted 2014) [#4]
The number of flops usually isn't the bottleneck, it's piping the vertices to the GPU. Mark has been very careful to keep the number of flushes to a minimum.
Do you have some stats for us?


nori(Posted 2014) [#5]
uh, tested/profiled them. the monkey functions don't work for some reason (even with context.tformed = True), and the "faster" if(tformed)... block (2nd post) makes no difference at all

the only thing i gained from this experiment is that i can leave my swift project without any "if transformed" check at all and forget about it


Samah(Posted 2014) [#6]
Oh well, it's the thought that counts. :)