Less multiplications per quad possible?
| ||
I've (for fun) converted the Obj-C code from Monkey to Swift, and while doing that I was able (not because of Swift) to reduce the number of float multiplications for the tformed==true case from 18 to 12. Sadly, that code snippet is in Swift, isn't very readable, omits the if tformed check (it always runs the tformed block, which maybe makes it more compact), moves a few more premultiplied values into the Surface class, and might even break the transform if I'm not good at testing. But I believe I did not actually leave anything out. Anyway, it looks like this, and I thought maybe the mojo routines still have optimization potential:

(was useless Swift code)

PS: I realize I might have thrown away the uscale and vscale code in some way, dunno.
| ||
ok, turns out all that's possible is changing the tformed block to do 2 more stores and 4 fewer multiplications :I. and maybe it won't even really matter on current hardware. everything else was due to me using NPOT textures and whatever.

    if( tformed ){
        float xw = x1;
        float yh = y3;
        float xix = x * ix;
        float xwix = xw * ix;
        x0 = xix + y * jx + tx;
        x1 = xwix + y * jx + tx;
        x2 = xwix + yh * jx + tx;
        x3 = xix + yh * jx + tx;
        float xiy = x * iy;
        float xwiy = xw * iy;
        y0 = xiy + y * jy + ty;
        y1 = xwiy + y * jy + ty;
        y2 = xwiy + yh * jy + ty;
        y3 = xiy + yh * jy + ty;
    }
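For reference, here is a minimal C++ sketch of the same idea taken one step further. The four corners only ever use two x values and two y values, so each coordinate-times-matrix product can be computed once and shared, which leaves 8 multiplications instead of 16. The `Matrix` and `Quad` structs and all function names are made up for illustration and are not mojo's API, and an optimizing compiler may well do this sharing on its own.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2x3 affine matrix, using the field names from the thread.
struct Matrix { float ix, iy, jx, jy, tx, ty; };

// Hypothetical container for the four transformed corners.
struct Quad { float x[4], y[4]; };

// Baseline: transform every corner independently (16 multiplications).
inline Quad transformNaive(const Matrix& m, float x, float y, float w, float h) {
    const float px[4] = { x, x + w, x + w, x };
    const float py[4] = { y, y, y + h, y + h };
    Quad q;
    for (int i = 0; i < 4; ++i) {
        q.x[i] = px[i] * m.ix + py[i] * m.jx + m.tx;
        q.y[i] = px[i] * m.iy + py[i] * m.jy + m.ty;
    }
    return q;
}

// Shared products: each of the 8 distinct products is computed exactly once.
inline Quad transformShared(const Matrix& m, float x, float y, float w, float h) {
    const float xw = x + w, yh = y + h;
    const float xix = x * m.ix, xwix = xw * m.ix;  // x contributions to output x
    const float yjx = y * m.jx, yhjx = yh * m.jx;  // y contributions to output x
    const float xiy = x * m.iy, xwiy = xw * m.iy;  // x contributions to output y
    const float yjy = y * m.jy, yhjy = yh * m.jy;  // y contributions to output y
    Quad q;
    q.x[0] = xix  + yjx  + m.tx;  q.y[0] = xiy  + yjy  + m.ty;
    q.x[1] = xwix + yjx  + m.tx;  q.y[1] = xwiy + yjy  + m.ty;
    q.x[2] = xwix + yhjx + m.tx;  q.y[2] = xwiy + yhjy + m.ty;
    q.x[3] = xix  + yhjx + m.tx;  q.y[3] = xiy  + yhjy + m.ty;
    return q;
}

// True when all eight coordinates agree within eps.
inline bool sameQuad(const Quad& a, const Quad& b, float eps) {
    for (int i = 0; i < 4; ++i) {
        if (std::fabs(a.x[i] - b.x[i]) > eps) return false;
        if (std::fabs(a.y[i] - b.y[i]) > eps) return false;
    }
    return true;
}
```

Comparing both versions with `sameQuad` on a few rotated and scaled matrices is a cheap way to catch a swapped term before profiling anything.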
| ||
some faster functions on the monkey side, since trans apparently doesn't optimize away the * 0 and * 1, and the compilers probably won't either (untested):

    Function Translate( x#,y# )
        context.tx += x * context.ix + y * context.jx
        context.ty += x * context.iy + y * context.jy
    End

    Function Scale( x#,y# )
        context.ix *= x
        context.iy *= x
        context.jx *= y
        context.jy *= y
    End

    Function Rotate( angle# )
        ' locals are declared Float with #; an untyped Local would default to Int
        Local c# = Cos(angle)
        Local s# = Sin(angle)
        Local ix# = c * context.ix + -s * context.jx
        Local iy# = c * context.iy + -s * context.jy
        Local jx# = s * context.ix + c * context.jx
        Local jy# = s * context.iy + c * context.jy
        context.ix = ix
        context.iy = iy
        context.jx = jx
        context.jy = jy
    End
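A quick way to sanity-check shortcuts like these is to run them side by side with the general matrix composition they replace. The C++ sketch below does that under a few assumptions: `Context`, `transform` and the lowercase function names are hypothetical stand-ins rather than mojo's real code, `transform` is assumed to compose the new matrix onto the current one in the same order as the functions above, and angles are in degrees like Monkey's Cos and Sin.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the drawing context's matrix fields.
struct Context { float ix, iy, jx, jy, tx, ty; };

inline Context identity() {
    Context c = { 1.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f };
    return c;
}

inline float toRadians(float degrees) { return degrees * 3.14159265f / 180.0f; }

// General path: compose an arbitrary matrix onto the context (12 multiplications).
inline void transform(Context& c, float ix, float iy, float jx, float jy, float tx, float ty) {
    const Context r = {
        ix * c.ix + iy * c.jx,         ix * c.iy + iy * c.jy,
        jx * c.ix + jy * c.jx,         jx * c.iy + jy * c.jy,
        tx * c.ix + ty * c.jx + c.tx,  tx * c.iy + ty * c.jy + c.ty
    };
    c = r;
}

// Specialized paths: the * 0 and * 1 terms of the general path are dropped.
inline void translate(Context& c, float x, float y) {
    c.tx += x * c.ix + y * c.jx;
    c.ty += x * c.iy + y * c.jy;
}

inline void scale(Context& c, float x, float y) {
    c.ix *= x;  c.iy *= x;
    c.jx *= y;  c.jy *= y;
}

inline void rotate(Context& c, float degrees) {
    const float co = std::cos(toRadians(degrees));
    const float si = std::sin(toRadians(degrees));
    // Read all four old values before writing any of them back.
    const float ix = co * c.ix - si * c.jx;
    const float iy = co * c.iy - si * c.jy;
    const float jx = si * c.ix + co * c.jx;
    const float jy = si * c.iy + co * c.jy;
    c.ix = ix;  c.iy = iy;  c.jx = jx;  c.jy = jy;
}

// True when all six matrix entries agree within eps.
inline bool closeTo(const Context& a, const Context& b, float eps) {
    return std::fabs(a.ix - b.ix) <= eps && std::fabs(a.iy - b.iy) <= eps
        && std::fabs(a.jx - b.jx) <= eps && std::fabs(a.jy - b.jy) <= eps
        && std::fabs(a.tx - b.tx) <= eps && std::fabs(a.ty - b.ty) <= eps;
}
```

Applying the same sequence of operations to two contexts, one through the specialized functions and one through `transform`, and then comparing them with `closeTo` shows whether a shortcut still produces the same matrix. It says nothing about any other state the real functions may need to update.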
| ||
The number of flops usually isn't the bottleneck; piping the vertices to the GPU is. Mark has been very careful to keep the number of flushes to a minimum. Do you have some stats for us?
| ||
uh, tested/profiled them. the monkey functions don't work for some reason (even with context.tformed = True), and the "faster" if( tformed ) block (2nd post) makes no difference at all. the only thing i gained from this experiment is that i can leave my swift project without any "if transformed" check at all and forget about it.
| ||
Oh well, it's the thought that counts. :)