assembly coding!?
| ||
yes yes.. i know this isn't strictly speaking a bmax question, but i'm using assembly language with bmax.. so i guess it's kinda ok to post here ;)

first up.. i'm an 8086 newbie.. so the following code is probably a joke.. but i'd like folks to look at it + make suggestions and perhaps show me alternatives...

both of these functions set the elements of a 16-element float array to those of a 4x4 unit (identity) matrix.. the first uses plain mov instructions with immediate values - the 2nd uses fpu instructions.. which one is faster? are there better ways to do this?

_mat_unit4x4:
    push ebp
    mov ebp,esp
    mov eax,[esp+4+4]          ; first argument: pointer to the float array
    mov dword [eax],1.0        ; row 0
    mov dword [eax+4],0.0
    mov dword [eax+8],0.0
    mov dword [eax+12],0.0
    mov dword [eax+16],0.0     ; row 1
    mov dword [eax+20],1.0
    mov dword [eax+24],0.0
    mov dword [eax+28],0.0
    mov dword [eax+32],0.0     ; row 2
    mov dword [eax+36],0.0
    mov dword [eax+40],1.0
    mov dword [eax+44],0.0
    mov dword [eax+48],0.0     ; row 3
    mov dword [eax+52],0.0
    mov dword [eax+56],0.0
    mov dword [eax+60],1.0
    mov esp,ebp
    pop ebp
    ret

_mat2_unit4x4:
    ;;;FPU version
    push ebp
    mov ebp,esp
    mov eax,[esp+4+4]          ; first argument: pointer to the float array
    fldz                       ; st0 = 0.0
    fst qword [eax+4]          ; each qword store writes 8 zero bytes = two float zeros
    fst qword [eax+12]
    fst qword [eax+24]
    fst qword [eax+32]
    fst qword [eax+44]
    fstp qword [eax+52]        ; last zero store, pops st0
    fld1                       ; st0 = 1.0
    fst dword [eax]            ; the four diagonal elements
    fst dword [eax+20]
    fst dword [eax+40]
    fstp dword [eax+60]        ; last one, pops st0
    mov esp,ebp
    pop ebp
    ret

thanks ;)
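For reference, here is a rough C sketch of what the two routines above write. It assumes IEEE 754 floats and doubles (as on x86), and the function names are made up for illustration, not taken from the thread. It mainly shows why the FPU version gets away with six qword stores: the 8-byte double 0.0 is all zero bits, so each store clears two adjacent float elements.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counterpart of _mat_unit4x4: sixteen separate float stores. */
void unit4x4_stores(float *m)
{
    for (int i = 0; i < 16; i++)
        m[i] = (i % 5 == 0) ? 1.0f : 0.0f;   /* diagonal sits at 0, 5, 10, 15 */
}

/* Hypothetical counterpart of _mat2_unit4x4: zeros written as 8-byte doubles. */
void unit4x4_wide_zero(float *m)
{
    /* Byte offsets of the six qword stores in the assembly version. */
    static const int zero_offsets[6] = { 4, 12, 24, 32, 44, 52 };
    const double dzero = 0.0;                /* assumed to be 8 zero bytes */
    unsigned char *bytes = (unsigned char *)m;

    assert(sizeof(double) == 2 * sizeof(float));
    for (int i = 0; i < 6; i++)
        memcpy(bytes + zero_offsets[i], &dzero, sizeof dzero);  /* two floats each */

    m[0] = m[5] = m[10] = m[15] = 1.0f;      /* the four dword stores of 1.0 */
}

/* Returns 1 if both variants leave byte-identical matrices behind. */
int identity_variants_match(void)
{
    float a[16], b[16];
    memset(a, 0xFF, sizeof a);               /* poison so missed bytes show up */
    memset(b, 0xFF, sizeof b);
    unit4x4_stores(a);
    unit4x4_wide_zero(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling identity_variants_match() is a quick way to convince yourself that the six wide stores plus four diagonal stores cover all 64 bytes. Note that most of the qword stores are not 8-byte aligned, which may or may not matter for speed depending on the CPU.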
| ||
I tested with this code:

superstrict
import "Matrix.s"
extern
    function mat_unit4x4(m:Float Ptr)
    function mat2_unit4x4(m:Float Ptr)
end extern

local marray : float[16]

local starttime:Int=millisecs()
for local i:int=0 to 20000000
    mat2_unit4x4(varptr(marray[0]))
next
Print "mat2_unit4x4 "+(millisecs()-starttime)

starttime=millisecs()
for local i:int=0 to 20000000
    mat_unit4x4(varptr(marray[0]))
next
Print "mat_unit4x4 "+ (millisecs()-starttime)
end

It seems the FPU version is slightly faster on my 3 GHz P4: not by a significant amount, but faster. I thought that using REP STOSD to set everything to zero and then explicitly setting the 1.0 elements might be faster, but it wasn't. Maybe an ASM wizard can help you optimize more. Maybe there are some more exotic x86 opcodes that can help out. Possibly MMX?

Doug Stastny
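The "zero everything, then set the 1.0 elements" idea can be sketched in C roughly like this. memset stands in for REP STOSD here, which is an assumption: a compiler may or may not emit that instruction. The function names and the timing helper are hypothetical, and the helper measures CPU time with clock() rather than wall time like millisecs().

```c
#include <string.h>
#include <time.h>

/* Hypothetical name: clear all 16 floats, then write the four diagonal ones. */
void unit4x4_clear_then_set(float *m)
{
    memset(m, 0, 16 * sizeof(float));    /* all-zero bytes read back as 0.0f (IEEE 754) */
    m[0] = m[5] = m[10] = m[15] = 1.0f;
}

/* Calls fn `count` times on a scratch matrix; returns elapsed CPU time in ms. */
double time_calls_ms(void (*fn)(float *), long count)
{
    float m[16];
    clock_t start = clock();

    for (long i = 0; i < count; i++)
        fn(m);

    volatile float sink = m[0];          /* keep the result observably used */
    (void)sink;
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Something like time_calls_ms(unit4x4_clear_then_set, 20000000) gives a number to set beside the two assembly timings, though call overhead probably dominates for a function this small, so small differences should be read with care.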
| ||
yeh same here.. marginal speedup with the fpu version.. not sure about mmx instructions, haven't looked at anything other than the standard fpu instructions yet.. asm wizard? thanks for trying it out..

pointer - in your code you don't need to use all that varptr stuff... simply call mat2_unit4x4(arrayname).. maybe you were just trying to make your code clearer.. dunno ;) or perhaps you need that with strict code..

EDIT: eh?! - that's odd, the fpu version takes longer if i use superstrict in my code.. lol, must be doing something odd..
| ||
Assembly wizard :) I did see a library written in C with inlined ASM that uses MMX instructions. I didn't understand it at all :) Here is the optimized lib: http://www.cs.technion.ac.il/~zdevir/main1.html If you have a Gamasutra membership you can read the article.

Doug Stastny
| ||
thanks :] - interesting doc + code |