assembly coding!?


Defoc8(Posted 2006) [#1]
yes yes..i know this isn't strictly speaking a bmax question,
but i'm using assembly language with bmax..so i guess
it's kinda ok to post here ;)

first up..i'm an x86 newbie..so the following code is probably
a joke..but i'd like folks to look at it + make suggestions
and perhaps show me alternatives...

both of these functions set the elements of a 16-element
float array to those of a 4x4 unit (identity) matrix..the first
uses plain mov instructions with immediate values - the 2nd
uses fpu instructions..which one is faster? are there
better ways to do this?

_mat_unit4x4:
push ebp
mov ebp,esp
mov eax,[esp+4+4] ; eax = first argument, the pointer to the 16 floats (same as [ebp+8])
mov dword [eax],1.0
mov dword [eax+4],0.0
mov dword [eax+8],0.0
mov dword [eax+12],0.0
mov dword [eax+16],0.0
mov dword [eax+20],1.0
mov dword [eax+24],0.0
mov dword [eax+28],0.0
mov dword [eax+32],0.0
mov dword [eax+36],0.0
mov dword [eax+40],1.0
mov dword [eax+44],0.0
mov dword [eax+48],0.0
mov dword [eax+52],0.0
mov dword [eax+56],0.0
mov dword [eax+60],1.0
mov esp,ebp
pop ebp
ret


_mat2_unit4x4: ;;;FPU version
push ebp
mov ebp,esp
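mov eax,[esp+4+4] ; eax = first argument, the pointer to the 16 floats (same as [ebp+8])
fldz ; st0 = 0.0
fst qword [eax+4] ; a 64-bit 0.0 is all zero bits, so each qword store clears two adjacent floats
fst qword [eax+12]
fst qword [eax+24]
fst qword [eax+32]
fst qword [eax+44]
fstp qword [eax+52] ; last zero store, pops st0
fld1 ; st0 = 1.0
fst dword [eax] ; the four diagonal elements
fst dword [eax+20]
fst dword [eax+40]
fstp dword [eax+60] ; last store, pops st0 so the fpu stack is left empty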
mov esp,ebp
pop ebp
ret


thanks ;)


DStastny(Posted 2006) [#2]
I tested with this code

superstrict
import "Matrix.s"
extern 
	function mat_unit4x4(m:Float Ptr)
	function mat2_unit4x4(m:Float Ptr)
end extern

local marray : float[16]

local starttime:Int=millisecs()
for local i:int=0 to 20000000
	mat2_unit4x4(varptr(marray[0]))
next
Print "mat2_unit4x4 "+(millisecs()-starttime)

starttime=millisecs()
for local i:int=0 to 20000000
	mat_unit4x4(varptr(marray[0]))
next
Print "mat_unit4x4 "+ (millisecs()-starttime)
end


It seems the FPU version is slightly faster on my P4 3GHz. Not significantly, but faster.

I thought that using REP STOSD to set everything to zero and then explicitly setting the 1.0 elements might be faster, but it wasn't. Maybe an ASM wizard can help you optimize it further. Maybe there are some more exotic x86 opcodes that can help out. Possibly MMX?
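
For reference, a rough sketch of the zero-fill-then-set-the-ones idea, in the same syntax as the routines above. This is not necessarily the exact code that was timed. The label _mat3_unit4x4 is made up for illustration, and the sketch assumes the same calling convention as the original two routines: the pointer to the 16 floats is the only stack argument, and EDI has to be preserved for the caller.

```asm
_mat3_unit4x4: ; hypothetical name, not from the original routines
push edi ; edi must be preserved for the caller
mov edi,[esp+4+4] ; edi = pointer to the 16 floats (past saved edi + return address)
mov edx,edi ; keep a copy, rep stosd advances edi
xor eax,eax ; 0 is the bit pattern of 0.0
mov ecx,16 ; 16 dwords = 64 bytes
cld ; make sure the stores go forwards
rep stosd ; zero the whole array
mov eax,0x3F800000 ; bit pattern of single-precision 1.0
mov [edx],eax ; the four diagonal elements
mov [edx+20],eax
mov [edx+40],eax
mov [edx+60],eax
pop edi
ret
```

It does 20 dword writes instead of 16 (the diagonal gets written twice) plus the REP setup, which may be part of why it did not come out ahead.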

Doug Stastny


Defoc8(Posted 2006) [#3]
yeh same here..marginal speedup with fpu version..
not sure about mmx instructions, haven't looked at
anything other than the standard fpu instructions yet..

asm wizard?

thanks for trying it out..

a pointer - in your code you don't need all that varptr
stuff...simply call mat2_unit4x4(arrayname)..maybe you
were just trying to make your code clearer..dunno ;) or
perhaps you need that with strict code..


EDIT: eh?! - that's odd, the fpu version takes longer if
i use superstrict in my code..lol, i must be doing
something odd..


DStastny(Posted 2006) [#4]
Assembly wizard :)

I did see a library written in C with inline ASM that uses MMX instructions. I didn't understand it at all :)

Here is the optimized lib.

http://www.cs.technion.ac.il/~zdevir/main1.html

If you have membership to Gamasutra you can read the article.


Doug Stastny


Defoc8(Posted 2006) [#5]
thanks :]
- interesting doc + code