Multiplication Vrs Power

BlitzMax Forums/BlitzMax Beginners Area/Multiplication Vrs Power

H&K	(Posted 2006) [#1]

Hi Peeps.

There was a quite interesting thread about which was better, Sin/cos directly or pre calculated arrays

I would like to ask a similar question. I dont use ^ very much, but whenever I do, there are atleat three ways.

1) I just use ^
2) I use multiplication and divition to fake ^
3) I pre calculate. (Ie, I could have field of the common powers I want Number.Power3)

I use 1) except when its ^2, then I just Multiply 2).

Does anyone have preference, and or advice?

tonyg

(Posted 2006) [#2]

I never noticed a difference.
Maybe using it 100000's of times per frame but I get the feeling they get compiled to the same code.
(in fact, it looks like they do).

GfK	(Posted 2006) [#3]

I use 1) except when its ^2, then I just Multiply 2).

Are you having unexplained problems? Because that's incorrect.

10 ^ 2 = 100
10 * 2 = 20

H&K	(Posted 2006) [#4]

10^2 or 10*10
the 2) was to say point 2

FlameDuck

(Posted 2006) [#5]

Multiplication is faster for lower orders.

Dreamora

(Posted 2006) [#6]

Lower order means: ^2
Above that GCC is that highly optimized that multiplication will be slower. And that although the used compiler is a quite outdated one.

Defoc8

(Posted 2006) [#7]

if your multiplying an int by 2..
10*2, then you can use shl 1,
and for int division by 2, use shr 1..
(if you dont know how the shift operator works)
(check the docs or ask on the forums..)

for powers..i dont think your really going to save yourself
any time by not using the "^" operator.

var=sqr(x^2+y^2), might better be presented as
var=sqr(x*x+y*y), but the sqr function is costly anyway,
so whos to say its going to make much difference here?

.. i personally dont think its worth worrying about..

Dreamora = "Above that GCC is that highly optimized that multiplication will be slower"
so your saying, that 2*2*2 under gcc is slower than
say, 2^3 under gcc? - heh..i dont believe you ;)

TomToad

(Posted 2006) [#8]

Tried this out

Strict
Local t:Int = 0
Local a:Int = 0
Local b:Int = 4
Local startTime:Int = 0
Local EndTime:Int = 0
Delay 3000
StartTime = MilliSecs()
For t = 0 To 100000
	a = b^3
Next
EndTime = MilliSecs()
Print a + " " + (EndTime - StartTime)

Delay 3000
StartTime = MilliSecs()
For t = 0 To 100000
	a = b*b*b
Next
EndTime = MilliSecs()
Print a + " " + (EndTime - StartTime)

Results were
63 28
64 4
Notice not only was b*b*b 7 times faster, but also gave the correct answer. That's because the ^ opperator works with doubles. So it must first convert the Ints to Doubles which not only takes time, but also creates rounding errors.

Changing a and b to doubles result in
63.999999999999979 24
64.000000000000000 5
Once again, the result of ^ gives rounding errors, and is still slower than b*b*b
Comparing b^10 vs b*b*b*b*b*b*b*b*b*b gives
1048576.0000000000 25
1048576.0000000000 5
Next, adding Local c:Double = 0 to the top (a and b are still declared doubles) and changing the formulas to
a = Sqr(b^2 + c^2) and a = Sqr(b*b+c*c) gives this result:
7.2111025509279782 47
7.2111025509279782 4

So what does this show? The ^ opperator is significantly slower than using * and also intruduces rounding errors even when using Ints.
You might be thinking that 40 ms per 100000 loops isn't large enough to worry about, but if you were checking the distance of each of 1000 objects from each other, that'd be 500500 checks assuming that if you check distance a to b, then you don't check distance b to a. A test shows that 500500 checks with ^ takes 252 ms, way too much time. Using * takes 29 ms. Enough for 30fps, if you break up the checks and do half each frame, you can get 60 fps.

Dreamora

(Posted 2006) [#9]

Thats no really usefull test. You mix int and double and don't even test float which is the normal type for this kind of operations.

Using the same but a and b float, then the result is:

64.0000000 25
64.0000000 0

Accuraccy: No problem
Speed: A significant difference :-)

Btw: If you wanted to test it truely on Float, you would have to use b^(3:float) to declare 3 as float. Otherwise it goes to double as well. But this does not make a real speed difference in this case.

Note: I'm using MingW 5, not the stoneage one proposed by BRL.

H&K	(Posted 2006) [#10]

@ Dream.

Please can you explain

Note: I'm using MingW 5, not the stoneage one proposed by BRL.

Thanks in advance

skidracer

(Posted 2006) [#11]

Yes, seeing as MingW is not even a requirement for BlitzMax programming I'm interested also.

Floyd

(Posted 2006) [#12]

BlitzMax has its own hacked together version of exponentiation because the MingW pow() function was so horrendously slow.

I always hoped this was a temporary measure until the MingW guys fixed pow(). Does anyone know if they did?

TomToad

(Posted 2006) [#13]

@Dreamora: Actually I did try float. It was marginally slower than Doubles. When using the ^ opperator, the arguments are passed to the function double bbFloatPow( double x,double y ) in blitz_cclib.c. There isn't a Float nor an Int version. Anything you use will be converted to double, including Floats.

Now I tried doing b^(3:Float) and b^(3:double) and didn't see any change.

The fact is, ^ is still significantly slower than *.

Here's a real world example.

Strict 
Type TSprite
	Field x
	Field y
End Type

Local Sprites:TSprite[200]

For Local t:Int = 0 To 199
	Sprites[t] = New TSprite
	Sprites[t].x = Rand(0,1023)
	Sprites[t].y = Rand(0,767)
Next

Graphics 1024,768,32
Local Frames:Int = 0
Local FPS:Int = 0
Local Power:Int = True
Local Distance:Float
Local Time:Int = MilliSecs()

While Not KeyHit(KEY_ESCAPE)
	Cls
	If KeyHit(KEY_ENTER) Then Power = Not Power
	For Local t:Int = 0 To 198
		SetColor 255,255,255
		For Local u:Int = t+1 To 199
			If Power
				Distance = Sqr((Sprites[t].x-Sprites[u].x)^2+(Sprites[t].y-Sprites[u].y)^2)
			Else
				Distance = Sqr((Sprites[t].x-Sprites[u].x)*(Sprites[t].x-Sprites[u].x)+..
								(Sprites[t].y-Sprites[u].y)*(Sprites[t].y-Sprites[u].y))
			End If
			If Distance < 50 Then SetColor 255,255,0
		Next
		DrawOval Sprites[t].x-5,Sprites[t].y-5,10,10
	Next
	SetColor 255,255,255
	DrawRect 9,9,100,30
	SetColor 0,0,255
	If Power
		DrawText "Sqr(x^2+y^2)",10,10
	Else
		DrawText "Sqr(x*x+y*y)",10,10
	End If
	
	DrawText FPS,10,20
	Flip 0
	If MilliSecs() - Time >= 1000
		FPS = Frames
		Frames = 0
		Time = MilliSecs()
	End If
	Frames :+ 1
Wend

Press ENTER to toggle from using ^ to using * Notice a huge jump? On my machine it went from 53 fps to 200 fps. Makes a huge difference, and this was just checking the distance between 200 objects.

Dreamora

(Posted 2006) [#14]

The difference with MingW5 is: When I rebuild the modules the take advantage of the current version of the GCC compiler on windows (as it does on Linux and Apple as well so I think its only fair to have the current version on windows).

This one has got some work in the last 2+ years since the 3.1 came out (you see, V3 to V5 now).
Aω you see with the comparision: Using the multiplications takes 0 ms on my CoreDuo 2Ghz, while the ^ uses the strange BM version which stays as slow as with the original MingW build ...

So if mark came around using the current GCC on Windows as well (not only on Linux and Apple), the ^ workaround wouldn't be needed (at least thats what I think as there is a reason we are 2 major versions further)

Beside that, my BM (all modules rebuild) is capable of using more than 1 core. Don't ask me why but thats what happens when I let it work with high load.

skidracer

(Posted 2006) [#15]

We still are forcing gcc3.3 on Linux due to 4.0's refusal to allow statically linking to c++ standard lib meaning a big drop in compatability for blitzmax produced binaries.

Defoc8

(Posted 2006) [#16]

"Multiplication is faster for lower orders. "
- coudnt the compiler be made to expand low order values?
- 2^3 to c/c++ = 2*2*2
(oops!, didnt think about problems with unknown)
(quantities..ignore me ;)) x^y

i dont know at what point using loops or special intructions
in assembly would benefit such a method? - i mean at
what point would expansion, register1*register2,
register1*register2...
start causing problems?.. i dont know much about modern
processors, and no doubt this is complicated by ppc chips
perhaps not even offering a "mult" opcode?

anyway point taken.. for intances were "power" would
be used to process large quantities of data, and were
mults can easily be used in place of this operator..choose
mult.. - i honestly didnt think it would make a big
difference..heh..wrong as usual ;)

Dreamora

(Posted 2006) [#17]

Hmm
With "current" on Linux I meant more like 3.4.6 and the like that is normally on the Distri Disks.
I know of GCC 4 but as heard well of some problems with it.

On windows, MingW 5 (-> GCC 3.4.2) is working great unless you own MaxGUI. (I got around the build errors but get the calling errors when trying to execute an app so I started to resync the latest win32gui.mod after I rebuild everything which at least seems to work)
I know that this isn't supported officially.
But as the most recent still is 3.4.X based, I didn't and still don't see a problem in using it.
But I might be wrong or have missed something ...

If so, I'm happy to know what the problem is (beside the win32gui intersection between max and mingw 5 GCC install) *I'm always happy to get my errors shown to make it better in future*