Sqr is faster than /

Blitz3D Forums/Blitz3D Programming/Sqr is faster than /

Subirenihil(Posted 2007) [#1]
Did you know that
a#=Sqr#(x#)
is faster than
a#=x#/y#

And that
Atan(x#)
is faster than
Atan(x%)


True story!

Test Results:
Calculations per second
	Integers			Floats
Sin	 14903100			 11074200
Cos	 14992500			 11086500
Tan	 11350700			  9407340
Asin	 unable to test			  5002500
Acos	 unable to test			  5005000
Atan	  9532890			 11904800
Atan2	 11467900			  9363300
Sqr	 85470100			 54945100
 +	400000000			 64516100
 -	400000000			 64516100
 *	454545000			 64516100
 /	 82644600			 53475900
Mod	 65789500			 10905100

Integer speed sorted fastest to slowest
 *	454545000
 +	400000000
 -	400000000
Sqr	 85470100
 /	 82644600
Mod	 65789500
Cos	 14992500
Sin	 14903100
Atan2	 11467900
Tan	 11350700
Atan	  9532890
Asin	 unable to test
Acos	 unable to test

Float speed sorted fastest to slowest
 +	 64516100
 -	 64516100
 *	 64516100
Sqr	 54945100
 /	 53475900
Atan	 11904800
Cos	 11086500
Sin	 11074200
Mod	 10905100
Tan	  9407340
Atan2	  9363300
Acos	  5005000
Asin	  5002500



Floyd(Posted 2007) [#2]
All the math functions like ATan() take floating point arguments.

So ATan( 5 ) is a two step procedure: convert integer 5 to float 5.0 and then compute ATan( 5.0 ).
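
As a minimal sketch of that two-step procedure (the variable names n and f are made up for illustration), the conversion can be written out explicitly. Both calls below end up evaluating ATan on the float value 5.0; the first one just hides the conversion inside the call:

```blitzbasic
; Illustration only: n and f are hypothetical names, not from the original post.
n% = 5
f# = n%              ; explicit integer -> float conversion, done once up front
Print ATan( n% )     ; the integer is converted to 5.0 first, then ATan runs
Print ATan( f# )     ; already a float, so no conversion is needed
WaitKey
```

In a timing loop, the first form pays for the conversion on every iteration, which is one plausible reason the integer and float columns differ.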


TomToad(Posted 2007) [#3]
And just how did you test these functions? I just ran my own test and found that I have an average of 1254 ms per 10000000 iterations for Sqr(x#) and 1182 ms per 10000000 iterations for x#/y#, showing that x#/y# is slightly faster than Sqr(x#).
Here's my test program:
x# = 1.0 ;assign all variables first so that there is no allocation overhead
y# = 1.0
z# = 1.0
i = 0
time = 0
l =0
t = 0
Delay(1000) ;let the program settle and finish its overhead
For l = 1 To 10 ;repeat the test 10 times
	Time = MilliSecs()
	For i = 1 To 10000000 ;exactly 10000000 iterations
		z = Sqr(x) ;test function here
	Next
	Time = MilliSecs() - Time
	t = t + time
	Print time
Next
Print "Average = "+t/10



big10p(Posted 2007) [#4]
Why are you comparing sqr with / ?


Tom(Posted 2007) [#5]
I'm with big10p on this :)


Beaker(Posted 2007) [#6]
I'm with Tom on this. Life is too short.


Nexus6(Posted 2007) [#7]
How can you compare the two anyway? It's like saying my car is faster than your house. If you had said
a=100
b=10
Repeat
	a=a-b
	d=d+1
Until a<b
Print "100/10="+d
WaitKey 

is faster than

Print "100/10="+100/10
WaitKey

then that would have been an acceptable comparison. Actually, I wonder if the above is faster.


Subirenihil(Posted 2007) [#8]
Repeat
	st=MilliSecs()
	For x#=.0001 To 1 Step .0001
		For y#=1 To 1000
			ang#=Sqr#(x#)
		Next
	Next
	sp=MilliSecs()
	Print Str$(10000/Float#(sp-st))+" million operations per second."
Until KeyHit(1)

FlushKeys
WaitKey
End

Repeat
	st=MilliSecs()
	For x#=.0001 To 1 Step .0001
		For y#=1 To 1000
			ang#=x#/y#			;<---This line is the only thing that changes
		Next
	Next
	sp=MilliSecs()
	Print Str$(10000/Float#(sp-st))+" million operations per second."
Until KeyHit(1)

FlushKeys
WaitKey
End

The top code is for Sqr, the bottom for /
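
For anyone checking the arithmetic: each pass runs roughly 10,000 x steps times 1,000 y steps, so about 10,000,000 operations, and 10000 divided by the elapsed milliseconds gives millions of operations per second. The sketch below works that through for two elapsed times, 182 ms and 187 ms. Those two values are assumptions, back-calculated from the posted Sqr and / float rates rather than quoted anywhere in the thread; they come out at about 54.9 and 53.5 million per second.

```blitzbasic
; Illustration only: 182 and 187 are elapsed times inferred from the posted
; rates, not measurements quoted in the thread. Variable names are made up.
ops = 10000 * 1000               ; about 10,000 x steps times 1,000 y steps
msSqr = 182
msDiv = 187
Print Str$( ops / 1000.0 / msSqr ) + " million operations per second (Sqr)"
Print Str$( ops / 1000.0 / msDiv ) + " million operations per second (/)"
WaitKey
```

Seen this way, the gap between the two float results is only a few milliseconds per pass, which is worth keeping in mind given that MilliSecs() counts whole milliseconds.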

The test results were the fastest times for each function after a minimum of 25 runs. As many of you have noticed, the first few test results are not necessarily accurate due to the system being bogged down - so to speak - with loading the program. I therefore did not use averages for the speeds listed above; instead, I used the fastest times achieved.

Stand by for more test results....
And here they are...

For this test, I ignored the first 20 tests, then took the highest, average and lowest scores for the following 100 runs.

Newest test results, calculations per second:
Function:	Highest:	Average:	Lowest:
Sqr		55248600	54338700	45662100
 /		53475900	52806500	46729000
 +		64516100	63357500	54347800
 -		64935100	63692500	55865900
 *		64516100	63712500	54945100
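
The procedure described for this second round (discard the first 20 runs, then record the highest, average and lowest rate over the next 100) could be sketched roughly as follows. This is not the original test program: the constants WARMUP, RUNS and OPS and all variable names are assumptions chosen for illustration, and it uses a plain integer loop counter instead of the nested float loops.

```blitzbasic
; Sketch only: constant and variable names are assumptions, not the original test code.
Const WARMUP = 20        ; runs to discard while the program settles
Const RUNS = 100         ; runs that count towards the statistics
Const OPS = 10000000     ; operations per run

Local x# = 2.0, z#
Local best# = 0, worst# = 0, total# = 0

For r = 1 To WARMUP + RUNS
	st = MilliSecs()
	For i = 1 To OPS
		z = Sqr( x )                          ; operation under test
	Next
	ms = MilliSecs() - st
	If ms < 1 Then ms = 1                     ; avoid dividing by zero on a very fast run
	If r > WARMUP
		rate# = Float( OPS ) / ms * 1000.0    ; calculations per second for this run
		If r = WARMUP + 1 Or rate > best Then best = rate
		If r = WARMUP + 1 Or rate < worst Then worst = rate
		total = total + rate
	EndIf
Next

Print "Highest: " + best
Print "Average: " + total / RUNS
Print "Lowest:  " + worst
WaitKey
```

Swapping the line marked "operation under test" for z = x / y (with y# declared alongside x#) would give the matching figures for division.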



Subirenihil(Posted 2007) [#9]
TomToad said:
And just how did you test these functions? I just ran my own test and found that I have an average of 1254 ms per 10000000 iterations for Sqr(x#) and 1182 ms per 10000000 iterations for x#/y#, showing that x#/y# is slightly faster than Sqr(x#).

I ran my own tests with your program, and both functions averaged 103 ms, so I increased the iterations tenfold. Sqr took 1037 ms compared with 1039 ms for /. Even with your program, the Sqr function outperforms division.


Defoc8(Posted 2007) [#10]
I don't know the specifics or timings, but both divide and sqrt are supported directly by the FPU. One reason you might be seeing faster results for sqrt in some cases is that it takes only one operand, so only one value needs to be loaded from memory into the FPU. There is also the issue of type conversion: integers can be loaded into FPU registers, and this may be confusing the situation. I don't know, I'm not an 8086 expert.

feel free to bite my head off :p ;)