Sqr is faster than /
| ||
Did you know that a#=Sqr#(x#) is faster than a#=x#/y#? And that Atan(x#) is faster than Atan(x%)? True story!

Test results, in calculations per second:

Function   Integers         Floats
Sin        14903100         11074200
Cos        14992500         11086500
Tan        11350700         9407340
Asin       unable to test   5002500
Acos       unable to test   5005000
Atan       9532890          11904800
Atan2      11467900         9363300
Sqr        85470100         54945100
+          400000000        64516100
-          400000000        64516100
*          454545000        64516100
/          82644600         53475900
Mod        65789500         10905100

Integer speed, sorted fastest to slowest:

*          454545000
+          400000000
-          400000000
Sqr        85470100
/          82644600
Mod        65789500
Cos        14992500
Sin        14903100
Atan2      11467900
Tan        11350700
Atan       9532890
Asin       unable to test
Acos       unable to test

Float speed, sorted fastest to slowest:

+          64516100
-          64516100
*          64516100
Sqr        54945100
/          53475900
Atan       11904800
Cos        11086500
Sin        11074200
Mod        10905100
Tan        9407340
Atan2      9363300
Acos       5005000
Asin       5002500
| ||
All the math functions like ATan() take floating point arguments. So ATan( 5 ) is a two step procedure: convert integer 5 to float 5.0 and then compute ATan( 5.0 ). |
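A minimal sketch of that two-step procedure, for anyone who wants to see it in a loop. The variable names (n, f, a) and the loop count are made up for illustration, and whether the conversion cost is measurable at all will depend on the machine.

```blitzbasic
; Sketch only: n, f and a are hypothetical names, not from the tests above.
n% = 5        ; integer variable
f# = n        ; convert to float once, outside the loop

For i = 1 To 1000000
    a# = ATan(n)    ; integer argument: converted to float on every call
Next

For i = 1 To 1000000
    a# = ATan(f)    ; already a float: no conversion step needed
Next
```

If the integer version turns out slower, converting once before the loop, as with f above, avoids paying for the conversion on every call.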
| ||
And just how did you test these functions? I just ran my own test and found that I have an average of 1254 ms per 10000000 iterations for Sqr(x#) and 1182 ms per 10000000 iterations for x#/y#, showing that x#/y# is slightly faster than Sqr(x#). Here's my test program:

    x# = 1.0 ;assign all variables first so that there is no allocation overhead
    y# = 1.0
    z# = 1.0
    i = 0
    time = 0
    l = 0
    t = 0
    Delay(1000) ;let the program settle and finish its overhead
    For l = 1 To 10 ;repeat the test 10 times
        Time = MilliSecs()
        For i = 0 To 10000000
            z = Sqr(x) ;test function here
        Next
        Time = MilliSecs() - Time
        t = t + time
        Print time
    Next
    Print "Average = "+t/10
| ||
Why are you comparing Sqr with /?
| ||
I'm with big10p on this :) |
| ||
I'm with Tom on this. Life is too short. |
| ||
How can you compare the two anyway? It's like saying my car is faster than your house. If you had said that

    a=100
    b=10
    Repeat
        a=a-b
        d=d+1
    Until a<b
    Print "100/10="+d
    WaitKey

is faster than

    Print "100/10="+100/10
    WaitKey

then that would have been an acceptable comparison. Actually, I wonder if the above is faster.
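One way to settle that question would be to time both versions over many repetitions. The following is only a rough sketch: LOOPS and the variable names are made up for illustration, and the timings will vary from machine to machine.

```blitzbasic
; Sketch only: LOOPS is a hypothetical repeat count chosen for illustration.
Const LOOPS = 1000000

; Division by repeated subtraction, as in the post above
st = MilliSecs()
For i = 1 To LOOPS
    a = 100 : b = 10 : d = 0
    Repeat
        a = a - b
        d = d + 1
    Until a < b
Next
Print "Repeated subtraction: " + (MilliSecs() - st) + " ms, result " + d

; Built-in integer division
st = MilliSecs()
For i = 1 To LOOPS
    a = 100 : b = 10
    d = a / b
Next
Print "Built-in /: " + (MilliSecs() - st) + " ms, result " + d
WaitKey
End
```

Both loops should report a result of 10, so only the timings differ. The outer For loop adds the same overhead to each version.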
| ||
    Repeat
        st=MilliSecs()
        For x#=.0001 To 1 Step .0001
            For y#=1 To 1000
                ang#=Sqr#(x#)
            Next
        Next
        sp=MilliSecs()
        Print Str$(10000/Float#(sp-st))+" million operations per second."
    Until KeyHit(1)
    FlushKeys
    WaitKey
    End

    Repeat
        st=MilliSecs()
        For x#=.0001 To 1 Step .0001
            For y#=1 To 1000
                ang#=x#/y# ;<---This line is the only thing that changes
            Next
        Next
        sp=MilliSecs()
        Print Str$(10000/Float#(sp-st))+" million operations per second."
    Until KeyHit(1)
    FlushKeys
    WaitKey
    End

The top code is for Sqr, the bottom for /.

The test results were the fastest times for each function after a minimum of 25 runs. As many of you have noticed, the first few test results are not necessarily accurate because the system is bogged down, so to speak, with loading the program. I therefore did not use averages for the speeds listed above; instead, I used the fastest times achieved.

Stand by for more test results...

And here they are. For this test, I ignored the first 20 runs, then took the highest, average and lowest scores for the following 100 runs.

Newest test results, in calculations per second:

Function   Highest     Average     Lowest
Sqr        55248600    54338700    45662100
/          53475900    52806500    46729000
+          64516100    63357500    54347800
-          64935100    63692500    55865900
*          64516100    63712500    54945100
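For anyone wanting to repeat the second test, the warm-up and highest/average/lowest bookkeeping could be automated roughly as follows. This is a sketch of the procedure described above, not the code actually used for those results: WARMUP_RUNS, MEASURED_RUNS, TimeSqrRun and the variable names are all made up for illustration.

```blitzbasic
; Sketch only: the constants, variable names and TimeSqrRun are hypothetical.
Const WARMUP_RUNS = 20      ; runs to ignore while the program settles
Const MEASURED_RUNS = 100   ; runs that count towards the statistics

highest# = 0
lowest# = 0
total# = 0
counted = 0

For run = 1 To WARMUP_RUNS + MEASURED_RUNS
    ms = TimeSqrRun()
    If run > WARMUP_RUNS
        If ms < 1 Then ms = 1       ; avoid dividing by zero on a very fast run
        rate# = 10000.0 / ms        ; million operations per second
        If (counted = 0) Or (rate > highest) Then highest = rate
        If (counted = 0) Or (rate < lowest) Then lowest = rate
        total = total + rate
        counted = counted + 1
    EndIf
Next

Print "Highest: " + highest + " million operations per second."
Print "Average: " + (total / counted) + " million operations per second."
Print "Lowest:  " + lowest + " million operations per second."
WaitKey
End

; Times one pass of the Sqr loop from the post above, in milliseconds.
; Like the original, this assumes the nested loops perform 10,000 x 1,000
; operations; stepping a float by .0001 may be off by one iteration.
Function TimeSqrRun()
    Local x#, y#, ang#
    Local st = MilliSecs()
    For x = .0001 To 1 Step .0001
        For y = 1 To 1000
            ang = Sqr(x)
        Next
    Next
    Return MilliSecs() - st
End Function
```

To test another operation, change the single line inside the inner loop, as in the original code. MilliSecs() only resolves whole milliseconds, so each rate is quantised by the run length.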
| ||
TomToad said:
    And just how did you test these functions? I just ran my own test and found that I have an average of 1254 ms per 10000000 iterations for Sqr(x#) and 1182 ms per 10000000 iterations for x#/y#, showing that x#/y# is slightly faster than Sqr(x#).

I ran my own tests with your program, and both functions averaged 103 ms. So I increased the iterations tenfold: Sqr took 1037 ms compared with 1039 ms for /. Even with your program, the Sqr function outperforms division.
| ||
I don't know the specifics or timings, but both divide and sqrt are supported directly by the FPU. One reason you might be seeing faster results for sqrt in some cases is that it only requires one operand, and therefore only one value needs to be loaded from memory to the FPU. There is also the issue of type conversion: integers can be loaded into FPU registers, which may be confusing the situation. I don't know, I'm not an 8086 expert. Feel free to bite my head off :p ;)