Multiplying and Dividing FASTER than Shr and Shl?

Blitz3D Forums/Blitz3D Programming/Multiplying and Dividing FASTER than Shr and Shl?

Serpent

(Posted 2010) [#1]

Alright, the differences I'm getting aren't that much, but it seems that my Shr and Shl commands area actually SLOWER than dividing and multiplying by powers of 2!

It's not that big a deal to be honest, but I though that Shr and Shl, being simple bit shifting operations, would be considerably faster.

I'm wondering if it's the same for everyone else:

t = MilliSecs()
For times = 1 To 300000000
x = 1
x = x Shl 16
Next
t = MilliSecs() - t
Print "Shifting:  " + t
t = MilliSecs()
For times = 1 To 300000000
x = 1
x = x * 65536
Next
t = MilliSecs() - t
Print "Multiplying:  " + t
WaitKey
End

Alright, there's only a few ms in it (on my computer), even when the operations are repeated 300 million times, but it's interesting that everyone's convictions about bit shifting being significantly faster than multiplying and dividing are wrong!

Yasha

(Posted 2010) [#2]

I would have to suggest that these results are statistically insignificant - try both loops with the actual operation commented out, and one will still be faster than the other. Try both loops with both the assignment and the operation commented out and on my machine one was a full 150ms faster than the other.

It's one of the unfortunate shortcomings of Blitz3D that it's really not very transparent about what the compiler's doing - the arrangement of variables can have a bigger effect than what you actually do with them - as well as that some structures like For loops aren't optimised very heavily - in this test, the loops alone were taking up 95% of the time on my computer, commenting out the stuff inside them.

If you really need this kind of optimisation, write the routine in C and import it! (In the right circumstances, GCC with the max optimisation flag set can produce code upwards of twenty times faster than B3D)

Floyd

(Posted 2010) [#3]

Blitz3D already replaces multiplication and division by constant powers of two with the appropriate shift. So the test isn't actually testing multiplication.

However, it does show that for multiplication this is a pointless optimization on modern CPU's. It's still worth doing for division.

Serpent

(Posted 2010) [#4]

Floyd: I thought that Blitz might optimise the operations to shifts. I didn't post code for division tests but it is exactly the same - no difference, if anything Shr takes longer. It doesn't really matter anyway - a few ms difference after 300 million loops is pretty insignificant.

Yasha: Your 150ms gap with no ops demonstrates the uselessness of my example pretty well. I tried commenting the actual variable code out as well. My machine had an insignificant 1 ms gap, and the loops themselves took 1/4 of the time than with the assignment and calculations, but your points are accurate anyway.

This topic is pretty useless in itself - I was surprised that changing everything from calculations to shifts had no impact on speed and was a bit quick to post.

Floyd

(Posted 2010) [#5]

I didn't post code for division tests but it is exactly the same

For constants it is indeed exactly the same. The division isn't being done.
In this test I get actual division taking more than six times as long.

Const c16 = 65536
Local v16 = 65536

t = MilliSecs()
For times = 1 To 300000000
	x = 1
	x = x Shr 16
Next
t = MilliSecs() - t
Print "Shifting:  " + t

t = MilliSecs()
For times = 1 To 300000000
	x = 1
	x = x / c16
Next
t = MilliSecs() - t
Print "Constant div:  " + t

t = MilliSecs()
For times = 1 To 300000000
	x = 1
	x = x / v16
Next
t = MilliSecs() - t
Print "Variable div:  " + t

WaitKey
End

Ross C

(Posted 2010) [#6]

Also your test should start with a 3000 odd millisecs delay in case of any start delays

_PJ_	(Posted 2010) [#7]

I would believe that the apparent optimisation of Shl and Shr isn't so important with modern CPUs, unless perhaps a program was require to do an incredble number of calculations.

I would be wary of any timed results obtained this way, since there are so many other factors to consider, such as other running processes and priorities thereof.

Serpent

(Posted 2010) [#8]

Floyd: This makes sense.

Ross: You're probably right - this would acount for the few ms difference I found.

Malice: You're also right - a few ms difference (which could be due to external factors) over 300 million calculations doesn't really mean much at all - I was just a little quick to post this.