Best way to calc next ^2 value...

Blitz3D Forums/Blitz3D Programming/Best way to calc next ^2 value...

big10p

(Posted 2005) [#1]

Blimey, I should really know how to do this - I seem to be suffering from brainfade today. :/

Anyway, given a random integer value, how would you calculate the next highest ^2 value, but only if the value isn't already a ^2 number?

I did come up with the following - which I cobbled together from stuff in the docs. It seems to work, but I'm not too sure how (maths not being my favourite subject) :P

Function pow2(x)
  Return 1 Shl (Ceil(Log(x) / Log(2)))
End Function

So, is there a simpler way?

octothorpe

(Posted 2005) [#2]

That formula is correct. I haven't been able to think of anything simpler. I prefer verbose function names and called mine round_up_to_a_power_of_two(). :)

"log(x)/log(2)" tells you to what power you'd need to raise 2 to get x

"ceil()" rounds it up to a whole number

"1 shl y" is a sneaky bitwise way to raise 2 to the power of y

big10p

(Posted 2005) [#3]

lol, thanks octothorpe. Can't believe I got the right formula by stabbing in the dark. Well, there was logic to how I came up with it - I just can't remember it. :)

And yes, that function name is crap - I actually use verbose names myself, usually. I just quickly typed that function in from memory because the actual code is on another machine. I'll probably call the function clamp_to_pow2(), or something - I'm not quite as verbose as you. :P

Thanks again!

Floyd

(Posted 2005) [#4]

If you like puzzles try to figure out why this works.

Graphics 300, 460, 0, 2

For n = 1 To 35
	n2 = NextPowerOfTwo( n )
	Print RSet( n, 5 ) + "  " + n2
Next

WaitKey

Function NextPowerOfTwo( n )
	n = n - 1
	n = n Or ( n Shr 1 )
	n = n Or ( n Shr 2 )
	n = n Or ( n Shr 4 )
	n = n Or ( n Shr 8 )
	n = n Or ( n Shr 16 )
	Return n + 1
End Function

octothorpe

(Posted 2005) [#5]

Simpler? No. Cuter? Yes! And probably much faster too. :)

I won't spoil it for everyone else. I wish this forum supported [spoiler] tags!

big10p

(Posted 2005) [#6]

Floyd, did you come up with that yourself? My head hurts. :P

PGF	(Posted 2005) [#7]

Floyd - Your version doesn't work as advertised here. I reckon you need '|' (bitwise OR) instead of 'Or' (conditional OR). Neat trick though.

I would have done it as in NextPower2(). We'd both have problems with the maximum integer value. And signed numbers are a bit iffy as well.

And NextPowerV3() is another variation. Pretty ugly but it just might be faster (haven't benchmarked it). There are many ways to arrange the search tree. The optimal depends upon the expected range of values.

If you have a small possible range of values a lookup table would be another way and probably the fastest so long as the number of calls is large enough to make up for the table initialisation.

Note that I've changed the numbers tested from 1-35 to a randon selection of 50 in the range 1-$3FFFFFFF.

Strict

Print "Starting"

Local iTest:Int

For iTest:Int = 1 To 50

	Local n:Int = Rand(1, $3FFFFFFF)
	
	Local v1:Int = NextPowerV1(n)
	Local v2:Int = NextPowerV2(n)
	Local v3:Int = NextPowerV3(n)
	
	Local msg:String
	
	If (v1 = v2) And (v1 = v3)
		msg = ""
	Else
		msg = "Error"
	EndIf
	
	Print RSet(iTest, 4) + RSet(n, 12) + RSet (v1, 12) + RSet(v2, 12) + RSet(v3, 12) + " " + msg
Next

Print "Finished"

Function NextPowerV1:Int(n:Int)
	n = n - 1
	n = n | ( n Shr 1 )
	n = n | ( n Shr 2 )
	n = n | ( n Shr 4 )
	n = n | ( n Shr 8 )
	n = n | ( n Shr 16 )
	Return n + 1
End Function

Function NextPowerV2:Int(n:Int)
	If n > $40000000
		Throw "Value too big"
	EndIf
	
	Local v:Int = 1
	While (v < n)
		v = v Shl 1
	Wend
	Return v
End Function

Function NextPowerV3:Int(n:Int)
	If n < 1
		Throw "Value too low"
	EndIf
	
	If n <= $10
		If n <= $8
			If n <= $4
				If n <= $2
					If n <= $1
						Return $1
					Else
						Return $2
					EndIf
				Else
					Return $4
				EndIf
			Else
				Return $8
			EndIf
		Else
			Return $10
		EndIf
	Else If n <= $100
		If n <= $80
			If n <= $40
				If n <= $20
					Return $20
				Else
					Return $40
				EndIf
			Else
				Return $80
			EndIf
		Else
			Return $100
		EndIf
	Else If n <= $1000
		If n <= $800
			If n <= $400
				If n <= $200
					Return $200
				Else
					Return $400
				EndIf
			Else
				Return $800
			EndIf
		Else
			Return $1000
		EndIf
	Else If n <= $10000
		If n <= $8000
			If n <= $4000
				If n <= $2000
					Return $2000
				Else
					Return $4000
				EndIf
			Else
				Return $8000
			EndIf
		Else
			Return $10000
		EndIf
	Else If n <= $100000
		If n <= $80000
			If n <= $40000
				If n <= $20000
					Return $20000
				Else
					Return $40000
				EndIf
			Else
				Return $80000
			EndIf
		Else
			Return $100000
		EndIf
	Else If n <= $1000000
		If n <= $800000
			If n <= $400000
				If n <= $200000
					Return $200000
				Else
					Return $400000
				EndIf
			Else
				Return $800000
			EndIf
		Else
			Return $1000000
		EndIf
	Else If n <= $10000000
		If n <= $8000000
			If n <= $4000000
				If n <= $2000000
					Return $2000000
				Else
					Return $4000000
				EndIf
			Else
				Return $8000000
			EndIf
		Else
			Return $10000000
		EndIf
	Else
		If n <= $40000000
			If n <= $20000000
				Return $20000000
			Else
				Return $40000000
			EndIf
		Else
			Throw "Value too big"
		EndIf
	EndIf
	
End Function

octothorpe

(Posted 2005) [#8]

Your version doesn't work as advertised here. I reckon you need '|' (bitwise OR) instead of 'Or' (conditional OR).

Last I checked, "here" is a Blitz3D forum: And, Or, and Xor are bitwise, and Pipe is not an operator. Seeing that there's a Strict pragma in (what I assume is) BlitzMax is making me want to switch though!

I really like NextPowerV2(). It's both simple and cute. :)

If you're concerned about speed, I would suspect a binary search in NextPowerV3() would prove faster - even if you want to weight it for a particular range, it'd still be faster to cut your potential answers in half each guess instead of guessing arithmetically.

Personally, I would find that function much more readable if the hex notation of literals (e.g. $2000000) was replaced with "1 shl 27". I would assume either compiler would optimize away simple, constant mathematical expressions before runtime.

PGF	(Posted 2005) [#9]

Yes - I confused dialects so retract my comment about the bitwise OR. Sorry about that. Worth noting for anybody switching between the versions as the Blitz3D code ran in BlitzMax but didn't produce correct results.

NextPowerV3() is doing a binary search within each hexadecimal power. So it favours the smaller numbers slightly. As I said there are many ways of ordering the tree depending on what you expect to be the most frequent range of n.

I don't know whether "1 shl 27" would be replaced by a constant. Good if it was. In practice I would tend to generate code like this so human readability isn't that important for the output.

Really I was just providing a couple of alternative ideas inspired by Floyd's neat but obscure trick.

Hotcakes

(Posted 2005) [#10]

I'm under the impression that any calculation that can be made constant, is made constant. But it's been a while so I could be wrong.

octothorpe

(Posted 2005) [#11]

Looks like mathematical operators (including Shl) are optimized away, but builtin functions such as Log() are not.

Const iterations = 10000000
Return 7 ; 203 ms
Return 1 Shl 13 ; 184 ms
Return Log(30) ; 885 ms
Return 7.6 ; 178 ms
a = 7 : b = 8 : Return a * b ; 317 ms
Return iterations / 678 ; 179 ms

These benchmark results were obtained on a 1.6ghz p4 with Blitz3D 1.90, debug mode disabled.

Const iterations = 10000000

s = MilliSecs() : For i = 1 To iterations
	test_int()
Next : e = MilliSecs() - s
Print "test_int:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_shl()
Next : e = MilliSecs() - s
Print "test_shl:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_log()
Next : e = MilliSecs() - s
Print "test_log:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_float()
Next : e = MilliSecs() - s
Print "test_float: " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_var()
Next : e = MilliSecs() - s
Print "test_var:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_constmath()
Next : e = MilliSecs() - s
Print "test_constmath:   " + e + "ms"

WaitKey() : End

Function test_int#()
	Return 7
End Function

Function test_shl#()
	Return 1 Shl 13
End Function

Function test_log#()
	Return Log(30)
End Function

Function test_float#()
	Return 7.6
End Function

Function test_var#()
	a = 7 : b = 8
	Return a * b
End Function

Function test_constmath#()
	Return iterations / 678
End Function

PGF	(Posted 2005) [#12]

Added test_empty() and test_ceil() for more information.

Results (Athlon XP 1800+, ~1.533GHz):

test_int: 131ms
test_shl: 176ms
test_log: 1106ms
test_float: 150ms
test_var: 163ms
test_constmath: 160ms
test_empty: 123ms
test_ceil: 1478ms

I agree with you that the internal math functions are not replaced by const values.

However after subtracting the loop and call overhead shown by test_empty(), I believe that there is significant variation in the other functions compared to test_int.

test_int: 8ms
test_shl: 53ms
test_log: 983ms
test_float: 27ms
test_var: 40ms
test_constmath: 37ms
test_empty: 0ms (this is the baseline)
test_ceil: 1355ms

From this I'd say that expressions like '1 Shl 13' are not reduced to a constant.

I'll think about a more conclusive test.

Const iterations = 10000000

s = MilliSecs() : For i = 1 To iterations
	test_int()
Next : e = MilliSecs() - s
Print "test_int:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_shl()
Next : e = MilliSecs() - s
Print "test_shl:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_log()
Next : e = MilliSecs() - s
Print "test_log:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_float()
Next : e = MilliSecs() - s
Print "test_float: " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_var()
Next : e = MilliSecs() - s
Print "test_var:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_constmath()
Next : e = MilliSecs() - s
Print "test_constmath:   " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_empty()
Next : e = MilliSecs() - s
Print "test_empty: " + e + "ms"

s = MilliSecs() : For i = 1 To iterations
	test_ceil()
Next : e = MilliSecs() - s
Print "test_ceil:  " + e + "ms"

WaitKey() : End

Function test_int#()
	Return 7
End Function

Function test_shl#()
	Return 1 Shl 13
End Function

Function test_log#()
	Return Log(30)
End Function

Function test_float#()
	Return 7.6
End Function

Function test_var#()
	a = 7 : b = 8
	Return a * b
End Function

Function test_constmath#()
	Return iterations / 678
End Function

Function test_empty#()
End Function

Function test_ceil#()
	Return Ceil(3)
End Function

octothorpe

(Posted 2005) [#13]

NextPowerV3() is doing a binary search within each hexadecimal power.

No, it's not. A binary search (see the Wikipedia page) divides and conquers half (hence the word binary) of the possible answers each test. Your function is narrowing things down sequentially into eight ranges (what you call hexadecimal powers), then testing sequentially within.

octothorpe

(Posted 2005) [#14]

However after subtracting the loop and call overhead shown by test_empty(), I believe that there is significant variation in the other functions compared to test_int.

Keep in mind that there's going to be a certain percentage of error involved testing in a multitasking operating system. Since your machine is much faster than mine, you might want to increase your iterations to help compensate for this.

I found that whatever test I was running first was registering as slower than the rest. Adding Delay(1000) to the start of the program solved this for me. This didn't seem to be a problem for your first test's result, but I recommend you try this to see if it effects your other results.

When I run your code here, I see consistantly similar times for int, shl, float, constmath, and empty. The range was 1762ms to 1790ms.

What version of Blitz are you using? You aren't testing this with BlitzMax, are you?

PGF	(Posted 2005) [#15]

I'll maintain that it is a binary search within each power of 16. Consider that the higher half of the possible numbers in the range will be found on the first test, half the remainder on the second test, half the remainder on the third test etc.

Sounds like a binary search to me.

You are right that the search tree could be more even ala:

Function NextPowerV4:Int(n:Int)
	If n <= $8000
		If n <= $80
			If n <= $8
				If n <= $2
					If n <= $1
						Return $1
					Else
						Return $2
					EndIf
				Else
					If n <= $4
						Return $4
					Else
						Return $8
					EndIf
				EndIf
			Else
				If n <= $20
					If n <= $10
						Return $10
					Else
						Return $20
					EndIf
				Else
					If n <= $40
						Return $40
					Else
						Return $80
					EndIf
				EndIf
			EndIf
		Else
			If n <= $800
				If n <= $200
					If n <= $100
						Return $100
					Else
						Return $200
					EndIf
				Else
					If n <= $400
						Return $400
					Else
						Return $800
					EndIf
				EndIf
			Else
				If n <= $2000
					If n <= $1000
						Return $1000
					Else
						Return $2000
					EndIf
				Else
					If n <= $4000
						Return $4000
					Else
						Return $8000
					EndIf
				EndIf
			EndIf
		EndIf
	Else
		If n <= $8000000
			If n <= $80000
				If n <= $20000
					If n <= $10000
						Return $10000
					Else
						Return $20000
					EndIf
				Else
					If n <= $40000
						Return $40000
					Else
						Return $80000
					EndIf
				EndIf
			Else
				If n <= $200000
					If n <= $100000
						Return $100000
					Else
						Return $200000
					EndIf
				Else
					If n <= $400000
						Return $400000
					Else
						Return $800000
					EndIf
				EndIf
			EndIf
		Else
			If n <= $8000000
				If n <= $2000000
					If n <= $1000000
						Return $1000000
					Else
						Return $2000000
					EndIf
				Else
					If n <= $4000000
						Return $4000000
					Else
						Return $8000000
					EndIf
				EndIf
			Else
				If n <= $20000000
					If n <= $10000000
						Return $10000000
					Else
						Return $20000000
					EndIf
				Else
					If n <= $40000000
						Return $40000000
					Else
						Throw "Value too big"
					EndIf
				EndIf
			EndIf
		EndIf
	EndIf
End Function

This is the evenly balanced search tree providing an answer in 5 comparisons for all cases.

Like I said there are many ways to arrange the search tree depending on how well you know the input numbers and how finicky you want to be.

If the input numbers are uniformly distributed over 0 - 2^30 then the best tree would be:

Function NextPowerV5:Int(n:Int)
	If n > $40000000
		Throw "Value too big"
	ElseIf n > $20000000	Return $40000000
	ElseIf n > $10000000	Return $20000000
	ElseIf n > $8000000	Return $10000000
	ElseIf n > $4000000	Return $8000000
	ElseIf n > $2000000	Return $4000000
	ElseIf n > $1000000	Return $2000000
	ElseIf n > $800000	Return $1000000
	ElseIf n > $400000	Return $800000
	ElseIf n > $200000	Return $400000
	ElseIf n > $100000	Return $200000
	ElseIf n > $80000	Return $100000
	ElseIf n > $40000	Return $80000
	ElseIf n > $20000	Return $40000
	ElseIf n > $10000	Return $20000
	ElseIf n > $8000	Return $10000
	ElseIf n > $4000	Return $8000
	ElseIf n > $2000	Return $4000
	ElseIf n > $1000	Return $2000
	ElseIf n > $800		Return $1000
	ElseIf n > $400		Return $800
	ElseIf n > $200		Return $400
	ElseIf n > $100		Return $200
	ElseIf n > $80		Return $100
	ElseIf n > $40		Return $80
	ElseIf n > $20		Return $40
	ElseIf n > $10		Return $20
	ElseIf n > $8		Return $10
	ElseIf n > $4		Return $8
	ElseIf n > $2		Return $4
	ElseIf n > $1		Return $2
	ElseIf n > $0		Return $1
	Else
		Throw "Value too low"
	EndIf
End Function

That gives a lower average number of comparisons but the worst case is of course much higher.

On the timings - My testing is with Blitz2D since I don't have Blitz3D and I thought that they would be closer for basic calculations like this than BlitzMax.

Given the difference between your latest figures and mine perhaps they are more different than I thought.

Performance testing always seems to degrade into comparing the machines and language versions rather than finding out anything general and reliable about how to code for efficiency. In almost all cases I've found that Blitz performance is fantastic for the projects that I have.

Now with fast Alpha, 2D image rotation and other drawing speed improvements in BlitzMax everything is really good.

octothorpe

(Posted 2005) [#16]

Yes, in the case of a perfectly random distribution of values from 0 to 2^32-1 your second example would be faster on average than a binary search. However, it is not a binary search:

Wikipedia says: Binary search is a logarithmic algorithm and executes in O(log n) time. Specifically, 1 + log2N iterations are needed to return an answer.

I believe the point you're missing is that a binary search divides the answer set, not the problem set.

If we don't know the distribution, it's better to go for a better worst case scenario. Personally, all the values I'll be passing to this function (texture widths and heights) will be much lower than 67108864 - at which point a linear search becomes slower than a binary search. I can't think of any practical uses for this function which would use a number that large.

PGF	(Posted 2005) [#17]

I believe the point you're missing is that a binary search divides the answer set, not the problem set.

Who says ? Sounds like a statement that you just made up.

I disprove your statement with an example.

To perform a binary search of say the telephone directory, I open it to the middle and see if the name I am searching for is <= or > that position. Then I search that half etc.

This is an example of a binary search. It is dividing the problem set, not the answer set.

Here 'n' is the number of entries in the directory and the search is O(log n).

Compare this to the rather ridiculous prospect of searching the telephone directory entry by entry from the beginning to the point where you find a match. That would be the 'linear' search approach and performance would be O(n).

Now change the telephone directory to the set of possible input numbers to our function.

In the correct interpretation, 'n' is the number of possible numbers - lets say it is 2^32 although this isn't really correct as the valid range is really 2^30 once you take out negative numbers and those > 2^30 for which the answer 2^31 wraps around in a signed integer.

All of my examples are O(log n) or better.

NextPowerV4() appears to do considerably better. As does Floyd's NextPowerOfTwo(). It is possible to do better than O(log n) because of course this isn't a general search but something open to calculation.

Both of these routines achieve something like O(log(log n)) for best, worst and average cases which is quite impressive.

I stand by my statement about NextPowerV5() that "If the input numbers are uniformly distributed over 0 - 2^30 then the best tree would be:". To be perfectly correct I should have said "If the input numbers are uniformly distributed over 0 - 2^30 then the best tree for the best and average case would be:".

In fact for the average case, NextPowerV5() performance is constant time - O(1) and that beats both O(log n) and O(log(log n)). If you don't get that I'd be happy to prove it.

You might argue that 'n' should refer to the number of bits of the function argument. By doing so you've already applied the log function. If that is your interpretation then I'd encourage you to use 'n' in the conventional way.

For a function F with input p we write F(p) and discuss its performance in terms of being O(1), O(n), O(log n), O(n^2) etc with 'n' being the number of possible values of 'p'. For simplicity you can do away with p and just talk about F(n) performance being O(n) etc.

As an aside - It would be possible but of course ridiculous to try to implement the function as a linear search with O(n) performance.

If we don't know the distribution, it's better to go for a better worst case scenario.

Another bold statement. It depends on what you want to optimise, the average case or the worst case. What you are stating is that if you don't know the input distribution you should optimise for the worst case. Well maybe not. Maybe you should optimise the average performance. It depends upon the application.

Going back to the original question by big10p we do have an idea about the input distribution - "... given a random integer value ...".

octothorpe

(Posted 2005) [#18]

In the correct interpretation, 'n' is the number of possible [inputs]

Ahh. That makes sense. For some reason I always thought of n as the number of possible outputs. Thanks for setting me straight. I guess V5 is a binary search! :)

Another bold statement.

That was an aside: as I already admitted, if we know that the distribution will be perfectly random, we'd be better served by V5 than V4.

If we don't know the distribution of inputs or the performance expectations, I maintain that it's best to offer reasonable and similar performance for all inputs. This cuts down on difficult to find performance degradation in production code. Optimization can come later (after profiling), if it's necessary at all.

In fact for the average case, NextPowerV5() performance is constant time - O(1)

Now you're just being silly.

PGF	(Posted 2005) [#19]

Finally - Our thoughts are as one. I even agree with your words on optimising later if at all and certainly not without profiling.

Now you're just being silly.

No. It really is O(1). Are you a betting person ? Warning - check what you understand O(1) means before accepting a bet.