sine and cosine

BlitzMax Forums/BlitzMax Beginners Area/sine and cosine

Jesse(Posted 2006) [#1]
Does anybody know if it is faster to use an array of precalculated sine and cosine values, or would it be better to just use them directly in equations? I am using them directly in my game. I want to know whether I would be better off leaving them as they are, or whether precalculating would increase my game's speed.


tonyg(Posted 2006) [#2]
Pre-calc1
pre-calc2


rdodson41(Posted 2006) [#3]
I just use Sin and Cos directly in the game and even when going 60 times a second for multiple objects it works plenty fast.


Grey Alien(Posted 2006) [#4]
I used to always precalc in the old days but some recent tests revealed almost no speed up whatsoever. The main delay was in all the other code like drawing etc.


Jesse(Posted 2006) [#5]
I was going to go with the general opinion (lazy me), but since I got mixed responses I decided to test it myself. This is the program I tested it with:



It truly is insignificant compared to the number of cycles it had to perform. On my Athlon XP 2900+
I got 16 ms vs 50 ms for 60000 operations in half a second. I can live with that.
Thanks everybody.


H&K(Posted 2006) [#6]
But why live with it?

You've effectively just written the replacement for Cos in, what, two minutes? Max.

The question of "Cos/Sin directly, or in an array" is normally "has the language already made it a lookup table?". You have shown that it either isn't a lookup table, or at the least that it's a lot slower for whole angles.

Just change the array to FastCos[] or something, and you have made Cos 320% faster. For two minutes' work.

Also, if you read those two threads, one of the posts was about how Cos returned different results on different computers. So if that is true, and you implement their IncBin data idea for the .. err, data, you have a "safe" cross-platform value.
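As a sketch of the idea (in Python rather than BlitzMax, and with hypothetical FastCos/FastSin names), a whole-degree table plus wrapper might look like this:

```python
import math

# Precompute one entry per whole degree, once at startup.
FAST_COS = [math.cos(math.radians(d)) for d in range(360)]
FAST_SIN = [math.sin(math.radians(d)) for d in range(360)]

def fast_cos(deg):
    # Wrap into 0..359 and look up; exact only for whole-degree angles.
    return FAST_COS[int(deg) % 360]

def fast_sin(deg):
    return FAST_SIN[int(deg) % 360]
```

Any fractional part of the angle is simply truncated, which is the whole-angle limitation being discussed here.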


wedoe(Posted 2006) [#7]
Yep, optimisation is a good way of thinking. Why go with the
slowest approach when you have already made a better replacement?

One cycle here and one cycle there; along the way the
little streams become a river (of optimisation)!


Jesse(Posted 2006) [#8]
That settles it. Optimized it is! Lesson perfectly understood now.
Thanks.


FlameDuck(Posted 2006) [#9]
Lesson perfectly understood now.
To have been lessoned so significantly, and have learned so little.

You have shown that it either isn't a lookup table, or at the least that it's a lot slower for whole angles.
You left out "on your hardware". Someone with a fast processor and slow and/or underpowered RAM (like almost every laptop, for instance) is not going to be too impressed.

Besides which the benchmark above is getting false positives due to compiler optimization. Observe:



Jesse(Posted 2006) [#10]
Well, let's not guess; let's do actual tests. Can some volunteers run it on their computers and show their results? I am hoping for some laptops and low-end computers, because frankly seeing is believing, and everything else can just take a walk :). I bet the results are relatively similar!
How about it, any volunteers?


Jesse(Posted 2006) [#11]
Also, in your example there is roughly a 30+ millisecond difference, which does not prove much to me.


FlameDuck(Posted 2006) [#12]
It's not a matter of how many milliseconds' difference there is; it's a matter of how much of an improvement is achieved by using memory access over FPU instructions. All other things being equal, my example illustrates that using lookup tables is actually only a marginal improvement, rather than three times as fast.

Your example uses the same index all the time, thus allowing on-die registers to be reused for all the lookups (so it only accesses memory once), rather than reading from memory every time. This means that the Cos/Sin version is doing 8 times the work, and the test is thus (possibly unintentionally) designed to give the impression that memory access is faster than FPU instructions.

My example is still far from a flawless benchmark; its purpose is simply to illustrate the flaws inherent in yours.

which does not prove much to me.
You're entitled to write your program in whatever fashion you think is best. You asked a question, and I responded; believe what you will. If you cannot see how a 3x increase in speed is different from a 1.2x increase in speed, there is not likely anything I can do to convince you otherwise.


fredborg(Posted 2006) [#13]
Actually, for the test to be (more) fair: Sin and Cos take a floating-point parameter, while an array takes an integer index. Therefore the values passed to them should be in the right format, to avoid converting from float to int or vice versa.
Local pcos#[360]
Local psin#[360]

For x = 0 To 359
	pcos[x] = Cos(x)
	psin[x] = Sin(x)
Next

sec = MilliSecs()
For x = 0 To 60000
 	n# = pcos[Rand(0,359)]/psin[Rand(0,359)]*pcos[Rand(0,359)]/psin[Rand(0,359)]+pcos[Rand(0,359)]/psin[Rand(0,359)]/pcos[Rand(0,359)]/psin[Rand(0,359)]
Next
lapse = MilliSecs() - sec 
Print lapse	

sec = MilliSecs()
For x = 0 To 60000
	n# = Cos(Rnd(0.0,359.0))/Sin(Rnd(0.0,359.0))*Cos(Rnd(0.0,359.0))/Sin(Rnd(0.0,359.0))+Cos(Rnd(0.0,359.0))/Sin(Rnd(0.0,359.0))/Cos(Rnd(0.0,359.0))/Sin(Rnd(0.0,359.0))
Next
lapse = MilliSecs()-sec
Print lapse
This on average gives 77 ms vs. 97 ms on my laptop. But if you have something that takes less than 100 ms to do 480008 calculations/lookups, there is absolutely no point in optimizing unless you will be using it excessively. Especially when you consider that you've effectively lost a lot of precision by using a lookup.
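For anyone who wants to repeat the comparison outside BlitzMax, here is a rough Python sketch of the same test using timeit; the angles are precomputed so the random-number cost is identical on both sides, and the absolute numbers mean nothing beyond your own machine:

```python
import math
import random
import timeit

PCOS = [math.cos(math.radians(d)) for d in range(360)]

# Precompute the random whole-degree angles so the RNG cost
# is identical for both loops.
ANGLES = [random.randrange(360) for _ in range(10000)]

def lut_pass():
    s = 0.0
    for a in ANGLES:
        s += PCOS[a]          # table lookup
    return s

def direct_pass():
    s = 0.0
    for a in ANGLES:
        s += math.cos(math.radians(a))  # direct computation
    return s

lut_time = timeit.timeit(lut_pass, number=100)
direct_time = timeit.timeit(direct_pass, number=100)
print(f"LUT: {lut_time:.3f}s  direct: {direct_time:.3f}s")
```

Both loops produce identical sums (the table was built with the same calls), so only the timing differs.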


H&K(Posted 2006) [#14]
@Flame

Can you shed any light on this thing about different computers giving different results for Cos and Sin?
Because if that does happen, then the case for lookup tables gets stronger.

Also, I don't think you worded "uses the same index all the time" correctly. It uses the same index for all calculations each cycle. (A small point, but when I first read what you had written I couldn't see what you meant.)

I think that 360 floats for each of the trig functions is an arbitrary amount of memory, and I'm very surprised you used it. (It's less than a 512-by-1 texture.)

I don't doubt that in a real situation the 320% (which I quoted as the speed increase from the data given) would be much less. But it would be an increase.

However, Jesse wanted to know which was faster. And the tossup is an arbitrary increase in speed in exchange for an arbitrary increase in size/memory. Yet even after Jesse showed this increase in speed, he decided not to use it.

My point still remains: if it's speed rather than memory that you are worried about, and you have already written the speedier version, then use it.

Unless you are saying that it isn't quicker? (As opposed to: it isn't as much quicker as we think.)

@Fred

The lookup doesn't provide a result with any lack of precision. (But I know what you mean, which is why I did say "for whole angles" in my post.)


Abomination(Posted 2006) [#15]
In those examples the random-number calculation takes up most of the time, so that "levels" the score quite a bit.

Local pcos#[360]
Local psin#[360]
Local ra#[10001,8]
For x = 0 To 359
	pcos[x] = Cos(x)
	psin[x] = Sin(x)
Next

For x=0 To 10000
For y=0 To 7
ra[x,y]=Rnd(0.0,359.0)
Next
Next

arraytime=0
mathtime=0

For y=0 To 10
sec = MilliSecs()
For x = 0 To 10000
 	n# = pcos[ra[x,0]]/psin[ra[x,1]]*pcos[ra[x,2]]/psin[ra[x,3]]+pcos[ra[x,4]]/psin[ra[x,5]]/pcos[ra[x,6]]/psin[ra[x,7]]
Next
lapse = MilliSecs() - sec 
arraytime=arraytime+lapse

sec = MilliSecs()
For x = 0 To 10000
	n# = Cos(ra[x,0])/Sin(ra[x,1])*Cos(ra[x,2])/Sin(ra[x,3])+Cos(ra[x,4])/Sin(ra[x,5])/Cos(ra[x,6])/Sin(ra[x,7])
Next
lapse = MilliSecs()-sec
mathtime=mathtime+lapse

Next

Print arraytime
Print mathtime



20 ms vs 80 ms

Would compiler optimization still be an important factor in this test?


fredborg(Posted 2006) [#16]

The lookup doesn't provide a result with any lack of precision.
It does if you aren't using whole angles.

@Abomination: As FlameDuck described, the speeds depend on which processor/memory combo you have. On my machine at work (P4 3.6GHz, 1GB RAM) your test is 27 vs 53, while my test above is 65 vs 79.

And if you want to do a lookup table for the random numbers as well, be my guest, but in my honest opinion it (lookups for simple operations) is a pointless exercise. But one that pops up at regular intervals :)


H&K(Posted 2006) [#17]
It does if you aren't using whole angles

Did you really need to say that when I had posted:
The lookup doesn't provide a result with any lack of precision. (But I know what you mean, which is why I did say "for whole angles" in my post.)

But no, even if you are using half angles the precision is the same, if you make the array twice as big. You are obviously correct that if you want Cos(12.304432343), the result will not be exact. But it would be inexact to exactly the same, predictable level of precision.

I disagree about it being pointless. It takes hardly any time at all to do, and it does, when all is said and done, work faster. There are obviously reasons not to do it, as you have said, if the range of angles you want is continuous rather than discrete. But to dismiss it out of hand is wrong.

The most important thing is: do Cos and Sin give different results on different computers? Because then we should use lookups. (I don't use lookups at the moment, but that's only because I'm too lazy to set up a decent include file.)


wedoe(Posted 2006) [#18]
Even the con-people in the thread admit lookups are faster;
the discussion is "how much". I don't care! Faster is faster,
and I cannot see why I should make "lesser" code just
because someone thinks I should. It's not North Korea, is it?


fredborg(Posted 2006) [#19]
Offering my advice on a subject does not mean anyone has to agree with or follow it. I am merely expressing my own opinion, as is everyone else.


FlameDuck(Posted 2006) [#20]
Can you shed any light on this thing about different computers giving different results for Cos and Sin?
No. I can only speculate that it is related to different FPU accuracy modes. Running long enough, I could see how accumulated errors add up over time to produce significantly different results, particularly if numbers are moved back and forth between CPU registers/memory (32- or 64-bit precision) and the FPU (80-bit precision).
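A cheap way to see the kind of width-dependent drift he is speculating about (a Python sketch; the struct round-trip stands in for storing a wide FPU result into a 32-bit slot):

```python
import math
import struct

def to_float32(x):
    # Round-trip a Python double through IEEE-754 single precision,
    # mimicking a store to a 32-bit register or memory slot.
    return struct.unpack('f', struct.pack('f', x))[0]

x = math.cos(math.radians(1.0))   # computed at double precision
x32 = to_float32(x)               # narrowed to single precision

# The difference is tiny but nonzero; repeated narrowing and
# accumulation is one way results can diverge between machines.
print(x - x32)
```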

I think that 360 floats for each of the trig functions is an arbitrary amount of memory, and I'm very surprised you used it.
It's not really a matter of size so much as a matter of access speed. When a CPU wants to access a page, it requests it from the L1 data cache (super fast, but small). If it is not available there, it tries the L2 cache (still reasonably fast, and relatively large). If it can't find it there either, it will hit main memory (this is slow), and eventually the hard drive (really bad). The more cache misses, the more significant the time penalty.

The FPU instruction, on the other hand, simply needs to have its registers loaded with the proper values and be told to execute. On a pipelined superscalar CPU (any Pentium Pro or Pentium II class CPU or higher) this instruction can be executed in parallel with ALU code, as can multiple FPU instructions if more than one FPU unit exists (Pentium 4 and AMD64).

This is not the case for lookup tables, which rely on memory access and cannot normally be executed out of order.

And the tossup is an arbitrary increase in speed in exchange for an arbitrary increase in size/memory.
That's currently the truth for most algorithms: use memory, or use processing power. However, processing power and semiconductor density are increasing exponentially over time, whereas (affordable) memory increases in speed and capacity in a much more linear fashion.

Unless you are saying that it isn't quicker?
I'm saying it depends on the hardware. Specifically, it depends on CPU architecture, northbridge bandwidth (and traffic) and memory speed. It also depends in large part on whether the code in question is cache friendly or not.

In those examples the random-number calculation takes up most of the time, so that "levels" the score quite a bit.
That was the point. The random-calculation overhead, however, is not (or rather should not be) a significant factor, because it executes at a constant speed, so any overhead is applied equally to both tests.

Would compiler optimization still be an important factor in this test?
Not so much compiler optimizations as cache optimizations. Computers are very, very good at in-order processing, and your lookup tables are just that. However, games in particular are generally not very cache-friendly programs, as very little game logic is executed 'in order' (or at least 'in the same order'), and this is why benchmarking is such a horrible, horrible task.

Also keep in mind that for a simple application like this, you pretty much have unrestricted access to close to the full bandwidth of the northbridge. In a game where you're copying vertex lists, texture data and whatnot back and forth between system memory and video memory, this would not necessarily be the case.

In either case, using fredborg's example gives me about a 79/99 score (which isn't surprising, since our computers are similar) when the computer is idling. However, when I keep the computer (more specifically the northbridge) busy by watching a film clip, playing some Civilization IV, or defragmenting the hard drive, the array version's execution time increases much more rapidly than the FPU version's does.

There are obviously reasons not to do it, as you have said, if the range of angles you want is continuous rather than discrete.
Is this a bad time to point out that Sin and Cos aren't discrete functions? And that when you need to get an angle back (using ACos, ASin, or ATan) you will not get a whole number, and the accuracy loss that introduces is much more significant than any real or imaginary loss of precision due to IEEE FP accuracy errors. This integer-conversion precision loss is much more severe, and will almost certainly give obvious visual artifacting, particularly at high resolutions.

Even the con-people in the thread admit lookups are faster;
You've been reading this discussion like Satan reads the Bible, I see. Lookups for simple FPU instructions are only faster if you specifically design your benchmark to prove it, by using compiler or cache optimizations, and/or have an older CPU. This makes for a worthless benchmark for any type of program that requires constant interaction and feedback (like, say, a video game). If you're filling up your northbridge with other stuff as well (like, say, graphics), your lookup tables are going to be even worse off.

I don't care! Faster is faster, and I cannot see why I should make "lesser" code just because someone thinks I should.
No. Quite clearly you're entitled to your opinion, even when it's wrong.


H&K(Posted 2006) [#21]
Is this a bad time to point out that Sin and Cos aren't discrete functions?


Don't know ;(

It was obviously an important point to make, and one that I did indeed need reminding of.
Note to self: all numbers on a computer are discrete in nature


If I rephrase it to:
There are obviously reasons not to do it, as you have said, if the range of angles you want is quasi-continuous rather than discrete, or:
There are obviously reasons not to do it, as you have said, if the range of angles you want is convergent with floats rather than convergent on whole numbers, it would make me feel happier.

I think probably the answer is, given how simple the task is:
When the program is finished, make two versions, one with Cos/Sin directly and one with lookup tables, and see which is quicker in a real environment.

Edit: this use of bold isn't shouting. I admit it looks like shouting, but it was either that or underline, and underline looks too definitive for my liking.


Jesse(Posted 2006) [#22]
I will eventually complete my game, and I will make two versions: one with Sin & Cos and one with lookup tables. Call me stubborn and narrow-minded, but I have leaned toward lookup tables for the moment. I have an executable of that version at the moment; I will wait a while before attempting to make a Sin & Cos version. If anybody is interested I can email them an executable of what I am working on. I don't have a place to host it, so if anybody is interested in hosting it, let me know and I will send you a copy of my silly game, which is about 30% complete. I borrowed sounds from other games. The program is Sin- and Cos-intensive, which is why I am interested in faster alternatives. Bear in mind that I consider myself a beginner, so don't expect too much. The rar file is about 1 MB.


wedoe(Posted 2006) [#23]
No. Quite clearly you're entitled to your opinion, even when it's wrong.
I like you, FlameDuck, and your insights are much valued by me, but remarks like that are just plain stupid!


Defoc8(Posted 2006) [#24]
You really need to know a few things: what code is being
generated by the compiler, and lots of information about
the system it's running on... Personally, I wouldn't know where
to start with modern processors.. they seem quite complex
...[even ignoring operating systems.. erg..]

Some things still work quite well in many cases, like
unrolling loops, where you repeat the code n times
instead of For/Next-ing through the whole thing..

Or replacing function calls with macros, avoiding the function
call overhead in critical sections..

These are very general, and like anything else, context
dependent.. don't assume that anything is an optimisation.

Anyway, I'm no expert; I used to like coding in 68k
assembler, but that's a while ago now :] :p

FlameDuck seems to know his chips; might be worth listening to the man.


FlameDuck(Posted 2006) [#25]
Anyway, I'm no expert; I used to like coding in 68k assembler, but that's a while ago now
Ah, an elegant processor from a more civilized time. Not as clumsy or random as an Intel.

Just bring out consumer level VLIW processors and Unified Memory Architecture busses already. Please.


Brucey(Posted 2006) [#26]
Ah, an elegant processor from a more civilized time. Not as clumsy or random as an Intel.

Haw ;-)

@Jesse
I think creating two apps using different methods is a tad overkill... considering how little difference there is between the two methods, you could probably choose either and no one would ever notice the difference.


FlameDuck(Posted 2006) [#27]
I like you, FlameDuck, and your insights are much valued by me, but remarks like that are just plain stupid!
What? As opposed to remarks like "It's not North Korea, is it?", which are big and clever?


teamonkey(Posted 2006) [#28]
If you want to use non-discrete values, then for a lookup table you'd have to do something like this:
Function array_cos:Double(x:Double)
	' Indices of the two whole-degree entries surrounding x
	Local a:Int = Floor(x)
	Local b:Int = a + 1
	
	' Wrap both indices into the table's 0..359 range
	While a < 0
		a:+360
		b:+360
	EndWhile
	While a > 359
		a:-360
		b:-360
	EndWhile
	If b > 359 Then b:-360
	
	Local fa:Double = pcos[a]
	Local fb:Double = pcos[b]
	
	' Fractional part of the angle, in 0..1
	Local dx:Double = x - Floor(x)
	
	' Linearly interpolate between the two table values
	Return (1.0 - dx)*fa + dx*fb
EndFunction

...which interpolates between the two values. It's much better than using the nearest discrete value.

Using a LUT array directly is about twice as fast as Cos() on my laptop (probably largely due to L2 caching); however, this function is about five times slower. As the use of Sin/Cos with discrete inputs is extremely limited, it's hard to justify not using the Sin()/Cos() functions.
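To put a number on how much accuracy the interpolated table gives up, here is a hypothetical Python equivalent; with a one-degree step, the worst-case error of linearly interpolating cosine is bounded by roughly (pi/180)^2 / 8, i.e. about 4e-5:

```python
import math

# 361 entries so index a + 1 is always valid.
PCOS = [math.cos(math.radians(d)) for d in range(361)]

def lerp_cos(deg):
    # Linearly interpolate between the two surrounding whole-degree entries.
    deg %= 360.0
    a = int(math.floor(deg))
    frac = deg - a
    return (1.0 - frac) * PCOS[a] + frac * PCOS[a + 1]

# Measure the worst error over tenth-of-a-degree samples.
err = max(abs(lerp_cos(d / 10.0) - math.cos(math.radians(d / 10.0)))
          for d in range(3600))
print(err)  # on the order of 4e-5
```

Whole-degree inputs come back exact; everything in between carries that small interpolation error.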


The most important thing is: do Cos and Sin give different results on different computers? Because then we should use lookups.

Why?


H&K(Posted 2006) [#29]
The most important thing is: do Cos and Sin give different results on different computers? Because then we should use lookups.


Why?

If you are just using trig functions to push graphics around, then OK, my assertion was over the top.
I use Sin/Cos to generate the maps for my levels (à la Zarch), and if the results are not guaranteed to be the same, then it has a "bigger" impact than simple transformation rounding.

It was not clear from that post (but if you look earlier), it was suggested that the lookup table be IncBin'ed at compile time.


teamonkey(Posted 2006) [#30]
I use Sin/Cos to generate the maps for my levels (à la Zarch), and if the results are not guaranteed to be the same, then it has a "bigger" impact than simple transformation rounding.

Unless your level generation is very recursive (or chaotic), it probably won't make any visible difference. You'll introduce more errors through transformations than you will through simple rounding errors.

In general you should only assume that a Float (any Float) is accurate to about 6 significant figures, and a Double to about 10-12 (depending on your hardware). The FPU should offer at least that level of accuracy for those datatypes.

If you're in a position where 6 significant figures matter when using trig, you're talking about accuracies of about 1/10000 of a degree. An error in the 3rd significant figure, however, will affect at least about 1/100 of a degree, which is why interpolating values from a LUT sucks: the interpolation will probably give you that level of error.


Jesse(Posted 2006) [#31]
I know FlameDuck knows his stuff and I respect his views, and I am also sure his ego is up in the clouds. Also I am sure he is even tired of hearing all the positive comments people write about him...
And on top of that (my two cents), I am glad we have somebody quite skilled like him who can "help" us beginners when we need it, even if he has to stomp on us to prove a point. I know being humble is not everyone's virtue, so I take them as they come.
On the other hand, I am the type of person that does not take a word for a fact; I need to try it out myself. And even though I am making my game with tables, I have contemplated in the back of my mind that FlameDuck is right, even when his lessons are not reflected in my work/hobby/game. But if I ever make a professional game, I will take whatever I have learned here, especially from the likes of him, and use it. I know it is easier said than done, but I will try at least.

Sine and cosine, I understand, are the least of my worries. I have coded many chunks of code in my game that, when I go back and look at them, I realize could be done a lot better, and I have managed to gain a considerable speed increase.

@Brucey:
So far this is just a hobby for me. I had the game using the Sin/Cos method, but while starting this thread I converted it to tables. I notice it is slightly faster (or am I trying to convince myself of what I want to believe?).
For now I have no way to test both of them, since I did not make a backup copy of the previous method (see what I mean by "try it out"). It is just a matter of changing about 50 lines of code. No big deal. "So far."


H&K(Posted 2006) [#32]
Unless your level generation is very recursive

It is.
But you are right.
Read my note to self.

That statement was based on my not remembering that all numbers on a computer are discrete in nature.

@Jesse
"Changing 50 lines of code"? That's a job for search and replace.


wedoe(Posted 2006) [#33]
What? As opposed to remarks like "It's not North Korea, is it?", which are big and clever?
No, you say my opinion is wrong. There's no such thing as a wrong opinion,
just a different one. Saying someone's opinion is "wrong" is derogatory (and stupid).

What I claim could be wrong, but that's a totally different matter.

It took you quite some time to comment on what I said, so I take it you
had to think for a while before you could comment back, justifying yourself.

(I still like you, no hard feelings from this side of the fence) :o)


Jesse(Posted 2006) [#34]
Correct, H&K. For the most part, anyway.

If anybody wants to try out my 30%-complete game, here is a link to download it: click here!


FlameDuck(Posted 2006) [#35]
There's no such thing as a wrong opinion,
Yes there is. When your opinion is based on incorrect assumptions (like, for instance, that LUTs are always faster than on-die calculations), it is wrong.

It took you quite some time to comment on what I said, so I take it you had to think for a while before you could comment back, justifying yourself.
Well you can interpret that however you want. The truth is I missed it the first time around.

I know FlameDuck knows his stuff and I respect his views, and I am also sure his ego is up in the clouds. Also I am sure he is even tired of hearing all the positive comments people write about him...
Is that an insult or a compliment? Oh, and my ego is not up in the clouds.

I'm sorry if you find my direct approach to communication intimidating.


wedoe(Posted 2006) [#36]
Well you can interpret that however you want.
Thank you ;o)

BTW: You're not at all intimidating, but you sometimes appear a bit hostile IMHO...


Abomination(Posted 2006) [#37]
An opinion is a "personal" thing and doesn't need to be based on facts or correct assumptions. That's why everybody can have a different opinion.
Usually, saying an opinion is wrong means you don't agree with its ethical implications.

Sarcasm is an ugly tool for undermining someone's credibility; better to use mild irony. It's far more sophisticated. <---mild irony. ;)


FlameDuck(Posted 2006) [#38]
mild irony.
I see Alanis Morissette has not lived in vain.


SoggyP(Posted 2006) [#39]
Hello.

Le Duc a la Flambe is correct. Unfortunately, despite the ideal that computers should present equal behaviour (however scaled) in all circumstances, the plethora of combinations of processors, RAM modules, motherboards, etc. means that you cannot categorically say one method is better than another, or at least not until you've tried all the combinations.

However, providing two executables for this one area seems a little excessive. If this is the final, last-possible, ultimate optimisation you can make then OK, but I'm sure there are other areas you can pull some cycles back from.

Include both methods in your game and let the user decide. That normally solves the problem for you, as most people will accept the default anyway, especially if they can't distinguish a few milliseconds in a frame.

Oh, and always make sure all media is available in a puki-rippable format.

Goodbye.


ozak(Posted 2006) [#40]
Before FPUs this was important, but it hardly is today. Most FPUs can do a huge number of Sin/Cos calculations without much trouble, as all the fine benchmark examples suggest.


H&K(Posted 2006) [#41]
@Flame,

You do have a tendency, once you have explained something once and don't think it has been understood, to become a tiny, tiny bit patronising.
I think this is just natural, and I'm sure the same finger can be pointed at me, as well as at several others.

There are some people who post here whose views, when different from mine, do make me question my view.
And you are one of them.

This time I haven't come back with a question/point/disagreement, because what you said seemed right to me.
If, on the other hand, I had wanted to question something you said, because I either didn't understand it or because I thought it was wrong, I wouldn't want you to try to belittle me; rather, I'd want you to maintain the tutorial nature of your posts.

I assume that you gain less from these posts than I (and others) do, and you are often posting to try to change some ingrained opinion.
When doing this, surely you should expect some disagreement with your stance, no matter how correct it is.

Please, when this happens, let it be as water off a duck's back ;)


Abomination(Posted 2006) [#42]
Well, a little flaming here and there livens up these threads.
It's quite funny to see that people seem to think they can "win" an argument by getting personal with "invisible" (internet) opponents.

Another thing is that the original question still has no definite answer.
Are these benchmark examples not representative of practical situations (or of all circumstances)?
Can there be an example that is?
Or is it just a matter of whose opinion is most convincing?
Not very satisfying.


H&K(Posted 2006) [#43]
@Abomination

What has been decided about the original question is that you cannot say either way.
It depends on more than you can simply test; it needs to be a real program in a real environment.
However, because it's an amazingly simple thing to change between the two (as said, it's all but search-and-replace), Jesse has volunteered to finish his program and then build two versions.
And then we can all see.

However, even this will not be a definitive answer.

I don't think flaming has a place in the programming threads. General threads, yes.

Question: if I am going to use the same Cos result more than once in quick succession, can I assume that assigning it to a local variable will put it in the L1 cache, and hence that it would definitely be quicker than calculating it twice?


FlameDuck(Posted 2006) [#44]
Another thing is that the original question still has no definite answer.
And it likely never will. It would have a definite answer on a fixed hardware platform, like the PlayStation or Xbox, but not on general-purpose computers.

Are these benchmark examples not representative of practical situations (or of all circumstances)?
They aren't really benchmarks. From Wikipedia: "Benchmarking is not easy and often involves several iterative rounds in order to arrive at predictable, useful conclusions. Interpretation of benchmarking data is also extraordinarily difficult."

Can there be an example that is?
No. If there were, it would have been written by now.

Or is it just a matter of whose opinion is most convincing?
No. It's a matter of circumstance. A greater optimization, regardless of whether one uses LUTs or FPU instructions, is optimizing your algorithm (for instance to allow for out-of-order execution, or to reduce time complexity).


Abomination(Posted 2006) [#45]
Yes, of course, one can agree with all of that.
But...
As for the most convincing opinion:
If, as we agree, there is no definite answer, then how is the beginning BMax user to choose between the many suggestions given here?
As for optimizing the algorithms:
That is hard in assembler (especially in these "pipeline" days), and in a BASIC language it's extremely difficult.
How many bus cycles does it take to calculate Cos(220)?
Indeed, that is a matter of circumstance.
In which case, benchmarking (I know) a piece of code you want to optimize "for speed" is not such a bad idea.
And comparing "single instructions", like accessing an array or calculating Cos?
If, in a tight loop, on your computer, something is faster than something else, it doesn't matter whether it's because of the cache, a LUT or the FPU; at least it proves that in some cases that instruction is the fastest.
Chances are that in some more "extended" loop, that instruction will contribute to faster code.


FlameDuck(Posted 2006) [#46]
If, as we agree, there is no definite answer, then how is the beginning BMax user to choose between the many suggestions given here?
In my opinion, optimization at this level is superfluous. The most significant factor is code readability, so use whichever form you think is most readable.

That is hard in assembler (especially in these "pipeline" days), and in a BASIC language it's extremely difficult.
No, it's much easier in a more abstract language, because you can rely on the compiler to do the low-level optimization; all you have to provide is algorithms of as low a time complexity as possible.

How many bus cycles does it take to calculate Cos(220)?
It doesn't matter. Besides which, it varies by FPU, and by whether it can be executed in parallel.

In which case, benchmarking (I know) a piece of code you want to optimize "for speed" is not such a bad idea.
It is if it's already fast enough.

If, in a tight loop, on your computer, something is faster than something else, it doesn't matter whether it's because of the cache, a LUT or the FPU; at least it proves that in some cases that instruction is the fastest.
Which is a fairly useless proof, unless you're only writing code specifically for your computer.

Chances are that in some more "extended" loop, that instruction will contribute to faster code.
No. The larger the loop, and the more objects you have in it, the more likely you are to get a cache miss.


Abomination(Posted 2006) [#47]
all you have to provide is algorithms of as low a time complexity as possible.
OK: b=Cos(a) or b=Array[a]. Which one has the lowest time complexity?

It doesn't matter. Besides which, it varies by FPU, and by whether it can be executed in parallel.

Yes, that is why I said it's a matter of circumstance.

It is if it's already fast enough.
Duh!

No. The larger the loop, and the more objects you have in it, the more likely you are to get a cache miss.
Yes, the more the loop is "polluted", the less chance "your code" will be cached.
But the cache stores copies of the data from the most frequently used main-memory locations, so as long as "your code" appears relatively often in a loop, the chance of a miss will be relatively low.


Chris C(Posted 2006) [#48]
I think FlameDuck meant the code cache, not the data cache...

And I think he has a very good point.
The rather minimal total amount of time you will save on certain machines just ain't worth the hassle.

Far better to write something that actually works in the real world, and then get *actual usage timings* for various routines.

What's far more likely to help is refining the algorithm you are using, or even completely changing how you do things, which will often save much more execution time.


H&K(Posted 2006) [#49]
I disagree, Chris.

The rather minimal total amount of time you will save on certain machines is worth the hassle of what is basically a search-and-replace operation.

Mind you, having said that, it hasn't been worth the hassle for me so far. So... well, I don't know.


ImaginaryHuman(Posted 2006) [#50]
Or compare the original functions with the lookup table at runtime and decide which one to use based on the speed.


FlameDuck(Posted 2006) [#51]
OK: b=Cos(a) or b=Array[a]. Which one has the lowest time complexity?
They both have O(1) (constant) time complexity.


Warren(Posted 2006) [#52]
I don't care! Faster is faster, and I cannot see why I should make "lesser" code just because someone thinks I should.

Because optimizing something that doesn't need to be optimized opens the door for bugs later on. Measure twice, optimize once. If Sin/Cos are major bottlenecks in your game, fine, optimize away. I'm going to bet that they aren't, and that there are MUCH bigger gains to be made somewhere else.