Sqr
BlitzMax Forums/BlitzMax Programming/Sqr
| ||
Hi, I'm using a lot of these. They take and return a double but I only need single precision. Any way to speed it up? (I dropped Cos/Sin for arrays and got a decent speed upgrade) The range of values is from 0.0 to 1500 Any ideas? Jim |
| ||
Standard optimisation questions: Is this function actually taking up an unreasonable amount of time? Is it the highest priority target for optimisation? Can you make an algorithmic change instead (e.g. move the call out of a loop body)? Have you actually profiled the code? (There is no point making optimisations based on guesses about efficiency, guesses are always wrong.) Having checked off all of the above... you'll hit the standard answer of "it depends what you want to do". How are you using the call? I doubt you'll find a solution that preserves the accuracy of answers and the generality of use, or the function would already use that method, so it depends on which tradeoffs you can afford to make. EDIT: sorry, my above comment is stupid, I didn't bother to actually answer the question! - to get a single-precision version of Sqr, just grab the other one from the C math library: Extern Function SqrF:Float( x:Float )="sqrtf" End Extern Print SqrF(36) The library itself is being imported anyway to provide the normal version of Sqr. |
| ||
Be sure you actually need Sqr. The most common use is to calculate distance or length. If you really need these values the Sqr is necessary. But it is often the case that you are only comparing such values, such as to find out which object is closest by comparing distances. Then you can skip Sqr and just compare the squares of the distances. This reminds me of the early days of Blitz3D. It uses single precision, as does Direct3D. The FPU was thus in single precision mode, doing less precise calculations and taking a little less time. That speeds up arithmetic, i.e. addition, subtraction, multiplication and division. It also speeds up square roots. But all the other mathematical functions, such as Sin ATan and Log, are unaffected. Those are calculated in their full 80-bit glory in any case. As I recall the single precision speedup was modest, something like 15% for the affected operations. Of course it was 0% for everything else. |
| ||
If I use Sin often and don't need ultimate precision, I pre-calculate an array with Integer Degrees (0 to 359) as index and Single Float output. |
| ||
Thanks folks, I definitely cannot avoid the sqr as much as I am doing. I'm in the process of 'making the game go faster' so it will play at a decent frame rate on 'older' pcs. I'll try the SqrF method and see ... Thanks Jim |
| ||
ok, no difference with SqrF the line of code is (cos_start_ang# + (Sqr(vx#*vx# + vy#*vy#)/divisor#) * cos_length# ) I don't think I can simplify Jim |
| ||
ok, no difference with SqrF As far as I know there is literally no difference. On a PC the sqrtf function is really just a call to the usual sqrt. This may not be true on other platforms. It is possible to set the FPU to single precision mode and maybe squeeze out a little extra speed. Multiplication is faster than division. If you are doing many divisions with divisor# not changing then you are better off storing the reciprocal with rdivisor# = 1.0/divisor#. Then replace /divisor# with *rdivisor. |
| ||
Thanks Jim |
| ||
Care to give us more code or context to work with? |
| ||
This is what I use for a fast square root. It's good enough for distance checks, but if you use it for drawing you might see little bit of warping.inline float fastSqrt(const float x) { union { int i; float x; } u; u.x = x; u.i = (1<<29) + (u.i >> 1) - (1<<22); return u.x; } |
| ||
GW, thanks, but how do I use this in BMax? Jim |
| ||
You need to have Mingw installed to compile the C code. Bmax will link the function when it compiles. Look at the docs under 'Language'->'Advanced topics'->'Interfacing with C' |
| ||
Thanks GW, gained a little bit of speed. Looking at it I don't understand how it works? Jim |
| ||
The full explanation is here Jim: http://en.wikipedia.org/wiki/Fast_inverse_square_root Often called the Carmack square root because of its use in Quake, but it's actually older than that. I love the commenting of the original code - very, very apt :) |
| ||
It exploits the way floating point numbers are represented with an exponent and mantissa. Roughly speaking, the exponent is divided by two by shifting one bit to the right, as is the fractional part of the mantissa. The result is exact when x is an even power of two: 2^0, 2^2, 2^4... = 1, 4, 16... It is most inexact (error about 6%) for odd powers of two: 2^1, 2^3, 2^5... = 2, 8, 32... |
| ||
Interestingly that article also says (I am content to take their word for it) that this Thing of Beauty is still slower than using SSE's square root instruction. What are the odds of your code being run on something lower-end than a Pentium III? If it's a game made in this day and age it almost certainly needs better than that anyway, so perhaps an SSE version is the best option? (One wonders why libc wouldn't do this already..?) |
| ||
I'm mildly surprised that a BlitzMax version of fastSqrt runs about 90% as fast as the C version.Function fastSqrt:Float( x:Float ) Local fp:Float Ptr = Varptr x Local ip:Int Ptr = Int Ptr fp ip[0] = ( ip[0] Sar 1 ) + $1FC00000 ' magic number is (1 Shl 29) - (1 Shl 22) Return fp[0] End Function |
| ||
Relating to another topic on here, pasting that code directly into where its used instead of having a separate function should be considerably faster. EDIT: I did a quick test, and a function relying on the SSE square instruction is about as fast as inlining the above code. AVX was about ~50% faster still, but then you'd narrow it down a bit too much :) You would have to redesign your code though, since youll need to pass 4 values at once. There are usually better ways to speed up this kind of code, which is why i asked for a better explanation on what its doing earlier. |