Floating point accuracy/error

BlitzMax Forums/BlitzMax Programming/Floating point accuracy/error

ragtag(Posted 2006) [#1]
I'm trying to implement a Runge Kutta integrator and have been encountering some weird floating point inaccuracy with my deltaTime. This simple example can demonstrate it.

' Floating point error.
Type TBob
	Field number:Float
End Type

Local bob:TBob = New TBob
bob.number = 0.1

Print "bob.number is "+ bob.number

' Returns:  bob.number is 0.100000001


This prints 0.100000001 when it should just print 0.1. Why is this happening? And does anyone have a solution for it?

Ragnar

p.s. The tutorial and code I'm using as reference for the Runge Kutta stuff is in C, and there all numbers are written with an f after them, like 0.5f. Since I don't know C, I figure it's some sort of indicator that it's a floating point number...anyone know.


H&K(Posted 2006) [#2]
Ok, any floating number is a power of 2 +- stuff. The stuff tho cannot be continuious, there is a limit to how small a number can be added or subtracted.

So if I have (For example 24 bits) to represent a number I could count from 0 to 16777215. If now I want to add 1/2, I cannot, because the smallest number I can have is a 1.
If I wanted to add 788665 I could. (well I couldnt because that would overflow my 24 bits)

This always happens with float. You cannot garentee that the latter digits in any number are accurate at all. Thats why people talk about degree of accuracy. As you can see 9 didgits of accuracy is to much to expect for a float

My understanding is that this is so for all languages, so whatever you are coping from C must have had to have delt with this


ragtag(Posted 2006) [#3]
Thanks for the info.

After experimenting a bit, I figured out a bit better how they work. The 9th digits generally get's rounded, no matter if it's before or after the decimal point. So 12345678.1234 becomes 12345678.0. The same goes for 0.123456789, which becomes 0.123456791...though why it adds 1 at the end I have no idea.

I'm still a bit baffled at why it turns 0.1, which are just 2 digits, into 0.100000001, but found that it does so in C too...though slightly different than in BlitzMax. Adding 0.1 to 0.0 in C and BlitzMax returned this.

BlitzMax
0.00000000
0.100000001
0.200000003
0.299999982
0.399999976

C (xCode)
0.000000000
0.100000001
0.200000003
0.300000012
0.400000006


But since the C code I have works the way it should, and mine doesn't....I guess the problem is with my code. :-)

So if anyone wants to give me some pointers, the tutorial I'm working from is here. http://www.gaffer.org/articles/Integration.html (code downloadable at bottom of page) and my code is below...

My code seems to behave similairly, but doesn't stop running till time has reached about 17000, while the C version stops at 24. :-(

Rem
	Runge Kutta order 4 integration
End Rem

Type TState
	Field x:Float
	Field v:Float
	Field dx:Float = 0.0
	Field dv:Float = 0.0
	
	Method Acceleration( time:Float )
		Local k:Float = 10.0		' Stiffness/Tightness
		Local b:Float = 1.0		' Dampening
		Return -k * x - b * v
	End Method
	
	Method EvaluateOne:TState( time:Float )
		Local output:TState = New TState
		output.dx = v
		output.dv = Acceleration( time )
		Return output
	End Method
		
	Method Evaluate:TState( time:Float, deltaTime:Float, derivative:TState )
		Local tempState:TState = New TState
		tempState.x = x + derivative.dx * deltaTime
		tempState.v = v + derivative.dv * deltaTime
		
		Local output:TState = New TState
		output.dx = v
		output.dv = tempState.Acceleration( time + deltaTime )
		Return output
	End Method
	
	Method Integrate( time:Float, deltaTime:Float )
		Local a:TState = EvaluateOne( time )
		Local b:TState = Evaluate( time, deltaTime * 0.5, a )
		Local c:TState = Evaluate( time, deltaTime * 0.5, b )
		Local d:TState = Evaluate( time, deltaTime, c )
		
		Local dxdt:Float = 1.0/6.0 * ( a.dx + 2.0 * ( b.dx + c.dx ) + d.dx )
		Local dvdt:Float = 1.0/6.0 * ( a.dv + 2.0 * ( b.dv + c.dv ) + d.dv )
		
		x = x + dxdt * deltaTime
		v = v + dvdt * deltaTime
	End Method
End Type


Function Main()
	Local state:TState = New TState
	state.x = 100.0
	state.v = 0.0
	
	time:Float = 0.0
	deltaTime:Float = 0.1
	
	While ( Abs( state.x ) > 0.001 ) Or ( Abs( state.v ) > 0.001 )
		Print time+"    "+state.x+"     "+state.v
		state.Integrate( time, deltaTime )
		time:+ deltaTime
	Wend
End Function

Main()



Cheers,

Ragnar


H&K(Posted 2006) [#4]
ok, its trying to store 0.1 as 1 over 2^X+Y, and unfortuanatly 0.1 isnt in this sequence.

What you have to remember is that the computer is trying to make all these numbers out of 01010000101, etc. So it decides how much "00001" is worth (ie 1/256) then adds them together to "approximate" a decimal number

As to the baffaled bit. Thats because you are thinking as 0.1 as being 1/10 of 1, as apposed to 5.96046483e-8 * 167721 (Which is a lot closer to how the computer thinks of it)

I am very very supprised that you didnt know this already, and probably you did know it, but just werent rcalling it. no?


GfK(Posted 2006) [#5]
Use doubles instead of floats.


H&K(Posted 2006) [#6]
When you make them doubles. Remember that doubles also have this not exact thing as well, just to a lot more didgits.


ragtag(Posted 2006) [#7]
H&K: Thanks for the explaination. I did know that they were made of bits and could only be so many digits, but I never really encountered this problem before. Thanks for the explaination, it makes perfect sense.

I'm neither a programer nor a mathematician, so I'm going a bit over my head messing with 2D physics and stuff....but it's fun, and I figure I've got to learn it somehow anyway. :-)

I did find the error in my code. It seems Wend in BlitzMax does not support Or in the expression, but instead of returning an error, it just ignores the second condition. (feature request...maybe)

Cheers,

Ragnar


H&K(Posted 2006) [#8]
While wend does support OR
Local x =1 
Local y = 0

While x=1 Or y =1

Print x+"  "+y
x=Abs (x-1)
y=Abs (y-1)

Wend



ragtag(Posted 2006) [#9]
You're quite right. I guess it's back to the drawing board for me. :-)

At least I've got the Runge Kutta thing more or less working, now I've got to implement it in the 2D physics stuff I've been working on, so I can get stiffer springs with greater dampening without having them explode on me.

Cheers,

Ragnar


ImaginaryHuman(Posted 2006) [#10]
In binary you can only divide things in half each time. 0.1 doesn't exactly divide into 1 by 1/2. It also doesn't exactly divide into 0.5, or 0.25, or 0.125 or whatever. So it has no choice but to introduce an error.


GfK(Posted 2006) [#11]
In binary you can only divide things in half each time. 0.1 doesn't exactly divide into 1 by 1/2. It also doesn't exactly divide into 0.5, or 0.25, or 0.125 or whatever. So it has no choice but to introduce an error.
What?!


H&K(Posted 2006) [#12]
lol @ Angel

Mine was more correct "0.1<>y/x^2", but I'm going to use yours to explain it next time

@GFK

Hes trying to say that when you add together any numbers in the series 1/2, 1/4, 1/8, 1/16, 1/32 etc, you cannot get 0.1


Rck(Posted 2006) [#13]
Not to sound mean..

(And ragtag seems to have gotten it explained well enough)

Floats, longer floats (doubles) etc. can only store sums of powers of two.

They cannot store every number in your wildest dreams; they even have upper and lower limits. And there isn't an exact intuitive way to determine the amount of error you come across when dealing with those kinds of variables. You need to use tolerances when dealing with floats. This kind of thing isn't told enough: ints and floats need different kind of code to handle them if you want to behave as expected as much as possible.

I.E.
if floatVal# = 0.327 then

should be more like
tol# = 0.0005
if abs(floatVal# - 0.327) < tol# then


as demonstrated in




GoodLuck with those imprecise things, and do use doubles now that Max has them.


ImaginaryHuman(Posted 2006) [#14]
Floats and Doubles are decimal BINARY numbers only. Regardless of what they represent or in what base you're interpreting them, you can only ever toggle a bit on or off.

So let's say the the floating point format has 8 bits to the right of the decimal point with which to represent numbers. The bit values for each column, as an exact fraction, will be:

%00000001 = 1/256th
%00000010 = 1/128th
%00000100 = 1/64th
%00001000 = 1/32th
%00010000 = 1/16th
%00100000 = 1/8th
%01000000 = 1/4th
%10000000 = 1/2th

So then, the only way to represent a number is to switch one of these bits on or off. You might tell, therefore, that this does not give you as much accuracy as you might desire.

For example, there is absolutely no way that, by setting a bit in any combination of these columns, that you can represent 1/3rd. You could get pretty close but you only have 8 bits to play with. Beyond the closeness you can get with the 8 bits you basically discard any other bits you might need and round the number. The rounding is an error, an incuraccy in the actual number.

Numbers with infinite decimal places like 1/3 cannot ever EVER be accurately represented by a number system that is based on multiples of 2, or 1/2's. Binary cannot represent a third of a value. For other numbers, there are also lots of occasions where you try to put an exact number in, but the float or double only get `so accurate` before reaching the limit of how many bits it has available.

Even though floats and doubles have a very large range of values that you can represent, within which you CAN represent SOME numbers with exact accuracy (like 0.5), or any Integer, but there are also lots of other numbers that it just can't map to accurately.

0.1 cannot be accurately represented in binary. You can do 1/8th but not 1/10th. That's why you see it round up the last digit to a 1 - it's probably some fraction of a bit that it can't represent any more closely, mainly because 0.1 has too many decimal places in binary.

You can increase the accuracy by using a Double which basically just gives you extra bits to work with, so the number is `chopped off less` and at a smaller measure, so any error is much less, but still some in most cases.

Since you can't easily tell how much a number will be rounded up or down, you have to start thinking of using a `tolerance`, so if the number is `within` such-and-such an amount of the number you want, then it's considered equal, otherwise considered non-equal. The only time you can do a real `=` between two floats or two doubles is if you are storing the exact same number into it. If you try writing a constant into it it will be transformed as it becomes a floating point number and comparing that to the original constant will potentially be off.

You could consider using fixed point numbers and do your own handling of them, but even they wont represent 0.1 as 0.10000000.


Chris C(Posted 2006) [#15]
Theres a wrapper for a quad percision lib on my site, there may also be a wrapper to a true decimal (bcd) lib there too
I cant remember what I did with it...

Hope that helps


skidracer(Posted 2006) [#16]
Floats and Doubles are decimal BINARY numbers only.


There is nothing decimal about floats and doubles except the way they are converted to Strings when displayed.

IMHO BlitzMax should have a friendlier mode for this conversion process that rounds the output to seven significant digits as the .00000001 added in the top output includes meaningless error for the sake of exact machine translation when converting back from string to float.

Currently the only option in BlitzMax is the first form in the following C example:
#include <stdio.h>
int main(){
	float a;
	a=1.0/10.0;
	printf("a=%#.9g\n",a); //machine exact decimal output
	printf("a=%f\n",a);	//human friendly decimal output
	return 0;
}



Floyd(Posted 2006) [#17]
First, the original problem that started this thread is that the translation from C++ to BlitzMax went wrong somewhere. It's not about floating point accuracy at all.

Still, it would be nice to have some control over formatting.

The extra digits are certainly needed for debugging. I've seen endless confusion caused by the six digit display in earlier versions of Blitz. Some code says two numbers are different, but Print or the debugger says they are the same. Six digits just aren't enough to tell them apart.

Here's a little demonstration that eight digits are also not sufficient.

NOTE: I have replaced faulty example code, which didn't really illustrate the problem.
Local x# , y#, xx!, yy!

x = 12345.0205
y = 12345.0215

xx = x
yy = y

Print
Print "Here are x and y with plenty of extra digits."
Print

Print " x = " + xx
Print " y = " + yy
Print
Print " x and y would both look like 12345.021 with an eight digit display."
Print
Print " But with nine digits..."
Print

Print " x = " + x
Print " y = " + y



Floyd(Posted 2006) [#18]
I was just at Project Gutenberg and can't resist mentioning this.
After a title search the results page included:

Processing time: 2.5206739902496 (1.2642049789429 + 1.1336920261383 + 0.12277698516846) seconds

Now that's accuracy!


H&K(Posted 2006) [#19]
Only if it was right. What did you time it against to be sure?