Float Operations

Wagenheimer

(Posted 2011) [#1]

Are float operations very slow on iOS?

I had a lot of /2 float operations in my render code, and I was getting 30 fps!

I changed this to pre-calculate all of these operations, and now I get 60 fps!

Is this correct? Must I avoid float operations on iOS?
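
A minimal sketch of the kind of change described above, written in C++ rather than Monkey. The `Sprite` struct and all three function names are hypothetical. The sketch only illustrates moving the /2 work out of the per-frame loop; it does not establish that float division was the cause of the frame-rate difference.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sprite record, for illustration only.
struct Sprite {
    float x, y, w, h;   // top-left corner and size
    float cx, cy;       // centre, filled in once by precompute_centres()
};

// Naive render pass: the two divisions are repeated for every sprite, every frame.
float centre_sum_per_frame(const std::vector<Sprite>& sprites) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        const Sprite& s = sprites[i];
        sum += (s.x + s.w / 2.0f) + (s.y + s.h / 2.0f);
    }
    return sum;
}

// Run once, e.g. after loading or whenever a sprite moves or is resized.
void precompute_centres(std::vector<Sprite>& sprites) {
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        Sprite& s = sprites[i];
        s.cx = s.x + s.w / 2.0f;
        s.cy = s.y + s.h / 2.0f;
    }
}

// Render pass after the change: only the stored values are read.
float centre_sum_precomputed(const std::vector<Sprite>& sprites) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < sprites.size(); ++i)
        sum += sprites[i].cx + sprites[i].cy;
    return sum;
}
```

Both passes produce the same values; the second simply does less arithmetic per frame at the cost of two extra fields per sprite.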

Canardian

(Posted 2011) [#2]

According to my tests, in C/C++ the double datatype is about 2-5 times faster than float. I've already made a patch for Monkey which replaces all float with double in C++ (you still use Float in Monkey, it just translates to double), and it seems to work fine: I get much faster and more accurate programs. I was also considering whether a direct memcpy or typecast from C++ to Monkey would work with float, but it does not (the IEEE encoding seems to be different), so you need to wrap all fields in a class anyway, and then it doesn't matter if you convert between float and double.
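
A small sketch of why raw bytes of a float cannot simply be reinterpreted as a double (or the other way round): the two formats differ in width and bit layout. It assumes `float` and `double` are IEEE-754 binary32 and binary64, which the static assertions only partly check, and the helper names `float_bits` and `double_bits` are hypothetical. It says nothing about how Monkey itself stores its Float values.

```cpp
#include <cstdint>
#include <cstring>

// Raw 32-bit pattern of a float (assumes IEEE-754 binary32).
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // well-defined way to read the bytes
    return u;
}

// Raw 64-bit pattern of a double (assumes IEEE-754 binary64).
std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

Under those assumptions `float_bits(1.0f)` is 0x3F800000 while `double_bits(1.0)` is 0x3FF0000000000000, so the same value has unrelated byte patterns and a conversion (not a memcpy) is needed between the two types.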

marksibly

(Posted 2011) [#3]

Hi,

There is no logical reason why doubles should be faster than floats on any hardware - although you can probably come up with a flaky speed test to 'prove' it!

Floats should in general be faster because they only require 4 bytes storage instead of 8 - which will be important when dealing with large arrays of floats, or classes with many float fields etc when memory is being 'hit' often. And it's esp. important on targets where memory bandwidth can be an issue - ie: iOS, Android etc.
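
The storage argument can be made concrete with a short sketch. The helper names are hypothetical, and the cache line size is passed in as a parameter because it varies between CPUs.

```cpp
#include <cstddef>

// Bytes needed to hold `count` contiguous elements of type T.
template <typename T>
std::size_t array_bytes(std::size_t count) {
    return count * sizeof(T);
}

// How many elements of type T fit in one cache line of `line_bytes` bytes.
// The line size is an input here, not a fact about any particular device.
template <typename T>
std::size_t elements_per_line(std::size_t line_bytes) {
    return line_bytes / sizeof(T);
}
```

On a platform with 4-byte floats and 8-byte doubles, a million-element array takes 4,000,000 bytes as floats and 8,000,000 bytes as doubles, and an assumed 64-byte cache line holds 16 floats but only 8 doubles.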

For targets where FP is being emulated (and I don't know if there are any - will check), then floats should be MUCH faster.

For targets with FPUs, floats are represented internally as 80-bit values, so both 32-bit AND 64-bit floats require 'conversion' when being 'loaded' into the FPU. So even if your floats ALWAYS reside in registers and never need to be loaded/stored from memory, there should be no difference.

It is possible to store floats in their native '80-bit' format, but even then, to the best of my knowledge, this should be no faster under any circumstances.

If someone can show me a compelling *meaningful* example showing doubles are faster (and not just a hokey 'square root' routine or something, but something that does what *real* code does, ie: hits memory, calls functions etc ) then I might revisit this, but at the moment, this 'doubles are faster' meme is something I just consider voodoo with no theoretical basis.

[edit]
Ok, here's the challenge: If you can write an array sort routine - any algorithm - that sorts doubles faster than floats, I'll take a closer look at this 'doubles are faster than floats' theory!
[/edit]

marksibly

(Posted 2011) [#4]

Hi,

Ok, here's what I would consider a 'meaningful' example (in c++).

It involves both a reasonable amount of memory access and algorithmic, register friendly code.

Compiled with Mingw 3.4.5 cmd line: "g++ -ffast-math -O2 test.cpp -lwinmm"

For me, this consistently gives: 1343 for floats, 1453 for doubles.

Not as big a difference as I suspected, and you could certainly mount a reasonable argument that the lack of speed is worth the extra precision, but in this example at least, DOUBLES ARE SLOWER THAN FLOATS.

I'm getting a bit worked up about this because I don't want the 'doubles are faster than floats' myth to take hold the way 'bytes are faster than ints' took hold in the BlitzMax days.

#include <vector>
#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <mmsystem.h>

const int N=10000000;

int main(){

	int tm;
	std::vector<float> fv;
	std::vector<double> dv;

	srand( 1234 );
	
	tm=timeGetTime();
	for( int i=0;i<N;++i ){
		fv.push_back( float( rand() )/float( RAND_MAX ) );
	}
	std::sort( fv.begin(),fv.end() );
	tm=timeGetTime()-tm;
	printf( "%i\n",tm );
	
	srand( 1234 );
	
	tm=timeGetTime();
	for( int i=0;i<N;++i ){
		dv.push_back( float( rand() )/float( RAND_MAX ) );
	}
	std::sort( dv.begin(),dv.end() );
	tm=timeGetTime()-tm;
	printf( "%i\n",tm );

	return 0;
}

Canardian

(Posted 2011) [#5]

I get the same result; however, your speed test does not really measure the speed of double vs float operations. It mostly spends its time allocating more bytes for the double datatype: 8 bytes (double) vs 4 bytes (float).

Double is still faster in math operations, but if you store big arrays in memory, then handling the array takes a bit longer.

However, I set N 5 times bigger, so the test now takes 29.093s (double) vs 28.921s (float). I think the 172ms difference is much less than the speed increase double brings in math loops.

So, double is still faster than float :)

marksibly

(Posted 2011) [#6]

Hi,

> Double is still faster in math operations, but if you store big arrays in memory, then handling the array takes a bit longer.

Prove it - post a meaningful example.

If I just time the sort, it's still faster, and a sort only involves FP comparisons - which is surely a math operation, so math operations are faster with floats?
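
A sketch of what "just timing the sort" could look like, using `std::chrono` so that it is not tied to `timeGetTime` and Windows. The helper names `make_data` and `time_sort_ms` are hypothetical, and this is not the code that produced the figures quoted in this thread. The data is generated outside the timed region, so allocation and `push_back` costs are excluded.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Builds n pseudo-random values in [0,1]; this part is deliberately not timed.
template <typename T>
std::vector<T> make_data(std::size_t n, unsigned seed) {
    std::srand(seed);
    std::vector<T> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(T(std::rand()) / T(RAND_MAX));
    return v;
}

// Sorts v in place and returns the duration of the sort alone, in milliseconds.
template <typename T>
double time_sort_ms(std::vector<T>& v) {
    const auto t0 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `make_data<float>(N, 1234)` and `make_data<double>(N, 1234)` and passing each result to `time_sort_ms` gives two numbers that can be compared directly; the actual values depend on the machine and compiler flags.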

The sort routine used by the std libs is publicly available - it's a bog standard quicksort and IMO a very good test of the speed of whatever's being sorted.

> I think the 172ms difference is much less than the speed increase double brings in math loops.

What math loops? And how often does the average app execute a 'math loop' vs just transferring data around? Why give 'math loops' precedence?

I think the above example is very valid BECAUSE it reflects what apps often do - pass data around in memory, ie: for function calls, storing in vars, fields etc, as well as the register friendly 'sort', which is surely a kind of 'math loop'.

And what logic are you basing all this on? What computer science theory suggests doubles should be faster than floats, given neither are the 'native' representation of floats inside the FPU and doubles consume twice the memory bandwidth?

Canardian

(Posted 2011) [#7]

I think the speed increase in math operations has something to do that 64-bit is faster to convert to 80-bit than 32-bit.

However, in both cases, array and math, the difference is minimal in real games, since nobody loops over 50 million elements. And with smaller amounts the difference is insignificant.

What is more important than the small speed difference, however, is accuracy. Here double beats float by far, and accuracy is a real issue in many games, especially with shadow maps and long distance quaternion objects.
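
A minimal sketch of the accuracy gap, assuming `float` and `double` are IEEE-754 binary32 and binary64 (24-bit and 53-bit significands respectively). The helper name `add_one_changes_value` is hypothetical. The `volatile` stores are there to force each result to be rounded to the precision of `T` rather than kept in a wider register.

```cpp
// Returns true if x + 1 is distinguishable from x once stored as type T.
template <typename T>
bool add_one_changes_value(T x) {
    volatile T before = x;
    volatile T after = x + T(1);   // rounded to T when stored
    return after != before;
}
```

With those assumptions, `add_one_changes_value<float>(16777216.0f)` is false, because 2^24 + 1 is not representable as a float, while `add_one_changes_value<double>(16777216.0)` is true. Coordinates of that magnitude therefore lose whole units in float but not in double.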

I made a speed test based on your array test, and added a math loop also. The results are like:

double : array:7.880000, math:17.942000, total:25.822000
float  : array:7.382000, math:17.972000, total:25.354000

#include <vector>
#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <mmsystem.h>

const int N=50000000;

template<typename T>class SpeedTest
{	public:
	std::vector<T> v;
	int tmt;
	
	SpeedTest():tmt(0){}
	
	void MathTest()
	{
		for( int i=0; i<N; ++i )
			for( int j=0; j<5; ++j )
				v.at(i)=v.at(i) * T(rand()) - T(rand()) / (T(rand()) + T(rand()) );
	}

	void ArrayTest()
	{
		for( int i=0; i<N; ++i )
		{
			v.push_back( T( rand() )/T( RAND_MAX ) );
		}
		std::sort( v.begin(),v.end() );
	}
	
	void FullTest()
	{
		int tm0=0,tm1=0,tm2=0;
		
		tm0=timeGetTime();
		ArrayTest();
		tm1=timeGetTime()-tm0;
		tmt+=tm1;
		
		tm0=timeGetTime();
		MathTest();
		tm2=timeGetTime()-tm0;
		tmt+=tm2;
		
		printf("array:%f, math:%f, total:%f\n",
				tm1/1000.0,
				tm2/1000.0,
				tmt/1000.0);
	}

};

int main()
{
	SpeedTest<double> *d;
	SpeedTest<float>  *f;
	
	d=new SpeedTest<double>;
	printf("double : ");  d->FullTest();
	delete d;

	f=new SpeedTest<float>;
	printf("float  : ");  f->FullTest();
	delete f;
	
	return 0;
}

marksibly

(Posted 2011) [#8]

Hi,

> I think the speed increase in math operations has something to do that 64-bit is faster to convert to 80-bit than 32-bit.

This sounds like an 'urban legend' to me - I've never seen this documented anywhere and it doesn't sound at all right to me - everything generally takes one tick these days.

The results certainly don't support it - try swapping the order of the tests if you want floats to 'win' the math test (which I still consider effectively meaningless). Caches complicate lots of things, and this 'test' is extremely cache friendly so I don't think we're getting the full effect of the memory hit either.
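
One way to reduce the order and cache effects mentioned above is to run each test several times, alternating between the float and double versions, and keep the best time for each. The helper below is a hypothetical sketch of that idea (the name `best_of_ms` is an assumption, not something used elsewhere in this thread).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Calls work() `runs` times and returns the shortest single-run time in ms.
// Taking the minimum discounts runs that were slowed by cold caches or
// by other activity on the machine.
template <typename F>
double best_of_ms(int runs, F work) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

Passing a lambda that runs the float math loop, then one that runs the double math loop, and repeating the pair in the opposite order would show whether a difference survives the swap.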

So yes, you can argue that doubles are better because they're more precise - but they are NOT faster. And certainly not the '2 to 5 times' faster that you were originally claiming!

And don't forget, we are talking about PCs with huge caches here - the original question concerned iOS, which I suspect (although I haven't tested) will show a much greater difference in speed between floats and doubles.

Canardian

(Posted 2011) [#9]

I expanded the test a bit so that it shows the time of allocation and sorting separately, and also added a pure math test which does not deal with the array:

double : allocate:1.685000, sort:6.272000, math:18.560000, math2:11.924000, total:38.441000
float  : allocate:1.969000, sort:6.067000, math:23.379000, math2:12.283000, total:43.698000

Although double is faster in this test, the significance of pure math loops in games is smaller than for array allocation and sorting. But it really depends on the situation, and like I said, I think the accuracy and number range is much more important than the little speed differences or memory usage.

#include <vector>
#include <algorithm>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <mmsystem.h>

const int N=50000000;

template<typename T>class SpeedTest
{	public:
	std::vector<T> v;
	int tmt;
	
	SpeedTest():tmt(0){}
	
	void MathTest2()
	{
		T x=T(rand());
		for( int i=0; i<N; ++i )
			for( int j=0; j<5; ++j )
				x *= T(rand()) - T(rand()) / (T(rand()) + T(rand()) );
	}
	
	void MathTest()
	{
		for( int i=0; i<N; ++i )
			for( int j=0; j<5; ++j )
				v.at(i)=v.at(i) * T(rand()) - T(rand()) / (T(rand()) + T(rand()) );
	}

	void SortTest()
	{
		std::sort( v.begin(),v.end() );
	}
	
	void AllocateTest()
	{
		for( int i=0; i<N; ++i )
		{
			v.push_back( T( rand() )/T( RAND_MAX ) );
		}
	}
	
	void FullTest()
	{
		int tm0=0,tm1=0,tm2=0,tm3=0,tm4=0;
		
		srand(1234);
		
		tm0=timeGetTime();
		AllocateTest();
		tm1=timeGetTime()-tm0;
		tmt+=tm1;
		
		tm0=timeGetTime();
		SortTest();
		tm2=timeGetTime()-tm0;
		tmt+=tm2;
		
		tm0=timeGetTime();
		MathTest();
		tm3=timeGetTime()-tm0;
		tmt+=tm3;

		tm0=timeGetTime();
		MathTest2();
		tm4=timeGetTime()-tm0;
		tmt+=tm4;
		
		printf("allocate:%f, sort:%f, math:%f, math2:%f, total:%f\n",
				tm1/1000.0,
				tm2/1000.0,
				tm3/1000.0,
				tm4/1000.0,
				tmt/1000.0);
	}

};

int main()
{
	SpeedTest<double> *d;
	d=new SpeedTest<double>;
	printf("double : ");  d->FullTest();
	delete d;

	SpeedTest<float>  *f;
	f=new SpeedTest<float>;
	printf("float  : ");  f->FullTest();
	delete f;
	
	return 0;
}

The original 2-5 times faster includes also practical issues with float vs double, because when you need big values, you need additional variables and logic to overcome the range and accuracy limits of floats. So in practice double is also faster because you need less code to handle big ranges and high accuracies.

Perturbatio

(Posted 2011) [#10]

Déjà vu...

http://www.blitzmax.com/Community/posts.php?topic=77743

marksibly

(Posted 2011) [#11]

Hi,

> The original 2-5 times faster includes also practical issues with float vs double, because when you need big values, you need additional variables and logic to overcome the range and accuracy limits of floats.

Ha! Nice one! Can't argue with that 'logic'...

simonh

(Posted 2011) [#12]

To answer the original question - no, float operations shouldn't be particularly slow on iOS, although older hardware (iPhone 3G) may of course struggle a bit with lots of float operations.

Wagenheimer

(Posted 2011) [#13]

Thanks Simonh!

I don't know for sure what was wrong!

Should I avoid any math operations in render code? I got an increase of more than 30 fps when I removed all my operations from the render code. They were all basic operations, like SCREENWIDTH / 2, but it made a big difference. I will try to make a demo application showing the difference.

And another thing: I have tested in HTML5 and iOS, and the slowness seems to happen only on iOS.