Dynamic allocations in C++ versus BlitzMax

JoshK

(Posted 2010) [#1]

In BlitzMax, my math code eventually evolved to avoid dynamic allocations, because it made a huge difference in speed:

Method Add( v:TVec3, result:TVec3 )
	result.x=Self.x+v.x
	result.y=Self.y+v.y
	result.z=Self.z+v.z
EndMethod

The slow way to do it is this:

Method Add:TVec3( v:TVec3 )
	Local result:TVec3=New TVec3
	result.x=Self.x+v.x
	result.y=Self.y+v.y
	result.z=Self.z+v.z
	Return result
EndMethod

In C++, is this necessary?

void Add( const Vec3 v, Vec3 result ) {
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
}

Or is this just as fast?

Vec3& Add( const Vec3 v ) {
	Vec3 result();
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
	return result;
}

Azathoth

(Posted 2010) [#2]

You shouldn't return a reference to a local variable.
Unlike in BlitzMax, in C++ result is created on the stack, and when the function returns it goes out of scope and is no longer valid.
You can use the first C++ function, return the local variable by value (in which case the Vec3 is copied), or create it on the heap with 'new' and return the pointer (in which case you must free it with delete).

Edit: Another thing: Add is making copies of v and result, since you're passing them to the function by value. Because result is a copy, the caller never sees the sum.

Should be like this:

void Add( const Vec3& v, Vec3& result ) {
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
}
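
To make the by-value problem concrete, here is a minimal sketch. It assumes a bare Vec3 struct with three float members (the thread never shows the real definition); AddByValue, AddByRef and demo_out_param are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical Vec3; the thread never shows the real definition.
struct Vec3 {
    float x, y, z;

    // Signature from the question: 'result' is a private copy, so every
    // write to it is thrown away when the function returns.
    void AddByValue(const Vec3 v, Vec3 result) const {
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
    }

    // Corrected signature: 'result' refers to the caller's object.
    void AddByRef(const Vec3& v, Vec3& result) const {
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
    }
};

// Shows that only the by-reference version changes the caller's variable.
void demo_out_param() {
    const Vec3 a = {1.0f, 2.0f, 3.0f};
    const Vec3 b = {10.0f, 20.0f, 30.0f};

    Vec3 lost = {0.0f, 0.0f, 0.0f};
    a.AddByValue(b, lost);
    assert(lost.x == 0.0f && lost.y == 0.0f && lost.z == 0.0f);  // unchanged

    Vec3 kept = {0.0f, 0.0f, 0.0f};
    a.AddByRef(b, kept);
    assert(kept.x == 11.0f && kept.y == 22.0f && kept.z == 33.0f);
}
```

The by-value version compiles without complaint, which is why the mistake is easy to miss.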

Return by value would be like this:

Vec3 Add( const Vec3& v) {
	Vec3 result;
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
	return result;
}

Or create result on the heap:

Vec3* Add( const Vec3& v) {
	Vec3 *result=new Vec3;
	result->x = this->x + v.x;
	result->y = this->y + v.y;
	result->z = this->z + v.z;
	return result;
}
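
For comparison, here is a sketch of what the calling side looks like for each of the three variants. The Vec3 layout and the names AddOut, AddValue, AddHeap and variants_agree are assumptions made for this example; the thread calls every version simply Add.

```cpp
#include <cassert>

// Hypothetical Vec3; the thread never shows the real definition.
struct Vec3 {
    float x, y, z;

    // Out-parameter variant: writes into an object the caller already owns.
    void AddOut(const Vec3& v, Vec3& result) const {
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
    }

    // Return-by-value variant: the caller receives its own Vec3.
    Vec3 AddValue(const Vec3& v) const {
        Vec3 result;
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
        return result;
    }

    // Heap variant: the caller owns the returned pointer and must delete it.
    Vec3* AddHeap(const Vec3& v) const {
        Vec3* result = new Vec3;
        result->x = x + v.x;
        result->y = y + v.y;
        result->z = z + v.z;
        return result;
    }
};

// Calls all three variants and reports whether they produce the same sum.
bool variants_agree() {
    const Vec3 a = {1.0f, 2.0f, 3.0f};
    const Vec3 b = {10.0f, 20.0f, 30.0f};

    Vec3 out;
    a.AddOut(b, out);               // caller supplies the storage

    const Vec3 val = a.AddValue(b); // storage comes back by value

    Vec3* heap = a.AddHeap(b);      // storage comes from 'new'
    const bool same = out.x == val.x && out.y == val.y && out.z == val.z &&
                      out.x == heap->x && out.y == heap->y && out.z == heap->z;
    delete heap;                    // forgetting this leaks one Vec3 per call

    assert(out.x == 11.0f && out.y == 22.0f && out.z == 33.0f);
    return same;
}
```

Only the heap variant pays for a dynamic allocation and leaves cleanup to the caller; the other two keep everything on the stack.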

JoshK

(Posted 2010) [#3]

My point is, is this:

void Add( const Vec3& v, Vec3& result ) {
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
}

faster than this, to any significant degree?:

Vec3 Add( const Vec3& v) {
	Vec3 result;
	result.x = this->x + v.x;
	result.y = this->y + v.y;
	result.z = this->z + v.z;
	return result;
}

*

(Posted 2010) [#4]

Technically yes. In the first version the variable is already declared somewhere and is simply passed to the function; in the second the compiler has to create a new Vec3 on the stack and then use it. As was pointed out earlier, a local variable returned by reference is out of scope and invalid once the function returns.

One way around this is to use pointers, since they can point to variables that are already allocated.

Keep in mind that you will only lose about a millisecond over tons of calls, but in an engine that needs everything to be as fast as possible it does make a difference.

Azathoth

(Posted 2010) [#5]

Yes, it's faster. In the first one no constructors are called; in the second a local object is created and then copied.

JoshK

(Posted 2010) [#6]

Thanks.

GW

(Posted 2010) [#7]

Actually, you should be doing these functions with SSE.
Dump the asm from your C compiler for those routines to make sure they're compiled that way.
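
As a rough idea of what an SSE version could look like, here is a sketch. It assumes the vector is padded to four floats so that it fills one SSE register; Vec3Padded, AddSSE and demo_add_sse are hypothetical names, and the code falls back to a plain loop when SSE is not available at compile time.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define VEC_USE_SSE 1
#else
#define VEC_USE_SSE 0
#endif

// Hypothetical padded vector: v[0..2] hold x, y, z; v[3] is unused padding.
struct Vec3Padded {
    float v[4];
};

// Adds a and b lane by lane, with a single SSE add where SSE is available.
inline Vec3Padded AddSSE(const Vec3Padded& a, const Vec3Padded& b) {
    Vec3Padded out;
#if VEC_USE_SSE
    const __m128 va = _mm_loadu_ps(a.v);       // unaligned load of 4 floats
    const __m128 vb = _mm_loadu_ps(b.v);
    _mm_storeu_ps(out.v, _mm_add_ps(va, vb));  // 4 additions in one instruction
#else
    for (int i = 0; i < 4; ++i) {              // scalar fallback
        out.v[i] = a.v[i] + b.v[i];
    }
#endif
    return out;
}

// Small self-check of the addition.
void demo_add_sse() {
    const Vec3Padded a = {{1.0f, 2.0f, 3.0f, 0.0f}};
    const Vec3Padded b = {{10.0f, 20.0f, 30.0f, 0.0f}};
    const Vec3Padded r = AddSSE(a, b);
    assert(r.v[0] == 11.0f && r.v[1] == 22.0f && r.v[2] == 33.0f);
}
```

Whether this beats what the compiler already generates for the scalar code depends on the compiler and its settings, which is the reason for inspecting the assembly (for example with the -S option of GCC or Clang).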

Otus

(Posted 2010) [#8]

I disagree.

I don't think the two will have any real speed difference, since an "allocation" on the stack doesn't really take much time at all. If the constructor does something complex it's different, of course, but structs can be created on the stack with very low overhead. (Basically it just increments the stack pointer.)

Depending on many different things like calling code and compiler optimisations, one or the other may be faster. They may even become the very same code after inlining. If you really need to know their speed, you should run a benchmark for yourself.
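
A benchmark along those lines might be sketched as follows. The Vec3 layout and the names AddOut, AddValue, Timings and run_benchmark are assumptions for this example, and the timings it produces will vary with compiler, optimisation flags and hardware.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical Vec3; the thread never shows the real definition.
struct Vec3 {
    float x, y, z;

    void AddOut(const Vec3& v, Vec3& result) const {
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
    }

    Vec3 AddValue(const Vec3& v) const {
        Vec3 result;
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
        return result;
    }
};

// Elapsed time per variant, plus checksums so the loops have a visible result.
struct Timings {
    double out_ms;
    double value_ms;
    float checksum_out;
    float checksum_value;
};

// Runs each variant 'iterations' times, accumulating into a running sum.
Timings run_benchmark(int iterations) {
    typedef std::chrono::steady_clock Clock;
    const Vec3 step = {1.0f, 0.5f, 0.25f};
    Timings t;

    Vec3 acc = {0.0f, 0.0f, 0.0f};
    Clock::time_point t0 = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        Vec3 tmp;
        acc.AddOut(step, tmp);  // out-parameter variant
        acc = tmp;
    }
    Clock::time_point t1 = Clock::now();
    t.out_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.checksum_out = acc.x + acc.y + acc.z;

    Vec3 acc2 = {0.0f, 0.0f, 0.0f};
    t0 = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        acc2 = acc2.AddValue(step);  // return-by-value variant
    }
    t1 = Clock::now();
    t.value_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.checksum_value = acc2.x + acc2.y + acc2.z;

    assert(t.checksum_out == t.checksum_value);  // both did the same arithmetic
    return t;
}
```

Call run_benchmark with a large iteration count in an optimised build and compare out_ms with value_ms; repeat the run a few times, since a single measurement of a loop this small is noisy.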

My answer: No, it is not faster to any significant degree.

Azathoth

(Posted 2010) [#9]

"If the constructor does something complex it's different, of course, but structs can be created on the stack with very low overhead. (Basically it just increments the stack pointer.)"

The second one will still call the copy constructor. Not copying an object is always going to be faster than copying.

Otus

(Posted 2010) [#10]

Sorry, my C++ is rusty, but would it really call a copy constructor? The object would be returned on the stack in any case, so there is no reason to copy. Wikipedia suggests maybe not.

Azathoth

(Posted 2010) [#11]

~~Returning an object by value usually will invoke the object's copy constructor.~~
That Wikipedia article says it may even call it twice. It's up to the compiler and the compile settings; it can optimize it out altogether. But you shouldn't really base your code on assumptions about which optimizations the compiler may or may not do.

Edit: I mean if I don't want to call the copy constructor, I should code it so it won't, not assume the compiler will optimize it out.
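
One way to settle how many copies a particular compiler actually makes is to count them. The sketch below assumes a hypothetical CountedVec3 whose copy constructor increments a counter; none of these names come from the thread.

```cpp
#include <cassert>

// Hypothetical vector type that counts calls to its copy constructor.
struct CountedVec3 {
    float x, y, z;
    static int copies;

    CountedVec3() : x(0.0f), y(0.0f), z(0.0f) {}
    CountedVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    CountedVec3(const CountedVec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }

    void AddOut(const CountedVec3& v, CountedVec3& result) const {
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
    }

    CountedVec3 AddValue(const CountedVec3& v) const {
        CountedVec3 result;
        result.x = x + v.x;
        result.y = y + v.y;
        result.z = z + v.z;
        return result;  // the compiler is allowed to elide this copy
    }
};

int CountedVec3::copies = 0;

// Copies made by the out-parameter call: none, since only references are passed.
int copies_with_out_param() {
    CountedVec3 a(1.0f, 2.0f, 3.0f), b(4.0f, 5.0f, 6.0f), r;
    CountedVec3::copies = 0;
    a.AddOut(b, r);
    return CountedVec3::copies;
}

// Copies made by the return-by-value call: 0 when the compiler elides them,
// at most 2 (local to temporary, temporary to r) when it does not.
int copies_with_return_value() {
    CountedVec3 a(1.0f, 2.0f, 3.0f), b(4.0f, 5.0f, 6.0f);
    CountedVec3::copies = 0;
    CountedVec3 r = a.AddValue(b);
    assert(r.x == 5.0f && r.y == 7.0f && r.z == 9.0f);
    return CountedVec3::copies;
}
```

Printing the two return values under your own compiler and flags shows whether the extra copies exist in practice; the out-parameter version reports zero by construction.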

Otus

(Posted 2010) [#12]

"But you shouldn't really base your code on assumptions about which optimizations the compiler may or may not do."

You shouldn't base your code on assumed performance characteristics either, but on solid design and (where performance is needed) on profiling. :)