Regarding performance

Rroff

(Posted 2014) [#1]

Having a background in other languages where it's a required form, I've always declared my variables in this kind of style in B3D:

function test()
	local a
	a = 24234324
	return a
end function

(which for me makes it easier to keep track of what's going on, and which I find more readable)

However, I recently found out that, due to the underlying C++ and the way it works, this results in up to 15% more time spent in a function compared to declaring the variable and assigning its value on the same line, i.e.

function test()
	local a = 24234324
	return a
end function

If you're also using a similar style, it's worth checking functions that are called a lot, as there's potentially a fair bit of performance difference.

RemiD

(Posted 2014) [#2]

It makes sense that more instructions will take more time.
However, over how many iterations was the 15% measured?
For a few iterations it may be negligible...

Rroff

(Posted 2014) [#3]

Depends on the function - some of mine that were being called many, many times per frame were up to 15% slower; for others it's not measurable.

Some of the languages I've worked with in the past have compiler optimisations that mean it doesn't matter which style you use, as the compiled code is the same, but typically with C++, and it seems by extension B3D, it's left alone.

RGR

(Posted 2014) [#4]

This is not true!
Both Functions are translated into the exact same assembler code:

tmp.bb

Function test()
	Local a
	a = 24234324
	Return a
End Function

test()

tmp.asm

BlitzCC V11.6
(C)opyright 2000-2003 Blitz Research Ltd
Compiling "tmp.bb"
Parsing...
Generating...
Translating...

	.align	16
__MAIN
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	mov	eax,__DATA
	mov	[esp],eax
	call	__bbRestore
	sub	esp,4
	mov	eax,__LIBS
	mov	[esp],eax
	call	__bbLoadLibs
	call	_2_begin
	jmp	_2_leave
_2_begin
	call	_ftest
	ret
_2_leave
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	16
_ftest
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	mov	[ebp-4],0
	mov	[ebp-4],24234324
	mov	eax,[ebp-4]
	jmp	_3_leave
	mov	eax,0
	jmp	_3_leave
_3_leave
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	4
__LIBS
	.db	"",0
	.align	4
__DATA
	.dd	0

Assembling...

tmp2.bb

Function test()
	Local a = 24234324
	Return a
End Function

test()

tmp2.asm

BlitzCC V11.6
(C)opyright 2000-2003 Blitz Research Ltd
Compiling "tmp2.bb"
Parsing...
Generating...
Translating...

	.align	16
__MAIN
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	mov	eax,__DATA
	mov	[esp],eax
	call	__bbRestore
	sub	esp,4
	mov	eax,__LIBS
	mov	[esp],eax
	call	__bbLoadLibs
	call	_2_begin
	jmp	_2_leave
_2_begin
	call	_ftest
	ret
_2_leave
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	16
_ftest
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	mov	[ebp-4],0
	mov	[ebp-4],24234324
	mov	eax,[ebp-4]
	jmp	_3_leave
	mov	eax,0
	jmp	_3_leave
_3_leave
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	4
__LIBS
	.db	"",0
	.align	4
__DATA
	.dd	0

Assembling...

Please compare - I did not find any difference!
And if you don't find any difference either there cannot be the slightest performance difference.

Rroff

(Posted 2014) [#5]

Oops, good thing you posted that - I was compiling with debugging enabled, which does produce up to a 15% performance difference between the two ways of doing it. With debugging disabled, both produce exactly the same thing.
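For anyone who wants to repeat the measurement, here is a minimal timing sketch. The function names, the iteration count and the `sink` variable are illustrative assumptions, not code from this thread. Run it once with debugging enabled and once with it disabled; the listings in this thread suggest the two times should only differ in a debug build.

```blitzbasic
; Hypothetical timing harness: names and iteration count are assumptions.
Const ITERATIONS = 5000000

; Style 1: declare first, assign on a separate line.
Function TestSplit()
	Local a
	a = 24234324
	Return a
End Function

; Style 2: declare and assign on the same line.
Function TestInline()
	Local a = 24234324
	Return a
End Function

Local i, sink, start, splitMs, inlineMs

; Time the declare-then-assign version.
start = MilliSecs()
For i = 1 To ITERATIONS
	sink = TestSplit()
Next
splitMs = MilliSecs() - start

; Time the declare-and-assign version.
start = MilliSecs()
For i = 1 To ITERATIONS
	sink = TestInline()
Next
inlineMs = MilliSecs() - start

Print "Declare then assign: " + splitMs + " ms"
Print "Declare and assign:  " + inlineMs + " ms"
WaitKey
```

Millisecond timings of short loops are noisy, so it is worth running the program several times and comparing the spread rather than a single pair of numbers.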

RGR

(Posted 2014) [#6]

tmp.asm (debug mode)

BlitzCC V11.6
(C)opyright 2000-2003 Blitz Research Ltd
Compiling "tmp.bb"
Parsing...
Generating...
Translating...

	.align	16
__MAIN
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	sub	esp,4
	mov	eax,__DATA
	mov	[esp],eax
	call	__bbRestore
	sub	esp,4
	mov	eax,__LIBS
	mov	[esp],eax
	call	__bbLoadLibs
	call	_2_begin
	jmp	_2_leave
_2_begin
	sub	esp,12
	lea	eax,[ebp]
	mov	[esp],eax
	mov	[esp+4],3014400
	mov	[esp+8],_4
	call	__bbDebugEnter
	sub	esp,8
	mov	[esp],393216
	mov	[esp+4],_1
	call	__bbDebugStmt
	call	_ftest
	ret
_2_leave
	mov	[ebp-4],eax
	mov	eax,ebx
	call	__bbDebugLeave
	mov	ebx,eax
	mov	eax,[ebp-4]
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	16
_ftest
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,8
	mov	[ebp-4],0
	sub	esp,12
	lea	eax,[ebp]
	mov	[esp],eax
	mov	[esp+4],2383992
	mov	[esp+8],_5
	call	__bbDebugEnter
	sub	esp,8
	mov	[esp],65537
	mov	[esp+4],_1
	call	__bbDebugStmt
	sub	esp,8
	mov	[esp],131073
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	[ebp-4],24234324
	sub	esp,8
	mov	[esp],196609
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	eax,[ebp-4]
	jmp	_3_leave
	sub	esp,8
	mov	[esp],262144
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	eax,0
	jmp	_3_leave
_3_leave
	mov	[ebp-8],eax
	mov	eax,ebx
	call	__bbDebugLeave
	mov	ebx,eax
	mov	eax,[ebp-8]
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
_1	.db	"tmp.bb",0
_4	.db	"<main program>",0
_5	.db	"test",0
	.align	4
__LIBS
	.db	"",0
	.align	4
__DATA
	.dd	0

Assembling...

tmp2.asm (debug mode)

BlitzCC V11.6
(C)opyright 2000-2003 Blitz Research Ltd
Compiling "tmp2.bb"
Parsing...
Generating...
Translating...

	.align	16
__MAIN
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,4
	sub	esp,4
	mov	eax,__DATA
	mov	[esp],eax
	call	__bbRestore
	sub	esp,4
	mov	eax,__LIBS
	mov	[esp],eax
	call	__bbLoadLibs
	call	_2_begin
	jmp	_2_leave
_2_begin
	sub	esp,12
	lea	eax,[ebp]
	mov	[esp],eax
	mov	[esp+4],6293016
	mov	[esp+8],_4
	call	__bbDebugEnter
	sub	esp,8
	mov	[esp],327680
	mov	[esp+4],_1
	call	__bbDebugStmt
	call	_ftest
	ret
_2_leave
	mov	[ebp-4],eax
	mov	eax,ebx
	call	__bbDebugLeave
	mov	ebx,eax
	mov	eax,[ebp-4]
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
	.align	16
_ftest
	push	ebx
	push	esi
	push	edi
	push	ebp
	mov	ebp,esp
	sub	esp,8
	mov	[ebp-4],0
	sub	esp,12
	lea	eax,[ebp]
	mov	[esp],eax
	mov	[esp+4],6293136
	mov	[esp+8],_5
	call	__bbDebugEnter
	sub	esp,8
	mov	[esp],65537
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	[ebp-4],24234324
	sub	esp,8
	mov	[esp],131073
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	eax,[ebp-4]
	jmp	_3_leave
	sub	esp,8
	mov	[esp],196608
	mov	[esp+4],_1
	call	__bbDebugStmt
	mov	eax,0
	jmp	_3_leave
_3_leave
	mov	[ebp-8],eax
	mov	eax,ebx
	call	__bbDebugLeave
	mov	ebx,eax
	mov	eax,[ebp-8]
	mov	esp,ebp
	pop	ebp
	pop	edi
	pop	esi
	pop	ebx
	ret	word 0
_1	.db	"tmp2.bb",0
_4	.db	"<main program>",0
_5	.db	"test",0
	.align	4
__LIBS
	.db	"",0
	.align	4
__DATA
	.dd	0

Assembling...

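It is interesting to see how Mark's compiler handles BASIC code and how code gets optimised.

You can see above why it is different in debug mode. The debugger must *remember* where in the BASIC code an error occurs, so every statement gets its own call to __bbDebugStmt. The additional line of BASIC code therefore produces extra code: _ftest contains four __bbDebugStmt calls in tmp.asm but only three in tmp2.asm.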
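The constants passed to __bbDebugStmt in the debug listings look like packed source positions. The sketch below decodes them on the assumption that the value is (line Shl 16) Or column, with both counted from zero. That layout is inferred from the numbers in the listings, not taken from any documentation, and `DecodePos` is a hypothetical helper.

```blitzbasic
; Assumption: __bbDebugStmt receives (line Shl 16) Or column, zero-based.
; This is inferred from the listings in this thread, not documented behaviour.
Function DecodePos$(packed)
	Return "line " + (packed Shr 16) + ", column " + (packed And $FFFF)
End Function

; Values taken from _ftest in tmp.asm (debug mode)
Print DecodePos(65537)   ; line 1, column 1 -> Local a
Print DecodePos(131073)  ; line 2, column 1 -> a = 24234324
Print DecodePos(196609)  ; line 3, column 1 -> Return a
Print DecodePos(262144)  ; line 4, column 0 -> End Function
WaitKey
```

Read this way, tmp.bb has one more statement position than tmp2.bb, which matches the one extra __bbDebugStmt call in its debug listing.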

Yasha

(Posted 2014) [#7]

Optimised? I think the answer to that is "it doesn't"!

My experience so far has largely been that you shouldn't worry about this too much, because there are so many factors affecting performance that you just can't easily control with Blitz3D. For example, you can reverse the order of two lines of code with no data dependencies on each other and see a 20% speedup because the instruction alignments are suddenly better (don't ask for an example, long since lost it).

I should also point out that if you really care about assembly, you can use the TCC library (old wrapper) to include "inline" assembly (or C) routines in your application. Doesn't do SSE/AVX though.

RGR

(Posted 2014) [#8]

Don't read too much into that little sentence I added.
We are talking here about a time 13 - 14 years ago.
I just wanted to point out that Mark was very good at what he did in those days, and that this was the reason why we had such a good tool to make games.

Rroff

(Posted 2014) [#9]

@Yasha I found a situation like that years ago - I can't remember the exact details now, but having certain commands in a certain order and alternating the use of /2 and/or *0.5 made a massive difference to speed, and not in the direction you'd expect - afaik just due to instruction alignment.
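A rough way to check the /2 versus *0.5 case on your own machine is sketched below. The variable names and iteration count are assumptions for illustration, and the result may well differ between machines, builds and surrounding code, which is exactly the point made above.

```blitzbasic
; Hypothetical micro-benchmark: names and iteration count are assumptions.
Const ITERATIONS = 5000000

Local i, start
Local v# = 1234.5
Local r#

; Float division by 2
start = MilliSecs()
For i = 1 To ITERATIONS
	r = v / 2
Next
Print "v / 2:   " + (MilliSecs() - start) + " ms"

; Float multiplication by 0.5
start = MilliSecs()
For i = 1 To ITERATIONS
	r = v * 0.5
Next
Print "v * 0.5: " + (MilliSecs() - start) + " ms"
WaitKey
```

Swapping the order of the two loops is a cheap sanity check: if the numbers swap with them, the difference is more likely down to code placement or warm-up than to the operator itself.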

Matty

(Posted 2014) [#10]

Do these sorts of issues really cause a problem? Won't the major source of slowdowns be somewhere else ... usually terribly inefficient algorithms working on large amounts of data, or rendering.

GW

(Posted 2014) [#11]

How did you generate the assembly output from the blitz compiler?