Odd results in performance test...

BlitzMax Forums/BlitzMax Programming/Odd results in performance test...

LT	(Posted 2015) [#1]

While testing bitwise operations (not in THIS test), I happened upon an odd result. Here, I am simply assigning two values to two other values over and over and timing the results. The second batch, on my system, is taking almost twice as long as the first batch, even though it's doing the same thing. Does anyone know why this happens (or are your results different)?

Framework brl.standardio

Type ta
	Field fb:Byte
	Field fs:Short
End Type

Local a:ta = New ta
a.fb = 48
a.fs = 25001

Local b1:Byte, s1:Short
Local b2:Byte, s2:Short
Local ms1, ms2, ms3
Local iters = 1000000000

'GCSuspend()

ms1 = MilliSecs()

For i = 1 To iters
	b1 = a.fb
	s1 = a.fs
Next

ms2 = MilliSecs()

For i = 1 To iters
	b2 = a.fb
	s2 = a.fs
Next

ms3 = MilliSecs()

'GCResume()

Print ms2 - ms1
Print ms3 - ms2

Brucey

(Posted 2015) [#2]

Interesting results here too, on OS X 32-bit, release build, averages over 10 runs each, and code tweaked to SuperStrict (NG requires at least Strict level).
Legacy BlitzMax :

693.8
677.6

BMX NG :

333.5
341.4

Of course, it's mostly a pretty meaningless test ;-p

LT	(Posted 2015) [#3]

Well, those are the results I would expect. And if I got them, I'd think it was meaningless, too.

But my results (on my old machine):

1217
2199

That seems strange, to me. Could something during initialization be messing with my results?

LT	(Posted 2015) [#4]

Added some code to run the test over and over and averaged the results - and they are still strange (1204.3 and 2266.7). So, I've ruled out the program's initialization.

I agree that this test is, by itself, not meaningful, but it's the testing procedure that I am questioning.

Brucey

(Posted 2015) [#5]

No idea why your results appear so skewed.
Do you get different results with this?

SuperStrict

Framework brl.standardio

Type ta
	Field fb:Byte
	Field fs:Short
End Type

Local a:ta = New ta
a.fb = 48
a.fs = 25001

Local b1:Byte, s1:Short
Local b2:Byte, s2:Short
Local ms1:Int, ms2:Int, ms3:Int
Local iters:Int = 1000000000

'GCSuspend()

ms1 = MilliSecs()

For Local i:Int = 1 To iters
	b1 = a.fb
	s1 = a.fs
Next

ms2 = MilliSecs()

For Local i:Int = 1 To iters
	b2 = a.fb
	s2 = a.fs
Next

ms3 = MilliSecs()

'GCResume()

Print ms2 - ms1
Print ms3 - ms2

Floyd

(Posted 2015) [#6]

No significant difference for me (referring to original test), i7 CPU and 64-bit Windows 7.

I'll hazard a guess that the issue is code alignment, which for some reason is affecting the old machine.

Try putting dummy code in various places, to move other code in memory, something like the following. Note I added a Print at the end to use the nothing value. That's in case BlitzMax takes the liberty of pruning code which can never affect results.

Framework brl.standardio

Type ta
	Field fb:Byte
	Field fs:Short
End Type

Local a:ta = New ta
a.fb = 48
a.fs = 25001

Local b1:Byte, s1:Short
Local b2:Byte, s2:Short
Local ms1, ms2, ms3
Local iters = 1000000000

'GCSuspend()

nothing:Int = 0  ' This will move following code slightly.

ms1 = MilliSecs()

For i = 1 To iters
	b1 = a.fb
	s1 = a.fs
Next

ms2 = MilliSecs()

For i = 1 To iters
	b2 = a.fb
	s2 = a.fs
Next

ms3 = MilliSecs()

'GCResume()

Print ms2 - ms1
Print ms3 - ms2

Print "The nothing value, just so it gets used, = " + nothing

LT	(Posted 2015) [#7]

Hey, thanks for your replies. I tried both versions and the results are the same. I am ... confuzzled. :/

Brucey, on a semi-related question, is there a way to get access to the GC's reference count?

AdamStrange

(Posted 2015) [#8]

why not loop the results totalling as you go and see then:

superstrict

Type ta
	Field fb:Byte
	Field fs:Short
End Type

Local a:ta = New ta
a.fb = 48
a.fs = 25001

Local b1:Byte, s1:Short
Local b2:Byte, s2:Short
Local ms1:Int
Local ms2:Int
Local ms3:int
Local iters:Int = 1000000000
Local i:Int

Local c1:Int
Local c2:int

'GCSuspend()
Local k:Int

For k = 1 To 50
	ms1 = MilliSecs()
	
	For i = 1 To iters
		b1 = a.fb
		s1 = a.fs
	Next
	
	ms2 = MilliSecs()
	
	For i = 1 To iters
		b2 = a.fb
		s2 = a.fs
	Next
	
	ms3 = MilliSecs()
	
	'GCResume()
	c1 :+ (ms2 - ms1)
	c2 :+ (ms3 - ms2)
	
	Print k+" a="+(ms2 - ms1)+" b="+(ms3 - ms2)
Next
Print 
Print "total a="+c1+" b="+c2
Print "average a="+(c1/50)+" b="+(c2/50)

This gives you a muck better idea about speed

Brucey

(Posted 2015) [#9]

on a semi-related question, is there a way to get access to the GC's reference count?

On legacy Blitzmax, it should be at (Object address - 4 bytes)

Derron

(Posted 2015) [#10]

On my linux box the release build took
1038
1036

or in later runs
1036
1036

So I assume it is "like it should". Debug builds for Vanilla and NG took too long, I "stopped execution" before results came in (>30sec).

With NG I get
693
690

which is more than the shapeoff Brucey got (maybe his "vanilla" result isnt that efficient on Macs ?).

BTW: Superstrict

SuperStrict
Framework brl.standardio

Type ta
	Field fb:Byte
	Field fs:Short
End Type

Local a:ta = New ta
a.fb = 48
a.fs = 25001

Local b1:Byte, s1:Short
Local b2:Byte, s2:Short
Local ms1:Int, ms2:Int, ms3:Int
Local iters:Int = 1000000000

'GCSuspend()

ms1 = MilliSecs()

For Local i:Int = 1 To iters
	b1 = a.fb
	s1 = a.fs
Next

ms2 = MilliSecs()

For Local i:Int = 1 To iters
	b2 = a.fb
	s2 = a.fs
Next

ms3 = MilliSecs()

'GCResume()

Print ms2 - ms1
Print ms3 - ms2

@AdamStrange
Your code in Vanilla:
1 a=1038 b=1058
2 a=1035 b=1379
3 a=1035 b=1061
4 a=1035 b=1127
5 a=1036 b=1113
6 a=1035 b=1176
7 a=1035 b=1092
8 a=1035 b=1059
9 a=1035 b=1053
10 a=1036 b=1099

Your code in NG (odd results):
1 a=1064 b=1295
2 a=741 b=854
3 a=737 b=917
4 a=733 b=1052
5 a=1102 b=1199
6 a=1008 b=820
7 a=1202 b=691
8 a=1045 b=696
9 a=1052 b=693
10 a=1175 b=708

@Brucey, why these hickups - is this some kind of GC kicking in differently to vanilla?

bye
Ron

Scaremonger

(Posted 2015) [#11]

On my laptop (HP ProBook 6550b) running Windows 7, 64bit:

Normal: a=1279, b=1271
Debug: a=18241, b=18138
MT: a=1275, b=1275
MT+debug: a=31518, b=32583

Same laptop running XUbuntu 12.04, 64bit:
Normal: a=1264, b=1259
Debug: a=18695, b=19152
MT: a=1341, b=1332
MT+Debug: a=29724, b=29454

I was a little surprised by MT+Debug being so much slower!
Si...

LT	(Posted 2015) [#12]

On legacy Blitzmax, it should be at (Object address - 4 bytes)

Tried it, but noticed a thread where you recommended against doing just that. Also, I found that strings (and ONLY strings) reported 1 less than I would expect. I had a reference to a string and the result was 0.

Thanks, everyone, for testing!

ImaginaryHuman

(Posted 2015) [#13]

Maybe something to do with caching?

LT	(Posted 2015) [#14]

Don't think so. If that was the case, I'd expect the outcome to be the same if I swapped the order.

LT	(Posted 2016) [#15]

Bringing this up again because something occurred to me and I had to test it. Perhaps the results of this test would be different based on my IDE and sure enough!

/shaking head

Blide result: 1208 / 2009
MaxIDE result: 1208 / 1210

Whaaat!?

Brucey

(Posted 2016) [#16]

Having fun? :-)

On my new Skylake i5 machine (OS X), I get :

Legacy, release, 32-bit:

588
598

Current NG, release, 64-bit / 32-bit (they come out about the same each run) :

287
287

I expect the compiler (eg. GCC) to automatically sort out the field alignments for the generated C struct in NG.
No idea if this is also the case with the legacy compiler.
SizeOf(a) reports 4 for both legacy and 32-bit NG. 8 bytes for 64-bit, as you'd expect.

<edit>
Based on the offsets, it looks like legacy bcc is also aligning the short :

	movb	$48,8(%ebx)
	movw	$25001,10(%ebx)

LT	(Posted 2016) [#17]

Oh, I wasn't testing compilers. I was testing IDEs this time. It seems that Blide is doing something a bit differently.

Brucey

(Posted 2016) [#18]

IDEs? Your code is compiled into machine code. The only interaction your IDE has with your running program (in release mode) is sucking in the piped output from it.

And as per AdamStrange's example, your figures will be more normalised if you run them several times in a loop.

LT	(Posted 2016) [#19]

Did that ages ago, Brucey, and the results were the same. Yes, I know that code is compiled into machine code, but that doesn't explain why I'd be getting different results depending on the IDE...that's the mystery.

EDIT: I know you think it's just user error, but if you have Blide - try it...

I replaced the last Print with Notify (adding Import brl.system at the top, of course) and created .exes using both IDEs. The exe that came from Blide gives different results than the one that came from MaxIDE. It's as if Blide was doing something extra... YET the exe sizes are the same. It's just weird. At first, I thought maybe it was interference from my virus checker or something, but now I don't know.

TomToad

(Posted 2016) [#20]

I wonder if Blide is sending the same flags to the bcc compiler as MaxIDE does. It might be why the difference.

LT	(Posted 2016) [#21]

if Blide is sending the same flags to the bcc compiler as MaxIDE

Good call...that could be it!

Brucey

(Posted 2016) [#22]

You could always diff the generated .s files (in the .bmx folder) created by each build and see if there's anything different there. With the same version of bcc you'd expect to see the same generated code.

LT	(Posted 2016) [#23]

Great suggestion, and...they are slightly different. The Blide .s output has two extra lines...

extrn _bbEnd and call _bbEnd

Interesting... :-)

Brucey

(Posted 2016) [#24]

And otherwise they are identical?

Are you compiling the *same* source file with the *same* BlitzMax instance?

LT	(Posted 2016) [#25]

Yes, and yes. Your question leads me to believe you don't think _bbEnd should do anything..?

TomToad

(Posted 2016) [#26]

Remember that the options aren't just for the benefit of bcc. Some options will be passed off to the assembler and linker as well. You may have identical .s files, but completely different .a files if you are using different optimization settings in FASM.

Brucey

(Posted 2016) [#27]

There are no settings passed to FASM on Windows.