Tweaking CC options with BMK (NG)

Brucey(Posted 2009) [#1]
I was interested to see how much difference (if any) there would be if I were to play around with the compiler optimizations that BMK uses.

I thought the "sieve" test might be a good place to start, as mentioned here.

I tweaked it slightly to run more efficiently, moving the local variable declarations out of the loop:


On my 2GHz Mac Mini, I get these results using the default optimization settings:
50000 iterations took 3116 m/secs.
50000 iterations took 3112 m/secs.
50000 iterations took 3107 m/secs.
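The BlitzMax listing isn't reproduced here. For anyone unfamiliar with this kind of benchmark, here is a rough Python sketch of what a sieve timing loop does. The names `sieve_count` and `bench` and the limit of 8190 are assumptions for illustration, not Brucey's code. Python timings say nothing about gcc flags; the sketch only shows the shape of the test.

```python
import time

def sieve_count(limit):
    """Count the primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return 0
    # One flag per integer 0..limit; 1 means "still possibly prime".
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    i = 2
    while i * i <= limit:
        if flags[i]:
            # Clear every multiple of i, starting at i*i.
            flags[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
        i += 1
    return sum(flags)

def bench(iterations, limit=8190):
    """Run the sieve repeatedly and report elapsed wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        sieve_count(limit)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{iterations} iterations took {elapsed_ms:.0f} m/secs.")
    return elapsed_ms

if __name__ == "__main__":
    # Three runs, mirroring the three result lines above.
    for _ in range(3):
        bench(200)
```

Repeating each run a few times, as in the results above, gives a rough idea of how much noise there is between runs.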


So, after looking at the gcc docs and googling a little, I decided to add these settings to my custom.bmk configuration and rebuild the BRL and Pub modules:
addccopt optimization -O3
addmacx86ccopt arch -march=nocona
addmacx86ccopt math -msse3


I wasn't actually expecting much to change, since the code is pretty much pure BlitzMax, but this is what I got:
50000 iterations took 2915 m/secs.
50000 iterations took 2909 m/secs.
50000 iterations took 2902 m/secs.

which is about 7% faster, by my reckoning (an average of 3112 ms down to 2909 ms).

In the scheme of things, that feels like quite a large number.
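For the record, here is the arithmetic behind the "about 7%" figure, as a small Python sketch using the timings above. It also shows that "faster" can be read two ways: the run time drops by about 6.5%, while throughput (iterations per second) rises by about 7%.

```python
from statistics import mean

def compare(before_ms, after_ms):
    """Return (percent of run time saved, percent throughput gained)."""
    before, after = mean(before_ms), mean(after_ms)
    time_saved = (before - after) / before * 100.0
    speedup = (before / after - 1.0) * 100.0
    return time_saved, speedup

default_ms = [3116, 3112, 3107]  # default settings (-Os)
tuned_ms = [2915, 2909, 2902]    # -O3 -march=nocona -msse3
saved, gained = compare(default_ms, tuned_ms)
print(f"run time {saved:.1f}% shorter, throughput {gained:.1f}% higher")
```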

I wonder how much bigger the effect would be on more C/C++ oriented code...

Interesting :-)


Brucey(Posted 2009) [#2]
And for the curious, the default flag is -Os, which optimizes for SIZE rather than SPEED... and the file sizes appear to bear that out:
-rwxr-xr-x  1 brucey  brucey  151188 28 May 21:34 sieve_test
-rwxr-xr-x  1 brucey  brucey  143236 28 May 21:31 sieve_test_default

where sieve_test_default was built with the default options.

Hmmm... now, would I rather save 8KB or be 7% faster?


DavidDC(Posted 2009) [#3]
Very interesting. Certainly worth testing further.


Mark Tiffany(Posted 2009) [#4]
If there is that much of a difference, this should be an official option in bmk, at least to force compilation for speed or size.


xlsior(Posted 2009) [#5]
Yeah, having it as a selectable option in the IDE would be perfect. My current program could noticeably benefit from a 7% speed increase...


Armitage 1982(Posted 2009) [#6]
I coded my tween engine in C.
I'm still using this feature sparingly.
Since Box2D and CEGUI are C/C++ too, having things run faster is always a good thing!
Do you know if this could break something anywhere?
I would like to try this on my game :p
Is this a good idea?


Brucey(Posted 2009) [#7]
Quote (Armitage 1982): "Do you know if this could break something anywhere?"

There's always the possibility that a higher optimization level will affect some obscure code, especially math functions, since some settings change how temporary values are stored during calculations (the usual is 80 bits, in the x87 FPU registers).
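To see why the precision of temporaries matters, here is a Python sketch that uses single vs double precision as a stand-in for the x87's 80-bit registers vs 64-bit doubles. Python has no 80-bit type, so this is an analogy, not the actual x87 behaviour. The same expression gives different answers depending on whether the intermediate result is rounded to the narrower format before it is used.

```python
import struct

def to_f32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def narrow_temporaries(a, b):
    # The intermediate sum is stored at the narrow precision before use.
    s = to_f32(a + b)
    return to_f32(s - a)

def wide_temporaries(a, b):
    # The intermediate sum is kept at the wider precision; only the
    # final result is rounded to the narrow format.
    return to_f32((a + b) - a)

a, b = 1e8, 1.0
print(narrow_temporaries(a, b), wide_temporaries(a, b))  # prints: 0.0 1.0
```

Neither answer is a compiler bug. The point is that precision-sensitive code can behave differently when the compiler keeps more or fewer temporaries in wide registers, which is why math-heavy code deserves testing after changing flags.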

I did try out some of the box2d samples and I didn't see anything break. I also wanted to see if I could get many more things on the screen at once, but I think the debugdraw/graphics became the bottleneck.

Obviously you'd want to try different scenarios and see how things go.

And remember, if you decide to use -march=core2 on your Core 2 box, chances are the code won't work on less capable processors (ones without SSSE3, for example).
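Before shipping a build made with a higher -march, it can be worth checking the target machine. Here is a minimal Python sketch that assumes a Linux machine: it reads the flags line of /proc/cpuinfo, where SSE3 is listed as "pni". The feature table is a hand-written SIMD subset paraphrased from the gcc manual's -march descriptions, not an exhaustive list. Other operating systems expose this information differently.

```python
# Hypothetical, partial table: SIMD features each -march value assumes,
# using the names Linux prints in /proc/cpuinfo ("pni" means SSE3).
MARCH_FEATURES = {
    "pentium": set(),
    "pentium4": {"mmx", "sse", "sse2"},
    "nocona": {"mmx", "sse", "sse2", "pni"},
    "core2": {"mmx", "sse", "sse2", "pni", "ssse3"},
}

def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def missing_features(march, cpuinfo_text):
    """List the features a -march target needs that this CPU lacks."""
    return sorted(MARCH_FEATURES[march] - cpu_flags(cpuinfo_text))

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            info = f.read()
    except OSError:
        print("/proc/cpuinfo is not available on this system")
    else:
        for march in MARCH_FEATURES:
            print(march, missing_features(march, info) or "ok")
```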


Brucey(Posted 2009) [#8]
If you want to play with some settings, grab yourself a copy of win32 BMK from HERE, and drop the files into your BlitzMax/bin folder (remembering to BACKUP first).

Then create a file called custom.bmk in BlitzMax/bin, and put your compiler flags in it, e.g.:
addccopt optimization -O3

The format is:
addccopt <name> <value>

If you want a value to contain spaces, wrap it in double-quotes (")

The following option names override the corresponding default settings:
* optimization - Optimize level. The default is -Os (optimize for size)
* arch - The processor architecture. The default is -march=pentium
* math - The floating-point unit (e.g. -msse3).

See the gcc manual for more options (googling "man gcc" is useful).

You can add other flags too. Just use a unique option name.
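The format above is simple enough to sketch. Here is a hypothetical Python illustration (not BMK's own code, which is written in BlitzMax) of how such a file could be turned into compiler flags: each addccopt line sets a named option, double quotes keep a value with spaces together, and a named option replaces the default of the same name. The defaults dict uses the two defaults listed above; other directives, such as the platform-specific addmacx86ccopt lines from the first post, are simply skipped in this sketch.

```python
import shlex

# Defaults as described above (Win32); a named option replaces its default.
DEFAULTS = {"optimization": "-Os", "arch": "-march=pentium"}

def parse_custom_bmk(text):
    """Return an ordered name -> value dict of compiler options."""
    options = dict(DEFAULTS)
    for line in text.splitlines():
        parts = shlex.split(line)  # keeps "double-quoted values" together
        # Only handle 'addccopt <name> <value>'; ignore blank and other lines.
        if len(parts) == 3 and parts[0] == "addccopt":
            _, name, value = parts
            options[name] = value
    return options

def cc_flags(options):
    """Join the option values into one command-line fragment."""
    return " ".join(options.values())

sample = '''addccopt optimization -O3
addccopt extra "-fno-strict-aliasing -funroll-loops"
'''
print(cc_flags(parse_custom_bmk(sample)))
```

Overriding by name (rather than appending) is what stops a second -O flag from fighting with the default one. Note that `shlex.split` raises ValueError on an unbalanced quote.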


Armitage 1982(Posted 2009) [#9]
Hi Brucey

Using "win32 BMK" and "addccopt optimization -O3" on a fresh BlitzMax RC5 worked for me.
I ran each test 4 times, but since these are in-game results they may not be very accurate.

With 197 Box2D physics objects and 132 particles at 1024x768:

OLD BMK : 117 FPS (DebugMode)
NEW BMK : 118 FPS (DebugMode)

OLD BMK : 197 FPS (ReleaseMode)
NEW BMK : 205 FPS (ReleaseMode)

Reaching 205 FPS on a modest PC is always welcome.
Maybe not a 7% speed improvement for me (about 4% in release mode), but there is an improvement!

I didn't notice anything abnormal, so: great release!


xlsior(Posted 2009) [#10]
Quote (Brucey): "And for the curious, the default flag is -Os, which optimizes for SIZE rather than SPEED... which appears correct:"


Does this added feature work under Windows as well?

When trying -O0, -O1, -O2, -O3 and -Os, the resulting code is pretty much identical in the final exe (just three bytes vary, which I'd assume is an internal timestamp somewhere).

Even putting random gibberish in the custom.bmk file doesn't give any errors, and results in the same .exe.


Brucey(Posted 2009) [#11]
I assume you rebuilt all the modules after using a new flag?

Given that BRL.Blitz's C++ code is only recompiled when you rebuild the module, you'll probably find that not much happens if you only rebuild an app after changing these settings: BlitzMax uses bcc to compile your .bmx files, and then links in the pre-compiled modules.

Of course, if you have rebuilt and are not seeing any changes, then something is obviously wrong.

I guess I should make it clearer at the top somewhere that you'll need to rebuild... :-)
(you may find it more convenient to rebuild on the command line if you have a lot of modules: bmk makemods -a brl)
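Why doesn't editing custom.bmk change anything on its own? Here is a hypothetical Python sketch of a make-style rule (an assumption for illustration, not BMK's actual logic): if a module is only rebuilt when its sources are newer than its compiled output, a change of compiler flags goes unnoticed until you force a rebuild. Some build tools also record the flags used for each build so that a flag change triggers a rebuild by itself.

```python
from dataclasses import dataclass

@dataclass
class ModuleBuild:
    source_mtime: float  # newest modification time among the module's sources
    output_mtime: float  # when the compiled module was last built
    built_flags: tuple   # compiler flags used for that build (hypothetical record)

def stale_by_time(m):
    """Timestamp-only rule: rebuild only when a source is newer than the output."""
    return m.source_mtime > m.output_mtime

def stale_by_time_or_flags(m, current_flags):
    """Also rebuild when the compiler flags differ from the last build."""
    return stale_by_time(m) or m.built_flags != tuple(current_flags)

blitz = ModuleBuild(source_mtime=100.0, output_mtime=200.0, built_flags=("-Os",))
print(stale_by_time(blitz))                    # sources unchanged: no rebuild
print(stale_by_time_or_flags(blitz, ["-O3"]))  # flags changed: rebuild
```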


xlsior(Posted 2009) [#12]
Ah, no -- I didn't rebuild all modules first, but that makes total sense in hindsight.

Thanks for the pointer, I'll give it a shot tonight...

(I'll try duplicating my BlitzMax folder, recompiling one for size and the other for speed, and see if there's any noticeable difference.)


xlsior(Posted 2009) [#13]
OK, recap: I cloned my BlitzMax install folder, added the custom.bmk with the -O3 parameter, and recompiled the modules.

After that, I ran a fairly intensive program that I'm working on, which does a whole bunch of string operations.

I ran it 10 times with each compiler version, and averaged out the results.

The original BlitzMax build took 13.936 seconds on average to complete.
With the -O3 flag, the exact same program completes in 11.209 seconds: about 20% less run time!! That's actually a fairly significant number. (My program uses only native BlitzMax, no added C++.)

The compiled file sizes:
Original: 680,448 bytes
Optimized for speed: 733,696 bytes

So indeed larger, but not unreasonably so.


xlsior(Posted 2009) [#14]
After compressing both executables with UPX -9, the original ended up at 274,432 bytes, and the speed-optimized version at 300,544.
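Putting these numbers side by side, here is a quick Python sketch of the percentage changes. The -O3 build trades roughly 8% more executable (about 9.5% after UPX) for roughly 20% less run time, which works out to about 24% more throughput.

```python
def pct_change(before, after):
    """Percentage change from before to after (negative means it shrank)."""
    return (after - before) / before * 100.0

# (default -Os build, -O3 build), figures from the posts above
measurements = {
    "average run time (s)": (13.936, 11.209),
    "exe size (bytes)": (680_448, 733_696),
    "UPX -9 size (bytes)": (274_432, 300_544),
}

for name, (default, tuned) in measurements.items():
    print(f"{name}: {pct_change(default, tuned):+.1f}%")

# Throughput gain: how many more runs fit into the same amount of time.
print(f"throughput: {(13.936 / 11.209 - 1) * 100:+.1f}%")
```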


Brucey(Posted 2009) [#15]
If you know the minimum architecture you'll be running on (a P4, for example), you can also adjust the "arch" compiler option. But you may not see any significant difference: the idea is that the compiler will generate code that makes use of later processor features, such as SSE2, SSE3, etc.

Given that the GC is C-based, it's not entirely surprising that some of BlitzMax's core functionality can be improved with a bit of optimization :-)


xlsior(Posted 2009) [#16]
Yeah, but at that point you're also trading away compatibility, which starts getting a bit iffy... I mean, it's great for something like compiling a Linux kernel optimized for the machine it will run on, but I have a feeling it'd be asking for trouble for a random shareware program or something...