Blitzmax faster than C?

IceVAN(Posted 2012) [#1]
EDIT:
Oops!
I was compiling the C version in debug mode.
In release mode, the C version takes 10 seconds.

------

Is this possible?

The following code in C runs slower than the equivalent in BlitzMax.
On my computer, the C version takes 35 seconds and the BlitzMax version takes 8 seconds.

The code generates a table of 100000 numbers and then sorts it with a bubble sort (unoptimized on purpose).
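
The original listings are not preserved in this archive. As a rough idea of what such a test might look like, here is a minimal C sketch. It is an assumption, not the poster's code: the names `fill_table` and `bubble_sort`, the fixed seed and the use of `rand()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill the table with pseudo-random numbers.
   A fixed seed keeps runs comparable (the seed is an assumption). */
void fill_table(int *table, size_t n, unsigned seed)
{
    assert(table != NULL);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        table[i] = rand();
}

/* Deliberately unoptimised bubble sort: every pass walks the whole
   table, with no early exit and no shrinking inner range. */
void bubble_sort(int *table, size_t n)
{
    assert(table != NULL);
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t i = 0; i + 1 < n; i++) {
            if (table[i] > table[i + 1]) {
                int tmp = table[i];
                table[i] = table[i + 1];
                table[i + 1] = tmp;
            }
        }
    }
}
```

To reproduce the kind of test described here, one would allocate 100000 ints, call `fill_table` and then `bubble_sort`, and build with optimisation switched on (for example `-O2` or `-O3` with GCC) before comparing timings.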

Can anyone test it?

Thanks!

C code:


BlitzMax code:


Last edited 2012


Floyd(Posted 2012) [#2]
I get 6 seconds for MinGW and 5 for BlitzMax, essentially the same.

Maybe you are building debug code for C.


Yasha(Posted 2012) [#3]
Repeatedly trying it out, BlitzMax, GCC and Clang all make that program take about 20 seconds on my machine.

So yeah, remember to turn on optimisations if you want your C to actually be as fast as expected.

Last edited 2012


col(Posted 2012) [#4]
Takes the same time here - 8sec. I compiled them both with Bmax. Sounds like compiler optimisations.

Test.bmx




test.c


Last edited 2012


xcessive(Posted 2012) [#5]
Compiled C under GCC with full optimisations: 7 Seconds.
Blitzmax: 6 seconds.

I'm confused, timer granularity?

Last edited 2012


AntonyWells(Posted 2012) [#6]
Why? They both produce native machine code, so the language is almost immaterial. I'm sure C can out-perform blitz using the right flags, but the end result is negligible.


Noobody(Posted 2012) [#7]
A few things:

- You're measuring performance in whole seconds, using the standard time library in C. This is inaccurate. BMax already offers a millisecond timer (MilliSecs) and, if you're on Windows, you can use QueryPerformanceCounter to get a very accurate clock in C.

- If you want to measure performance, always enable optimization when compiling C code (-O3 in gcc), since BMax already optimizes when in release mode.
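
The timing advice above can be sketched in C. This is a minimal example that assumes the standard `clock()` function as a portable stand-in for QueryPerformanceCounter (which is Windows-only). `elapsed_ms` and `time_ms` are hypothetical helper names, not taken from the thread. On POSIX systems `clock()` reports processor time rather than wall-clock time, which should be close enough for a single-threaded sort.

```c
#include <assert.h>
#include <time.h>

/* Convert the difference between two clock() readings to milliseconds. */
double elapsed_ms(clock_t start, clock_t end)
{
    return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Run a workload once and return how long it took, in milliseconds. */
double time_ms(void (*work)(void))
{
    assert(work != NULL);
    clock_t start = clock();
    work();
    clock_t end = clock();
    return elapsed_ms(start, end);
}
```

Passing the sort as `work` gives a reading in fractions of a second instead of whole seconds, so a 9.45 second run no longer shows up as just 9 or 10.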

After adding a more accurate timing code and enabling full optimization on the C compiler, both C and BMax end up taking 9.45 seconds to execute the program.

I then tried the quicksort algorithm as an additional test. After increasing num to 1'000'000 (since quicksort is so fast :) ), BMax ended up taking 200 seconds to complete, while the C program took 175 seconds.
I'm sure the difference would increase even more if one were to take longer, more involved code and especially code involving lots of data and lots of math (since C/++ is the number crunching language).
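
For readers who want to repeat the quicksort comparison, here is a minimal C sketch. It is not the listing from this post (that is not preserved here). The middle-element pivot and the names `quicksort` and `quicksort_range` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sort table[lo..hi] (inclusive) with a middle-element pivot. */
static void quicksort_range(int *table, long lo, long hi)
{
    if (lo >= hi)
        return;

    int pivot = table[lo + (hi - lo) / 2];
    long i = lo, j = hi;

    /* Partition: move smaller values left and larger values right. */
    while (i <= j) {
        while (table[i] < pivot) i++;
        while (table[j] > pivot) j--;
        if (i <= j) {
            int tmp = table[i];
            table[i] = table[j];
            table[j] = tmp;
            i++;
            j--;
        }
    }

    /* Recurse into the two partitions. */
    quicksort_range(table, lo, j);
    quicksort_range(table, i, hi);
}

/* Public entry point: sort the first n elements of table. */
void quicksort(int *table, size_t n)
{
    assert(table != NULL || n == 0);
    if (n > 1)
        quicksort_range(table, 0, (long)n - 1);
}
```

Swapping this in for the bubble sort changes the work from roughly n² comparisons to roughly n log n on typical input, which is why the table size has to grow before the run time is measurable.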

So for application programming and most fields of games programming, BMax actually performs pretty decently. If you're working with a lot of math and performance is important (or if you're going to use multithreading, heh), writing an external piece of C/++ code to use with your BMax code is not a bad idea.

Here are my modified codes for BMax and C (with the quicksort code).





Yasha(Posted 2012) [#8]
"They both produce native machine code, so the language is almost immaterial."


This is like saying "two students both speak English, so they will write the same essay". There isn't just one way to express a given computation in machine code - B3D produces it too, but it would likely be much slower because it doesn't do any optimisation at all.

On the other hand there's going to be a limit to how well a compiler can optimise a simple program like this (short of recognising a sorting operation at compile time, both BlitzMax and GCC have probably made it as fast as it will get). If you really want to test optimisation, you need to give them a slightly more complex program that has some greater potential to be heavily rewritten. The C++ vs. BlitzMax geometry functions in miniB3D might be a decent example of something like that.

(Technically yes, the language is immaterial and the compiler is what matters, but there's only one compiler for BlitzMax.)


AntonyWells(Posted 2012) [#9]
Lol, not quite, my point is they both have the same theoretical top speed, considering they both support threads etc and of course blitz can import C++ libs for things like OpenCL for further speed up.

All in all, I'd say C++ is probably faster-ish, but the benefits of the blitz language make up for it I'd say.


ProfJake(Posted 2012) [#10]
"my point is they both have the same theoretical top speed"


I am sorry to disappoint you, but no. And that is totally okay.
See, the differences begin at a theoretical level where C is designed to be a very fast, customizable language with a more machine level of thinking and BlitzMax is much more abstract and user-oriented.

There may be similar results in speed in some scenarios, like the above.
But as the application grows, noticeable differences are bound to happen.
It's a simple result that starts with the handling of memory (GC, new String objects returned from default functions, ..) and ends with the optimization of the standard library in terms of speed (BRL modules).

Just try declaring inline functions, constant pointers or compile-time objects in BlitzMax and you will see.


AntonyWells(Posted 2012) [#11]
Well at any rate, C produces ASM that is compiled into X86, and so does BlitzMax.

I have written C#/Basic like compilers to machine code, and the handling of classes/types is pretty uniform.

The biggest differences are probably the final code optimization etc, but Mark is fairly old skool, so I'm sure his final code is pretty good.


I've studied BlitzMax's asm output, and it employs fairly clever if not stunning use of the registers. The only problem in my eyes, is the lack of forward prediction, but with only a minimal amount of registers it's probably not that vital.

In my own tests, I've found BlitzMax/C++ to be fairly similar. C# is the example of a language that uses an alternative output path, and it is much slower, as it uses a Virtual Machine.


AntonyWells(Posted 2012) [#12]
oh but yes inline and other things will probably blow through the speed differences on a long term scale, but also have to consider blitz makes a lot of things easier to use, so it's a fairly good trade-off either way.


Yasha(Posted 2012) [#13]
"and it is much slower, as it uses a Virtual Machine"


This is... incredibly wrong. The only time C# uses a virtual machine is in the unlikely case of wanting to debug your CIL output from the compiler.

Anyway, there really is a lot more to optimisation than register allocation and inlining (hell, both of these are becoming positively outdated now as we get closer to hardware-jitting). Go and look up graph rewriting to find out how the pros do it. The point you seem to be missing is that there really is no necessity at all for output to be similar across different languages and compilers: many people go for the "simple" approach, with a stack and structs on a heap and so on, but there are lots of ways to do it.

The mere fact that two languages both end up represented as x86 is completely meaningless. No, really. Irrelevant to the highest degree.


AntonyWells(Posted 2012) [#14]
Ok, anyway as for C# it matters not, the final code which is translated into x86 is quite slow.

I hate it when people argue points with me I know are right. I'm not some newbie judy :)

Anyway *bail out.jpg* see ya in the other threads kid.


ziggy(Posted 2012) [#15]
"and it is much slower, as it uses a Virtual Machine"

The Microsoft C# compiler can, in some areas, produce much more optimized code than the C++ compiler, because it can compile on a per-computer basis (among other things). It is not using a VM; it can be jitted or just converted to pure native machine code at install time of the application. That's called install-time compilation and it produces programs that are fast as hell. That's why people are using NGen and the like.
I would also recommend this read: http://www.fairyengine.com/articles/cppvscsharp.htm It's very interesting.

"Ok, anyway as for C# it matters not, the final code which is translated into x86 is quite slow."
Proof? Benchmark?

"hate it when people argue points with me I know are right. I'm not some newbie judy"
But in my honest opinion, your posts make you look like you don't have very deep knowledge of the CLR. No offense intended, just telling you my impressions of your posts.

EDIT: Here are some interesting benchmarks. You'll see that most of the time the difference is negligible: sometimes C# on .NET 4 on 64 bits is faster than the same code in C++ with the Visual C compiler, sometimes it's the other way round. Most of the time they're very close. http://www.codeproject.com/Articles/212856/Head-to-head-benchmark-Csharp-vs-NET

Last edited 2012


AntonyWells(Posted 2012) [#16]
The code is most definitely a lot slower, because however you roll it, CLR adds a tremendous amount of overhead.

I've coded 3D engines in C# and numerous other things using OpenCL etc, and I can safely assert that yes it's a lot slower!

On the other hand, it has a huge wealth of libs/sdks which make up for it mostly.

But I'm back using Blitz specifically because C# was a bit too slow.

(to be polite)

The problem is when you get down to the high-cycle processes like per-pixel image processing etc. OpenCL improves that situation, though.

As for my knowledge of CLR, granted, but I have been a C# coder for many years and it's not as fast IMO.

Mono is a bit faster, which I've used a lot with Unity3D. All in all, I'd still say C++/BlitzMax is vital for speed-critical code, but that's where DLLs come in.


AntonyWells(Posted 2012) [#17]
BTW I use your BLIde a lot, I would love to know how you compile dlls using BLIde Plus as I lost my license. *wink wink, binaryarch@... if you have your records handy and don't mind a re-issue :) lol*



that said I'm installing mac in a minute. any plans to do mac ides for max?


ziggy(Posted 2012) [#18]
@AntonyWells: There is no license associated with the gmail account in your previous post. Please contact me with any license details you may have so that I can find your license, if you still want to download your BLIde Plus again.


Mahan(Posted 2012) [#19]
Did some testing myself (on a slow computer):

1. C# (.NET 4.0/Release/x86) - 95 sec
2. Java (1.7.0_02) - 34 sec
3. BlitzMax (Release, code from first post) - 32 sec

I was most amazed by the Java version. Looks like Sun/Oracle has done some nice work on the JIT since i worked with it professionally.


c# version:


Java version:


edit: spelling

Last edited 2012


Noobody(Posted 2012) [#20]
Interesting, Mahan!

I was aware that C# would not be as fast as other languages, but to think it would be almost three times slower than Java?


Mahan(Posted 2012) [#21]

"I was aware that C# would not be as fast as other languages, but to think it would be almost three times slower than Java?"


And to think Java is really up there with C/BMX! Was a shocker for me. I tested Java version last and my face went totally o.O

I started looking for bugs :)


ziggy(Posted 2012) [#22]
This should be tested with NGen to get reliable results.


Mahan(Posted 2012) [#23]
ziggy: do it! (if you want to)
You've got the know-how which i lack. (possible optimizations etc.)


AntonyWells(Posted 2012) [#24]
Not sure Ziggy, it's not a problem anyway as I'm not using windows for now.

Anyway off for brunch with Scott Baluka.


xcessive(Posted 2012) [#25]
Java: 19 Seconds
BMax: 7 Seconds.
C: 7 seconds.
C#: 33 seconds.

Instead of sorting numbers, try sorting objects. I think Java's performance will surprise.


Mahan(Posted 2012) [#26]
@xcessive : interesting! What CPU did you run this on?

I tested on an AMD Phenom n930 (laptop). This CPU does not have L3 cache, so maybe my results were skewed?


xcessive(Posted 2012) [#27]
Java takes advantage of cache a LOT. It has an extremely good locality optimizer (once JIT compiled), so I don't think so. Or at least, it should be the other way round. I think it's JIT compilation time that's hurting Java here. Since the test case is so small, its compilation hit makes a big difference. I have a feeling Java will scale well with bigger problems/test cases--that is after all what it's designed for.

Now that I think about it, it's obvious. It's Java's array bounds checking that is slowing it down a lot, since I am using JRE 1.6! Java had notoriously slow bounds checking up till 1.7.

To be honest I am surprised too regardless, I usually find that Java performs very close to slightly optimised C++, especially when a lot of objects and methods calls are involved. Java has super fast memory allocation and method calling. In fact Java is known to beat C sometimes in recursive function calling also.

I am using an Intel i7 950.

Last edited 2012


Evil Roy Ferguson(Posted 2012) [#28]
5 seconds for the C version (Visual C++ 2010; I'm too lazy to break out GCC at the moment...)
6 seconds for the BlitzMax version
11 seconds for the C# version (Release, .NET 4.0 / x86)
15 seconds for the Java version. (Release, JDK 1.7)

I'm not sure what compiler flags you're using for C#, but something is probably misconfigured if you're getting times like that. Are you sure that you're running in release mode? Although I did not use it, NGen would definitely make for a better comparison, as ziggy said.

Windows 7, Core i5 @ 3.2GHz.

Last edited 2012


Mahan(Posted 2012) [#29]
@Evil Roy Ferguson : Yes, seems like i fsck:ed up something in my C# params in some way.

When fiddling I found another quite peculiar thing (still on my AMD X4 930n processor):

On my system, 4 separate worker-threads each doing the speed test from above all(!) finish in ~20 seconds, in parallel.

If I simply run 1 test in the main-thread it takes ~30 seconds.

So 4 threads run 4 tests in 2/3 of the time that the main thread does it once.

Weird huh?

Testcode 4 threads:



Only main thread:


(Threaded build in both examples to get an as similar test-case as possible)


Orca(Posted 2012) [#30]
Sorry for the necro, but these things always grab my interest.

Anyways you can't just do direct ports between languages/platforms, and assume all is what it seems.

The C# code is generating bounds checks for those array accesses. A common .NET idiom is to loop using the array's Length property. The JIT is aware of this, and can remove many bounds checks.

( FWIW, I wouldn't use DateTime for perf testing either, but I think it's a non-issue in this case... )

Here's more idiomatic c# code...



I compiled with:

.NET 3.5 sp2, csc.exe: /o+ /debug- BubbleCSopt.cs
BlitzMax 1.44: Release build

I did 5 runs each, and the C# and BlitzMax versions tied consistently, at 23 seconds each. ( Ancient comp :/ )

Last edited 2012