little Benchmark


Barbapapa(Posted 2007) [#1]
Hi,

Maybe someone can help me out here. I did a little research testing the speed of different programming languages. It's only a very small math test, but its results surprised me a little.
This is the BlitzMax version, coded quick and dirty:
Global Anzahl, Iterationen, EndTime, StartTime,i : Long
Global Sekunden, Minuten, Stunden, DauerMsec,x,y : Double

Anzahl  =  Long ( Input("Anzahl Clients: "))
Iterationen = ((Anzahl-1)*Anzahl/2)

EndTime = MilliSecs()

For i = 0 To Iterationen
	y = Sin(i)
	x = Cos(y)
	x = x*y
Next
StartTime = MilliSecs()
Print "Anzahl: " + String(Anzahl)
Print "Iterationen: " + String(Iterationen)

DauerMsec = StartTime - EndTime

Print "Millisekunden: " + String.FromDouble(DauerMsec)
Print "Sekunden: " + String.FromDouble(DauerMsec/1000)


When I enter 10000, it takes 16 seconds on my older PC.

Here are the comparisons:

Delphi: 8,6 Seconds (first run 10secs)
Java 1.6: 1 Min 40 Seconds!! hotspot was on I think
c# : 9 Seconds
purebasic: 8 seconds
Blitzmax: 16 seconds

And that's what I don't understand: my guess would have been that PB and BMax would be equal.
On a side note (irrelevant to the results), I created a nice form for the Delphi, Java and C# versions.
And yes, the debugger was deactivated.

And please, this little benchmark is only a small test for me to check out the math speed and its implementation in different languages. It has nothing to do with the practical speed of a whole application, so no flames about better languages or the like.

I only want to know why BMax needs 16 seconds where PB only needs 8. Where can I optimize? I used the same formula in every other language.

Best regards
B.


Azathoth(Posted 2007) [#2]
It might not have anything to do with the speed, but you realise you're only making the last variable on each Global list a Long or Double?

Are you also using PureBasic's quad type, which would be the equivalent of BlitzMax's Long?


ziggy(Posted 2007) [#3]
Is there any reason for the X variable to be an integer? It runs faster when it is declared as Double, like the Y variable, and the test makes much more sense to me that way.

I've converted the code to Visual Basic .NET. It took 9 seconds with the debugger on, and 93 milliseconds running the built program with no debugger present. So I suppose you tested C# from the Visual Studio IDE with a debug build. Try a release build run directly from Windows, and I suppose you'll get much better performance.

Sample code (the program has a form with a button and a textbox)
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim Anzahl As Integer, Iterationen As Integer, EndTime As Long
        Dim StartTime As Long
        Dim i As Long
        Dim DauerMsec As DateTime
        Dim x As Double, y As Double
        Anzahl = 10000
        Iterationen = ((Anzahl - 1) * Anzahl / 2)

        EndTime = Now.Ticks

        For i = 0 To Iterationen
            y = Math.Sin(i)
            x = Math.Cos(y)
            x = x * y
        Next
        StartTime = Now.Ticks

        DauerMsec = New DateTime(StartTime - EndTime)

        SPrint("Anzahl: " & Anzahl)
        SPrint("Iterationen: " & Iterationen)
        SPrint("Time: " & DauerMsec.ToString & " " & DauerMsec.Millisecond)

    End Sub
    Sub SPrint(ByVal Str As String)
        Me.TextBox1.AppendText(Str & Chr(13) & Chr(10))
    End Sub
End Class


(I thought .net would be slower but oh surprise!)


tonyg(Posted 2007) [#4]
Setting SuperStrict and declaring the variables reduced my initial 10s to 7s. I also saved a couple of secs when 'i' was declared as Int.


Grisu(Posted 2007) [#5]
8 secs here without any optimisation of your example code given.

If one codes dirty one can't expect awesome results.


Czar Flavius(Posted 2007) [#6]
Why should declaring them matter? If it assumes that they are integers anyway, as far as the exe is concerned it's the same?


ziggy(Posted 2007) [#7]
SuperStrict seems to activate some optimizations in the generated exe, but it forces you to explicitly declare every variable. That makes your code easier to debug and maintain.

I can't see any reason not to use it.


Barbapapa(Posted 2007) [#8]
Sorry guys, but you can't compare the times because you don't have the comparison code. If I ran my code on my new PC it would probably take only 2 secs or less. These are my test specs: WinXP Pro SP2, Pentium 4/2.67 GHz and 1 GB RAM.

@ziggy: for C# or VB you have to declare, define AND actually use these variables, or else the loop gets optimized away, meaning it won't calculate anything at all. I got the same result as you the first time; then I added an edit field which showed me the result of x after the loop. C# was the only language optimizing in such a way.
Edit: Why would x and y be integers? They are doubles, or at least should be (see Azathoth's post).

@Azathoth: true, changed it. Now it takes 7,54 secs with PB...

And yes, hopping between so many languages made me trip over it: in BMax I have to declare each type separately ;) Thanks for pointing this out. It's only that way in BMax. Now it takes only 11,67 secs.

Here is the new code:
SuperStrict

Global Anzahl : Long,Iterationen : Long,EndTime : Long,StartTime : Long,i : Long
Global  DauerMsec: Double,x : Double,y : Double

Anzahl  =  Long ( Input("Anzahl Clients: "))
Iterationen = ((Anzahl-1)*Anzahl/2)

EndTime = MilliSecs()

For i = 0 To Iterationen
	y = Sin(i)
	x = Cos(y)
	x = x*y
Next
StartTime = MilliSecs()
Print "Anzahl: " + String(Anzahl)
Print "Iterationen: " + String(Iterationen)

DauerMsec = StartTime - EndTime

Print "Millisekunden: " + String.FromDouble(DauerMsec)
Print "Sekunden: " + String.FromDouble(DauerMsec/1000)


@tonyg: i must be as big as iterations

@Grisu: helpful

So BlitzMax is getting nearly as fast as c#. Now I would like to know why. Just to understand the 'under the hood'. Always eager to learn... (I still would have bet that pb and BMax would be equally fast in math)


dmaz(Posted 2007) [#9]
I probably don't have to ask this but since you didn't explicitly say it... you did compile in release mode, correct?


Dreamora(Posted 2007) [#10]
The reason BM is slower is the outdated MinGW it currently builds on, which is 3+ years old (P2 optimizations only!).
If you replace it with a newer MinGW, enable the correct compiler flags (P3 at least) and rebuild the modules, the calculation speed will rise by 50-150% depending on the operation.


tonyg(Posted 2007) [#11]
"@tonyg: i must be as big as iterations"

I changed iterations as well. You're right, our times are irrelevant, which is why I focused on the time saved. Declaring variables is a *good* thing.
It'd be interesting to see where each program spends its time in case it's a specific command in Bmax which takes the extra 1-2s.
Having said that I, personally, wouldn't worry about a 3s difference or would factor it in with other reasons to use a language.


xlsior(Posted 2007) [#12]
"...enable the correct compiler flags (P3 at least)"


How exactly do you do that?

I have the latest MinGW running (works with everything other than MaxGUI) but have no idea how to enable different compiler options.


Brucey(Posted 2007) [#13]
"How exactly do you do that?"

According to "man gcc" (haw)...

-march=pentium3


xlsior(Posted 2007) [#14]
I meant: how/where do you pass *any* flags to MinGW?

(I've just used the 'Build Modules' through the IDE so far)

Also -- using the Pentium3 flag, does that optimize the execution speed for Pentium 3 and above, or does it render the resulting program completely unusable on a Pentium 1 or 2?


Who was John Galt?(Posted 2007) [#15]
@Dreamora - have you tried this?

As I understand it, Max does its own compilation, spitting out assembly that is assembled by FASM. I guess Sin and Cos in this example may be C functions, compiled by MinGW? Anyone know?


ziggy(Posted 2007) [#16]
All C or C++ code in the modules is compiled using MinGW. This applies not only to Sin, Cos, etc. but also to the GC and some other areas. Using a version of MinGW that is not officially supported may have side effects (MaxGUI cannot be compiled with the current latest release), but it seems to improve execution speed and to drastically reduce rounding errors in calculations.


Barbapapa(Posted 2007) [#17]
Great news, I read somewhere that Mark wanted to replace the old MinGW with the new one. I don't know how long it'll take. Besides that, it's still remarkable that C# is so fast. Btw, as far as I've read, Delphi hasn't had a compiler update for a long time either.
Question: when C# does its JIT compile, does it compile for the specific CPU it's running on, or does it 'only' have one standard compiler for all CPUs?
On a side note, PB seems to have a little quirk with quad types (not a bug, but a strange missing feature): in my example I can't use quads for For loops, only longs (4 bytes).

Does anybody use Java here? I have a strange feeling about the result, I mean 1 min 40??? Crazy. There's got to be a way to optimize it to something comparable to the other results!


Pragun(Posted 2007) [#18]
Java wastes a LOT of time if you give it a lot to print out. It works much faster if you only print at the end.


Barbapapa(Posted 2007) [#19]
That's what I did, with the debugger off too. I will have to investigate further.


xlsior(Posted 2007) [#20]
Yeah, Mark said that the next blitzMax update will work with the latest MinGW... that doesn't say anything about what additional optimizations such as the 'pentium3' flag may or may not be enabled by default. :-?


ImaginaryHuman(Posted 2007) [#21]
Change your globals to all locals and see it fly. When you use Global you explicitly are forcing the variables to be stored in memory rather than be optimized into local registers.


Barbapapa(Posted 2007) [#22]
11.58 secs - no fly- hard landing ;)


ziggy(Posted 2007) [#23]
Is it faster with this code?
SuperStrict
Import "sinandcos.c"
Extern
	Function Sin:Double(v:Double) 
	Function Cos:Double(v:Double) 
End Extern
Global Anzahl : Long,Iterationen : Long,EndTime : Long,StartTime : Long,i : Long
Global  DauerMsec: Double,x : Double,y : Double

Anzahl  =  Long ( Input("Anzahl Clients: "))
Iterationen = ((Anzahl-1)*Anzahl/2)

EndTime = MilliSecs()

For i = 0 To Iterationen
	y = Sin(i)
	x = Cos(y)
	x = x*y
Next
StartTime = MilliSecs()
Print "Anzahl: " + String(Anzahl)
Print "Iterationen: " + String(Iterationen)

DauerMsec = StartTime - EndTime

Print "Millisekunden: " + String.FromDouble(DauerMsec)
Print "Sekunden: " + String.FromDouble(DauerMsec/1000)


sinandcos.c source:
#include <math.h>
double Sin(double v) {
	return sin(v);
}
double Cos(double v) {
	return cos(v);
}

It seems to be much faster on my computer this way.


Barbapapa(Posted 2007) [#24]
Great idea, ziggy! I gave it a test and it took 10.069 seconds, so it actually is a little faster. I don't know what's different in Mark's code; shouldn't he be using the math lib too? But since it's the same MinGW, it can't really give the big speed boost. I'm very eager to try this with the newest version of MinGW. Hmm, shouldn't it be possible to try this little source with the new MinGW? Chances are it could even work...

Edit: I installed the newest MinGW. I didn't do a complete rebuild, but I did use ziggy's version, so the .c code should have been compiled with the new version. Alas, there wasn't any speed change. I'm wary of rebuilding all modules; would that change anything?


Barbapapa(Posted 2007) [#25]
Got a new optimization :) The variable 'Iterationen' is causing the huge time increase. It seems that in BMax, or better said MinGW, the values are not clamped. I introduced a new variable j:Byte and added a j=i in the loop. Now the time is down to 9 seconds. Having a Sin(xx millions) is not very realistic ;) Oh, and Sin() is slower than Cos().


ziggy(Posted 2007) [#26]
Yes, it will give better rounding when dealing with decimal-precision numbers, and a little (sometimes not so little) speed increase.

My guess is that it is faster because the C function is not managed code, and the internal variables of the function don't have to deal with the GC. I'm not really sure about this, but it seems logical to me.


Barbapapa(Posted 2007) [#27]
Just a little side note: I managed to drop the Java run time from 90 seconds to 9 (!!!) seconds by clamping the values between -PI/4 and +PI/4. The same test in BlitzMax didn't result in any speed increase; it stays at 10,6 seconds. Funny, eh? Somebody from a Java forum tried different values too and came up with a funny result: to keep the speed high, you have to stay away from x^10 for sin and cos. I didn't try this myself, but it's very interesting.

SuperStrict

Local Anzahl : Long,Iterationen : Long,EndTime : Long,StartTime : Long,i : Long
Local  DauerMsec: Double,x : Double,y : Double
Local jo1:Double = -1*Pi/4
Local jo2:Double = Pi/4

Anzahl  =  Long ( Input("Anzahl Clients: "))
Iterationen = ((Anzahl-1)*Anzahl/2)

Local  jo3:Double = 2*jo2/Iterationen;
Local j:Double=jo1; 

EndTime = MilliSecs()

For i = 0 To Iterationen
	y = Sin(j)
	x = Cos(j)
	x = x*y
	j=j+jo3;
Next
StartTime = MilliSecs()
Print "Anzahl: " + String(Anzahl)
Print "Iterationen: " + String(Iterationen)

DauerMsec = StartTime - EndTime

Print "Millisekunden: " + String.FromDouble(DauerMsec)
Print "Sekunden: " + String.FromDouble(DauerMsec/1000)



Who was John Galt?(Posted 2007) [#28]
So BlitzMax is slower than Java in this test? That is shocking.


Barbapapa(Posted 2007) [#29]
Not so shocking, it's only in this particular case. If you take my first code with huge values for i, Java will crawl again, but then no one will ever need such high values for sin and cos ;)

It's all about optimizing, but I think nowadays nearly every language is fast enough; it's more a question of 'which language is the most useful for this project?'. That's why I used Blitz3D for an architectural walkthrough, but will probably use Processing for my next project, which will need good networking, no fullscreen, easy 3D, but lots of math, easy connections between many clients and a server, and easy platform portability. I may change later, but for rapid prototyping Processing is fantastic. It's like a nice and easy framework with all the Java power underneath if you need it. And working in pure OOP is always beneficial. But would I do a full game with Processing or Java? No, for that I would personally probably use BlitzMax.


Dreamora(Posted 2007) [#30]
What's preventing you from using Java? JRenderMonkey and the like are good alternatives for doing 3D in Java, if you are happy enough with it :)


Barbapapa(Posted 2007) [#31]
Complexity is preventing me from using Java; that's why Processing is so great. BlitzMax is great too: not for every case, but for indie games it's perfect. I like C# too, it's cool for GUI-intensive apps. Delphi is very nice if you're aiming for an all-in-one exe without .NET. Blitz3D is cool for doing nice 3D quickly and easily. And not to forget GameMaker ;) Yes, I really like it for some fun stuff. I like having choices, don't you? And as speed is not such a deciding factor anymore, one is free to jump around.


Boulderdash(Posted 2007) [#32]
Why don't you disassemble the executable code from each test? It's the resulting machine code that matters.


Barbapapa(Posted 2007) [#33]
Java has a virtual machine, so there is no exe, only bytecode, like C#. The speed difference isn't so big anymore; it's just that Java has very optimized native code for some sin/cos ranges (-PI/4 to PI/4) and has big problems with x^10 values. PB seems to have its own libs, thus being the fastest, and BlitzMax relies on the MinGW libs (I think). But speed is OK for all languages in most cases. Disassembling would be cool and informative for the exe versions, but I'm not an assembler geek, so I'll leave this to Mark and co ;)


FlameDuck(Posted 2007) [#34]
You realize of course that the problem with this kind of "benchmark" is that the iteration loop is using up a non-trivial amount of the overall time, and it's prone to compiler cheats. Try a Whetstone test, or if you want a more direct comparison, I believe that Jim Brown has a collection of Sieve of Eratosthenes written in various languages to offer a better comparison.


Barbapapa(Posted 2007) [#35]
Well, I know that my little benchmark is a private and very unofficial thing. It was pure curiosity that made me translate it from Delphi to some other languages, and I wanted to know if my guesses would be right. I was wrong in many cases: shocked by the first implementation in Java, surprised by the speed of C#. BlitzMax made me wonder, being the 'slowest' one, but then it's still fast enough. I think that's my conclusion: in most cases all (tested) languages are fast enough. An implementation in C++ would be fun though, but I'm no friend of C++ ;) and I don't want to install another MS Express edition. My guess is the result would be between 7.5 and 11 seconds ;)