Bank and array speed test

BlitzMax Forums/BlitzMax Programming/Bank and array speed test

sswift(Posted 2009) [#1]


Just a speed test I did to see what would be the best way to modify a lot of image data in my texture generator.

The short of it is:

Arrays and banks are the same speed.

It is no better to pull ints from an bank than bytes, provided you don't need to increment the pointer after each byte, and instead can just use a different index as in my second example.

Peek is ten times slower than using array style indexing to access the data. I'm not sure why it doesn't simply get optimized out.

Using pointers to index the data should be faster but it seems the index is added when derefencing even if it is zero. At least, my understanding from coding in assembly years ago is that Buffer[10] should not be as fast as using Buffer[0] all the time, because Buffer[10] would add 10 to the Buffer pointer, whereas Buffer[0] should need to add nothing.

But perhaps this is simply a case where the index is stored in a register and on modern PC's adding a register to a pointer in another register is so fast as to be instant. Maybe it's a ulti core thing. I dunno.

All I know is there's no speed difference, and you could just as eaily use a while loop with a pointer or a for loop with an index to access the data in a buffer and they both would be equally fast.


N(Posted 2009) [#2]
Peek is ten times slower than using array style indexing to access the data. I'm not sure why it doesn't simply get optimized out.
Because by using PeekInt, you're making a function call and a method call to do the same thing as buf[Index]. However, it doesn't help that you're using PeekInt incorrectly.


sswift(Posted 2009) [#3]
Because by using PeekInt, you're making a function call and a method call to do the same thing as buf[Index].


I am well aware that PeekInt is a function call.

Allow me to rephrase the question:

Why isn't the compiler replacing the function call with the faster method of accessing the memory?


However, it doesn't help that you're using PeekInt incorrectly.


If you're going to tell me I'm doing something incorrectly, at least have the courtesy to explain what exactly it is that you think I'm doing wrong.


joncom2000(Posted 2009) [#4]
I dont know if it would make much difference in the grand scheme of things but I notice that you use the bank.buf() method to get a pointer to it, yet you use the wrapper function peekint(bank,offset) when you could use bank.peekint(offset) thus reducing a call to a function.

Also on your array creation, why create a byte array to store int's then have to int ptr the byte ptr, why not just make an array of int's in the 1st place ? I know your 1st array test access'd by byte but you didnt do a peekbyte test on the bank?

Actually you could skip the whole bank test and just memalloc and use a pointer and offset to it to save on the whole bank type and method overhead if speed is really an issue since its just a nice wrapper around that anyway.


ImaginaryHuman(Posted 2009) [#5]
In general, reading integers from memory is faster than reading any other type of data including the same number of bytes. But if that means you have overhead of having to extract bytes from the integer, that will probably eat up the benefits.

Also I have found that allocating some memory and then using pointers can be a little bit faster than using arrays or banks. I didn't get a major increase, though, only a few percent difference. The thing is that when you access an element from an array, it has to go to the array *object*, find the base address of the array *memory*, and then index it, whereas with plain memory you already are keeping the base address in a register and then just have to index it. But still, most of the bottleneck is in the memory transfer rate/bus/cache speed etc.

I know what you mean about indirect addressing in assembler optimizing to use a faster addressing mode - maybe the blitz/gc compiler just doesn't do that optimization for some reason.

btw reading/writing doubles can be faster than integers, but then you'd have to use like varptr to get the base of the double data and then read off the bytes.

You might find that you get more optimizations elsewhere in your algorithm because if you're doing image manipulations I presume there are hefty chunks of code that could be made faster.


N(Posted 2009) [#6]
If you're going to tell me I'm doing something incorrectly, at least have the courtesy to explain what exactly it is that you think I'm doing wrong.
Ordinarily I would, but you're different. You don't really seem to make any attempt to learn something on your own, which is eery in its own right considering you've somehow managed to ship an application, several modules, and a bad game.

That being said, you're not using the PeekInt method, you're using the procedural wrapper of the method. Additionally, if you want proper access to a bank that isn't going to break, you're also going to have to lock the bank before getting the buffer (you're not doing that either). In short, you're just using the method incorrectly, and you're using the bank incorrectly.

Why isn't the compiler replacing the function call with the faster method of accessing the memory?
Why do you think it would?


plash(Posted 2009) [#7]
Why do you think it would?
Also.. how in all bliss would it know?


sswift(Posted 2009) [#8]
I dont know if it would make much difference in the grand scheme of things but I notice that you use the bank.buf() method to get a pointer to it, yet you use the wrapper function peekint(bank,offset) when you could use bank.peekint(offset) thus reducing a call to a function.


I didn't notice banks had a peekint method. It's worth a try I suppose.

Also on your array creation, why create a byte array to store int's then have to int ptr the byte ptr, why not just make an array of int's in the 1st place ? I know your 1st array test access'd by byte but you didnt do a peekbyte test on the bank?


It shouldn't make a difference. An array is a block of memory. Declaring it as an int array or a byte array will only affect how you index it and how much memory it takes up. Converting the byte pointer to the byte array to an int pointer to the byte array, is just a way of telling the compiler to access the data as if it were ints.

And I just did it that way in my code because I was too lazy to declare another array. :-)



You might find that you get more optimizations elsewhere in your algorithm because if you're doing image manipulations I presume there are hefty chunks of code that could be made faster.



Well with the indexing method of accessing the banks alone, Seamless Texture Generator written in BlitzMax is gonna be 10x faster than it was in BlitzPlus. And if I decide to unroll any loops we could be talking 100x faster based on earlier tests. Fast enough, perhaps, for realtime previews.

As for optimizing the algorithms, those are already very optimized. I have one of the fastest median algorithms in there, and a very fast box/gaussian blur.

ctually you could skip the whole bank test and just memalloc and use a pointer and offset to it to save on the whole bank type and method overhead if speed is really an issue since its just a nice wrapper around that anyway.


I didn't even notice BlitzMax had Memalloc. But looking at the source to TBank, it doesn't look like there'd be any overhead associated with writing via pointer dereferencing so there's not really a point in switching over to that. Other than maybe making things clearer should I ever choose to port to C. :-)


You don't really seem to make any attempt to learn something on your own, which is eery in its own right considering you've somehow managed to ship an application, several modules, and a bad game.


And you don't really seem to make any attempt at being polite.

Noel, don't presume that you know anything about me. Because I assure you, you know NOTHING about me.

Allow me to educate you with a bit of the light reading I've been doing over the last two weeks while researching the algorithms I'll need for my new texture generator:

http://en.wikipedia.org/wiki/Color_models
http://en.wikipedia.org/wiki/YCbCr
http://en.wikipedia.org/wiki/RG_Chromaticity
http://en.wikipedia.org/wiki/HSL_color_space
http://en.wikipedia.org/wiki/Opponent_process
http://en.wikipedia.org/wiki/Lab_color_space
http://en.wikipedia.org/wiki/Seam_carving
http://www.seamcarving.com/arik/imret.pdf
http://www.shellandslate.com/download/fastmedian_5506.pdf
http://www.cs.princeton.edu/gfx/pubs/Barnes_2009_PAR/patchmatch.pdf
http://en.wikipedia.org/wiki/Bilateral_filter
http://people.csail.mit.edu/sparis/publi/2006/tr
http://people.csail.mit.edu/fredo/PUBLI/Siggraph2002/DurandBilateral.pdf/Paris_06_Fast_Bilateral_Filter_MIT_TR.pdf
http://groups.csail.mit.edu/graphics/ibedit/ibedit_s2001_cameraReady.pdf
http://artis.imag.fr/Publications/2004/ED04/flash.pdf
http://people.csail.mit.edu/sparis/bf_course/slides/07_variants.pdf
http://www.gamedev.net/reference/programming/features/edf/page2.asp
http://en.wikipedia.org/wiki/Sobel_operator
http://en.wikipedia.org/wiki/Band-pass_filter
http://en.wikipedia.org/wiki/Albedo
http://research.microsoft.com/en-us/um/people/zhang/papers/iccv01-reflectance.pdf
http://www.cs.adelaide.edu.au/users/mjb/Papers/Light_source_direction_from_a_single_image.pdf
http://perception.inrialpes.fr/people/Prados/demos/SFS/demo_real_images/demo_real_images.html
http://dspace.mit.edu/bitstream/handle/1721.1/7420/CS010.pdf?sequence=1
http://www.vision.cs.ucla.edu/papers/pradosJS09.pdf
http://srv-43-200.bv.tu-berlin.de/~vr/papers/acrobat/CAIP93.pdf
http://sibgrapi.sid.inpe.br/col/sid.inpe.br/banon/2002/11.04.11.11/doc/7-14.pdf
http://gandalf.psych.umn.edu/users/schrater/schrater_lab/courses/CompVis09/Papers/Lec05Shading.pdf
http://www.loria.fr/~kerautre/publications/Kerautret05b.pdf
http://www.if.pwr.wroc.pl/~optappl/pdf/2008/no2/optappl_3802p387.pdf
http://research.microsoft.com/en-us/people/yasumat/isfs_cvpr05.pdf
http://pc2.iam.fmph.uniba.sk/amuc/_contributed/algo2009/breuss.pdf
http://www.comp.leeds.ac.uk/bmvc2008/proceedings/1993/bmvc-93-015.pdf
http://web.mit.edu/bcs/sinha/papers/sinha_adelson_ICCV93.pdf
http://www.cs.huji.ac.il/~peleg/papers/pami90-Peleg-Rom.pdf

I don't know how you got the silly notion in your head that I don't have a desire to learn. I'll have you know that when it comes to computer programming I'm completely self-taught. And I don't even post questions here that often. I suspect that you've mistaken having priorities with an aversion to learning.


That being said, you're not using the PeekInt method, you're using the procedural wrapper of the method.


You are correct, I am using the function isntead of the method.

Now kindly explain to me how that translates to "You're using the peekint function wrong."

Perhaps you meant to say "You are using the peekint function instead of the peekint method."


Why do you think it would?


Because compilers are supposed to optimize things. Have you ever even used a compiler besides Blitz, Noel? Have you ever looked at the aseembly code they output? Real compilers do optimizations like unrolling loops for you and stuff.

http://en.wikipedia.org/wiki/Compiler_optimization


Also.. how in all bliss would it know?


Oh come on. Are you guys being intentionally obtuse?

Replacing A = PeekInt(B, C) with A = B[C] is not a complicated procedure.


Brucey(Posted 2009) [#9]
Arrays and banks are the same speed.

No surprise there then.

If they were a different speed I'd be a bit concerned...


sswift(Posted 2009) [#10]
Brucey:
The way multi-dimensional arrays in BlitzMax are treated as arrays of arrays concerned me about hidden overhead with how they're dealt with internally. That's why I performed that test.


N(Posted 2009) [#11]
The way multi-dimensional arrays in BlitzMax are treated as arrays of arrays concerned me about hidden overhead with how they're dealt with internally. That's why I performed that test.
Multidim arrays are not arrays of arrays, they're a solid block of memory (with the usual array header). An array of arrays is an array of arrays.


plash(Posted 2009) [#12]
Oh come on. Are you guys being intentionally obtuse?
Replacing A = PeekInt(B, C) with A = B[C] is not a complicated procedure.
Not all procedural wrapper functions are exact calls of <obj>.<method>.

It would be ludicrous and so very wrong if the compiler even attempted to do what you suggest.
Also, procedural wrapper functions can also own or disown parameters that the method being wrapped does not or does have.


sswift(Posted 2009) [#13]
Not all procedural wrapper functions are exact calls of <obj>.<method>.


I'm not sure how "<obj>.<method>" has anything to do with what I wrote.


It would be ludicrous and so very wrong if the compiler even attempted to do what you suggest.


I think that's going a bit far.

Sure, if you want to be picky, the code which is actually executed when you call PeekInt is:

Method PeekInt( offset )
		Assert offset>=0 And offset<_size-3 Else "Illegal bank offset"
		Return (Int Ptr(_buf+offset))[0]
	End Method


And if you were to simply replace that with my code you'd lose the protection provided by Assert.

But one could put up a pretty good argument for there not being a condtional jump in a function which is going to be used in situations where speed is of the essence. That Assert call should have a conditional compiler directive so it is included when running in debug mode only.

And if you remove that, then the function call simply boils down to:
Return (Int Ptr(_buf+offset))[0]

Which is the same as the code I suggested. With the odd exception of the offset being added to the pointer rather than being used as the index for dereferencing. What the heck is up with that?


Also, procedural wrapper functions can also own or disown parameters that the method being wrapped does not or does have.


You are speaking in hypotheticals. We are talking about the peek functions here, not every wrapper function in Blitzmax.


Brucey(Posted 2009) [#14]
how they're dealt with internally

Since all the source code is provided with/for BlitzMax, you could always examine that and see for yourself how arrays are handled :-)


N(Posted 2009) [#15]
What the heck is up with that?

Because this
Int Ptr(buffer)[offset]

is not the same as this
Int Ptr(buffer+offset)[0]



sswift(Posted 2009) [#16]
Multidim arrays are not arrays of arrays, they're a solid block of memory (with the usual array header). An array of arrays is an array of arrays.


My bad.


sswift(Posted 2009) [#17]

Because this

Int Ptr(buffer)[offset]

is not the same as this

Int Ptr(buffer+offset)[0]



You got me again. You're two for two.


sswift(Posted 2009) [#18]
Since all the source code is provided with/for BlitzMax, you could always examine that and see for yourself how arrays are handled :-)


What do you mean? I see source to the mods and things, but I don't see any source for the internals of the BlitzMax compiler?

*searching for .c files*

Oh, I see some in brl.mod. But there's not many. Hm. There's an array source file. Can't make heads or tails of it. Looks like code for a compiler... No wait, those are case statements for each data type.

Hm. Well I'll have to look at this more to really figure it out, but if all the source code's here, where's the source file that has the END function? I'd like to take a look at what that does so I can see what might have been causing the long delay when you quit playing Space Beetles.

[edit]

Hm, I think I found it, but all it does is call Exit(0)? I can't find the exit function though. It may be a C command. If so then now I'm confused how my app could have been locking up on end.

Anyway this is getting way off topic.


N(Posted 2009) [#19]
I can't find the exit function though. It may be a C command. If so then now I'm confused how my app could have been locking up on end.

Off topic, but this is the man page (Mac OS, anyway, doubt it differs much between platforms):



Mahan(Posted 2009) [#20]
A comment regarding compiler optimizations (inline):

Some compiled languages (like C++) provide the option to mark functions as "inline" to instruct the compiler to instead of making a call, actually include the body of the function everywhere it is used (thus removing the time to shuffle around the stack and make the call of course).

This has some similarity with preprocessor macros in pure C but inlining is much more potent and advanced and part of the actual compilers work.

Now the C++ language has a special keyword (inline) so that the developer can control this behavior manually, as the compiler (at compile time) usually does not have enough information about the bottlenecks in an application to know what functions should be inlined. Obviously not everything can be inlined (think about recursive functions, as an extreme example.)

Afaik, BlitzMax does not have inline functionality at all.

On a sidenote: In the Java Virtual Machine (JVM) there is JIT (Just In Time compiling) that does runtime compilation. The JVM has the ability to monitor and locate bottlenecks in runtime and to use JIT for inlining code and even bypass OOP call-chains (as used when overriding functions in inheritance) based on knowledge that the JVM gathers while the program is running, that simply isn't available before the program is executed. Modern Java sometimes outperforms C++ in server-code (long running, heavy processes) today because of this.

edit: some grammar


ImaginaryHuman(Posted 2009) [#21]
>Multidim arrays are not arrays of arrays, they're a solid block of memory (with the usual array header). An array of arrays is an array of arrays.

>My bad.

Actually when I looked at the code for arrays I was under the impression that for a multidimensional array there is a separate block of memory for each `row` of the array in a given dimension. So for a 2D array [10,20] you'd have 10 blocks of memory 20 bytes long, or whatever. So it IS kind of like an array of arrays, just that the length of each array is the same.


_Skully(Posted 2009) [#22]
Mahan,

So basically inline just means copy the raw code here... providing a means to centrally manage the code (instead of finding all occurances) but removing the need for the function call itself.

Hmmmm... sounds like good optimization stuff...


Chroma(Posted 2009) [#23]
Nilium (Noel) is an odd character. He's been on this site desperately seeking recognition and respect for as long as I can remember but he don't get it. It's actually quite comical.


N(Posted 2009) [#24]
Chroma (Noel) is an odd character. He's been on this site desperately trying to make a flight sim for as long as I can remember but he doesn't get it. It's actually quite comical.


slenkar(Posted 2009) [#25]
C'mon Noel you have been rude to SSwift. Apologize to him now lad, or no dinner for you.


N(Posted 2009) [#26]
You're right, that was excessively rude on my part. Apologies, sswift - I'll try to be less caustic and not accuse you of something I can't prove next time.


Chroma(Posted 2009) [#27]
Nilium, Noel, or whatever your nick is this week. I've been on these forums a long time and I know for a fact that sswift is a ten times better programmer than you are. You remind me of a 2d Lt that goes around saying "hey, I'm an officer!"...and then everyone laughs at him and treats him like crap. Bottom line is, you have to give respect to get respect. You were disrespectful to sswift for absolutely no reason except to try and look like some coding guru with all the answers. In actually you came out of it looking like an idiot. Sswift has released a game. He completed something. It's for sale on a website. What have you done?

Go write vein 3 and then post the bloated non-working code in the archives. Then when someone replies that it doesn't work you can say "Sorry I can't be bothered to fix it". It seems to be your forte. Or better yet...just start posting all your poison on BlitzMonkeys.com!!


N(Posted 2009) [#28]
Nilium, Noel, or whatever your nick is this week. I've been on these forums a long time and I know for a fact that sswift is a ten times better programmer than you are. You remind me of a 2d Lt that goes around saying "hey, I'm an officer!"...and then everyone laughs at him and treats him like crap. Bottom line is, you have to give respect to get respect. You were disrespectful to sswift for absolutely no reason except to try and look like some coding guru with all the answers. In actually you came out of it looking like an idiot. Sswift has released a game. He completed something. It's for sale on a website. What have you done?

Go write vein 3 and then post the bloated non-working code in the archives. Then when someone replies that it doesn't work you can say "Sorry I can't be bothered to fix it". It seems to be your forte. Or better yet...just start posting all your poison on BlitzMonkeys.com!!
Let me address this point by point:

1) My name really doesn't change that often.

2) I don't know about sswift's skills, and it was wrong of me to judge him as harshly, rudely, and stupidly as I did. Unfortunately, I shot first and asked questions later, and that hardly ever works out in one's favor. I unfortunately have a somewhat bad history of doing this, but I try to refrain from doing it as often as I once did.

3) Wouldn't know, I've not been in the military. Doesn't really work as an analogy here, since I don't get it.

4) In regards to respect, not necessarily. I respect people who have given me none in return, and vice versa. It sounds like a nice ideal to give respect and get respect in return, but that's not how reality works.

5) No, I was disrespectful to sswift because after numerous posts of his, most of them displaying what I perceived as him showing very little effort on his part to understand a topic, I felt like he was just being annoying.

6) Yes, I did come out of it looking like an idiot. As I said, I was excessively rude. There was really no genuinely good reason for this, just me being annoyed.

7) He released a bad game. I don't take that back, it really just isn't fun. Kudos to putting something out, but I don't see much point in respecting someone for putting out something that isn't fun and failed to sell.

8) I've completed a few different things and been paid for them (and am being paid now for some work to do with BMax). I don't sell what I make because I would rather share those things with people than charge them for it. The way I see it is that if even one person finds something I made useful, even if it's just one line of code that explained something, then that's good enough. At any rate, I won't discuss exactly what I'm doing now for contractual reasons (in other words, I'd rather sit on the safe side of the confidentiality clause).

8) Vein 3 was mostly functional, and what pieces of it I released (e.g., the GUI) worked. Not sure what you're on about there. There's plenty of other code in the code archives for Blitz3D, but I really can't be bothered to fix it because I no longer have Blitz3D. Some other pieces of code are dated and I don't use them anymore, so I'm not going to fix those either. If you really want up to date stuff, you get it off my github repo.

9) What does BlitzMonkeys have to do with this? O_o

Now, if you want to come out looking like the white knight in this, there you go, you can have the title. I don't care, but really, holding grudges as you seem to be doing isn't going to do you or me any good.


Mahan(Posted 2009) [#29]
@_Skully


So basically inline just means copy the raw code here... providing a means to centrally manage the code (instead of finding all occurances) but removing the need for the function call itself.



Yes, you could say that, but in the C++ implementation you actually define the function as inlined and that means that every time you "call" this function, in reality a copy of the function code is inserted. (instead of specifying inline when you make the call)

Still you get to keep lots of benefits of a normal function as the inline function can have local variables, params and return value just as any other function.

Why i wrote this about the inline stuff is because sswift wrote this:

Why isn't the compiler replacing the function call with the faster method of accessing the memory?



The answer is that I don't think BlitzMax has the concept of inlining as an optimization.

Besides without an inlining directive (or clearly specified levels of optimization) the compiler would imho do wrong by optimizing away pure function calls by itself.

I don't know of a single C/C++ compiler that implements advanced optimizations like inlining or loop-unrolling etc, that does not provide optimization-levels where these advanced optimization techniques are possible to disable. (example: gcc compiler -o[x] directive where x is the level of optimizations used)


slenkar(Posted 2009) [#30]
well noel apologized so let bygones be bygones


Arowx(Posted 2009) [#31]
So when should I be using banks and when should I use arrays?


slenkar(Posted 2009) [#32]
always use arrays?

I cant think of any advantages of banks


sswift(Posted 2009) [#33]
Merx:
It depends on the situation.

Banks are probably good if you intend to store a lot of data you need to save or transmit.

Arrays are generally fine if you need a fixed size block of memory and you're not doing anything particularly intensive with it. And if you want to access 2D data at random an array makes that simpler and using a bank wouldn't really be any faster.

But I'm leaning strongly towards using memalloc instead of banks in situations that call for storing, copying, and modifying very large chunks of memory such as when processing images. I don't need the extra features of banks, and getting access to the bank pointers is just another level of abstraction. Why store a pointer to a bank when every time I need to access it I'm gonna have to dereference that to a float pointer?


N(Posted 2009) [#34]
I'm learning strongly towards using memalloc instead of banks in situations that call for storing, copying, and modifying very large chunks of memory such as when processing images.
I would recommend this route, especially if you're frequently allocating and deallocating buffers. Using arrays in this situation can put a fair amount of strain on the GC (I tried this while writing my JSON parser, and using arrays as buffers slowed parsing to a crawl due to the number of arrays that had to be created and resized - using MemAlloc and such just worked better).