Optimizing image operations

_JIM(Posted 2009) [#1]
Hi!

I'm working on a pretty cool effect (which I will post in the code archives soon enough) and I was wondering if I can speed it up a little bit.

Basically, I am doing two ReadPixel calls and one WritePixel call for every pixel in an image.

One of those images is static, the other one changes position (so I apply an offset to the pixels that I read).

I tried speeding it up by creating an array that holds some extracted info from the static image. Sadly, accessing 2 float arrays proved to be slower than just reading those pixels.

So, does anyone know any trick to speed up those read/write pixel functions?

Also, is there another way to store some data that I can access faster than a ReadPixel?


ImaginaryHuman(Posted 2009) [#2]
Hey Mr JIM,

If you're accessing pixels in a pixmap, there are faster ways to do it than to use the usual functionality.

For example, ReadPixel() and WritePixel() are standalone pixel access operations. If you were running lots of non-pixel code and had a `one-off` need to read a pixel, they would work well: ReadPixel takes the base address of the pixmap, adds the y coordinate multiplied by the row pitch, adds the x coordinate scaled by the pixel size, and reads off the integer. That's fine for one pixel by itself. But when you are reading lots of pixels, it's almost like starting over every time, when you could much more efficiently keep track of where you are in the pixmap memory. That is especially true if you're reading pixels in sequence, and even more so if there are no padding bytes at the end of each pixmap row.

So I would stop using ReadPixel and WritePixel. Instead, use integer pointers to read off and write whole pixels (RGBA format - 4 bytes per pixel - no row padding). You can start off simply setting your pointer to the base pointer that you want to start at (or use PixmapPixelPtr(pix,x,y) once outside the loop). Similar for the second pixmap. Then you can just start doing like:

Local sourcepixel:Int = sourcepointer[0]
Local destpixel:Int = destpointer[0]
Local combined:Int = sourcepixel + destpixel ' or whatever
destpointer[0] = combined ' write through the pointer, not the local copy

It then depends on whether you want to use FOR loops or While loops as to whether it'd be better to reference [0] and add 1 to the pointers, or to reference [index] and let the loop count for you. I find While loops to be a bit faster sometimes but maybe that's just me.
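
A minimal sketch of the indexed variant, assuming two same-sized PF_RGBA8888 pixmaps with no row padding. The names src, dst, srcPtr and dstPtr are made up for illustration, and the plain integer add stands in for whatever per-pixel combine the effect actually needs:

```blitzmax
SuperStrict

' Hypothetical test pixmaps; a real effect would load or grab its own.
Local src:TPixmap = CreatePixmap(4, 4, PF_RGBA8888)
Local dst:TPixmap = CreatePixmap(4, 4, PF_RGBA8888)
src.ClearPixels($01010101)
dst.ClearPixels($02020202)

' The single flat loop below is only valid when rows have no padding bytes.
If PixmapPitch(src) <> PixmapWidth(src) * 4 Then RuntimeError "source rows are padded"
If PixmapPitch(dst) <> PixmapWidth(dst) * 4 Then RuntimeError "dest rows are padded"

' Fetch the base pointers once, outside the loop, and view them as whole pixels.
Local srcPtr:Int Ptr = Int Ptr(PixmapPixelPtr(src, 0, 0))
Local dstPtr:Int Ptr = Int Ptr(PixmapPixelPtr(dst, 0, 0))
Local count:Int = PixmapWidth(src) * PixmapHeight(src)

' Let the For loop do the counting and index straight into pixel memory.
For Local i:Int = 0 Until count
	Local sourcepixel:Int = srcPtr[i]
	Local destpixel:Int = dstPtr[i]
	dstPtr[i] = sourcepixel + destpixel ' placeholder combine
Next

Print Hex(dstPtr[0])
```

If the rows are padded, the same idea still works with an outer loop over y that calls PixmapPixelPtr(pix, 0, y) once per row.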

Make sure you use as many local variables as possible.

Also consider pipelining if you can - when you have read something from memory, try to organize the following instructions so that the very next instruction does not depend on/need/use the value that you just read from memory. Make sure it's a non-memory-accessing operation using locals only. Then do another memory read after that - you can in some cases get the local calculations `for free` because you aren't stalling having to wait for the memory read to be complete. It's also ideal if you do some calculation after each memory read (if needed) rather than putting two memory reads next to each other.

ie an example possibly...
local sourcepixel:int=sourcepointer[0]
'do some math with something else
local destpixel:int=destpointer[0]
'maybe add 1 to the sourcepointer here
'now combine sourcepixel with destpixel
destpointer[0]=combined
'now add 1 to the dest pointer


If you can get rid of padding bytes in the destination pixmap, and if both pixmaps are the same size, you can just do one While loop over every pixel and not even have to do nested loops.

Also, you can remove some of the loop overhead by unrolling the loop a bit. For example, process 16 pixels as straight-line, written-out code and then loop, so you do 1/16th as many iterations. That reduces loop overhead and adds to the speed.
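
As a rough sketch of unrolling, here is a While loop that handles 4 pixels per iteration (4 rather than 16 only to keep it short). The pixmap, the names p and remaining, and the "+ 1" operation are placeholders, not anything from the actual effect:

```blitzmax
SuperStrict

' Hypothetical 8x8 pixmap with no row padding (4-byte pixels, 4-byte alignment).
Local pix:TPixmap = CreatePixmap(8, 8, PF_RGBA8888)
pix.ClearPixels($10101010)

Local p:Int Ptr = Int Ptr(PixmapPixelPtr(pix, 0, 0))
Local remaining:Int = PixmapWidth(pix) * PixmapHeight(pix)

' Unrolled body: four pixels written out by hand per loop iteration.
While remaining >= 4
	p[0] = p[0] + 1
	p[1] = p[1] + 1
	p[2] = p[2] + 1
	p[3] = p[3] + 1
	p :+ 4 ' advance by four pixels (pointer arithmetic is in Int units)
	remaining :- 4
Wend

' Tail loop for pixel counts that are not a multiple of 4.
While remaining > 0
	p[0] = p[0] + 1
	p :+ 1
	remaining :- 1
Wend
```

Whether this helps in practice depends on how heavy the per-pixel work is, so it is worth timing both versions with MilliSecs() before committing to it.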


_JIM(Posted 2009) [#3]
Wow, thanks a lot!

I was thinking of something close to that, but I wasn't sure if it was possible in BlitzMax. I had to play a bit with converting from Byte Ptr to Int Ptr and change the pixmap format, but I made it!

Also, for some reason, DrawPixmap is slower than DrawImage(LoadImage(pixmap), x, y)

Once again, thanks!

I'll post the code tomorrow.


Bremer(Posted 2009) [#4]
You should make an image out of the pixmap and then draw with that instead of converting it every time you draw.

Local newimage:TImage = LoadImage(pixmap) ' once, outside your drawing loop

DrawImage(newimage, x, y) ' each time within your drawing loop



Jesse(Posted 2009) [#5]
I don't know if the LockImage pixmap is any faster, but I suspect it is. Does anyone know?
do something like this:
Local image:TImage = CreateImage(x,y)

Local pixmapPtr:Int Ptr = Int Ptr(LockImage(image).pixels)

'edit pixmap pixels

UnlockImage(image)
DrawImage image,x,y



ImaginaryHuman(Posted 2009) [#6]
How much faster is your code now than it was before changing it?

DrawPixmap is slow - it can be 10 or more times slower than DrawImage. The time taken to load an image off disk (especially if it was loaded before and is already in the disk cache) and turn it into an image is probably less than the time taken to move a pixmap to the backbuffer, due to all the packing/unpacking of pixels and stuff. That's my guess.


JoshK(Posted 2009) [#7]
You could do it instantly using a shader on the GPU.


ImaginaryHuman(Posted 2009) [#8]
Not instantly, but quickly, yes. But creating shaders is not everyone's ball of wax and a somewhat advanced topic. If he's having trouble with pixmaps I don't think shaders are going to be his cup-of-tea solution. But I can see your point, shaders are cool for everything.


_JIM(Posted 2009) [#9]
Hi! I was really busy today, so I didn't have time to do everything.

A few numbers:

Before the pointer trick: 8-9ms on my CPU
After the pointer trick: 2-3ms on my CPU

I had a problem with grabbing pixmaps. Initially, the effect was applied on an image. Everything fine until I added the GrabPixmap function which pushed it to 22ms. Switching to GLMax2DDriver got it back down to 5-6ms, and I am happy with it.

There is a problem though: it jumps to 23ms once every second just for 1 frame.

Edit: Down to 1-2ms. The slowdown was from external software (ATi Tray Tools, which was updating the framerate overlay once every second and was also slowing everything down).


As for shaders, I have worked with them, and I am quite familiar with Cg (I am also working on a 3D FPS aimed at high-end hardware), but this time I need to use the most compatible methods, as the game has a casual target audience (with Intel GMA cards everywhere). Shaders would either hurt the game's compatibility or, if the game runs at all, might run quite slowly on that cursed wooden graphics card.

Once again, thank you everyone for the help.


ImaginaryHuman(Posted 2009) [#10]
Cool man.


_JIM(Posted 2009) [#11]
Well, I've done a little demo. Here's the media and source:

http://www.mediafire.com/?d3yatj2zjww

There probably are some more fine optimizations, but I'm happy with it :)


Jur(Posted 2009) [#12]
That is a nice effect, albeit a little slow on my X2 2GHz (20-23ms).


_JIM(Posted 2009) [#13]

"That is a nice effect, albeit a little slow on my X2 2GHz (20-23ms)."



Well, with the GrabPixmap in place, I think it's dependent on both the CPU and the GPU now.

One more thing I could do is to move every other expensive operation to another thread, and leave rendering and effects on one thread.