Optimization Help (Pixmaps etc)...

BlitzMax Forums/BlitzMax Programming/Optimization Help (Pixmaps etc)...

mic_pringle(Posted 2009) [#1]
Hi,

I've been developing a raycasting engine in the style of Wolfenstein 3D and Rise of the Triad, and everything was sweet until I introduced textures. Now, at any resolution above 320 x 200 the framerate drops considerably, to the point where it's almost unplayable.

The way I'm managing the textures is as follows ...

- Create a texture pixmap by loading an image from disk straight into a pixmap
- Create an empty buffer pixmap, the width and height of the screen resolution
- In the raycasting loop, determine which pixel I need from the texture depending on where the ray hit the wall, then use ReadPixel to get it and WritePixel to write it to the buffer pixmap
- Once the raycasting loop has ended, draw the buffer pixmap to the screen, and then clear its pixels
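The steps above can be sketched in Python (as a stand-in for the BlitzMax pixmap calls; all names, dimensions and values here are hypothetical):

```python
TEX_W, TEX_H = 64, 64          # texture dimensions
SCREEN_W, SCREEN_H = 320, 200  # buffer (screen) dimensions
BPP = 3                        # bytes per pixel (RGB)

texture = bytearray(TEX_W * TEX_H * BPP)       # texture pixmap stand-in
buffer = bytearray(SCREEN_W * SCREEN_H * BPP)  # buffer pixmap stand-in

def read_pixel(src, pitch, x, y):
    """ReadPixel stand-in: return one RGB pixel as a 3-byte slice."""
    off = y * pitch + x * BPP
    return src[off:off + BPP]

def write_pixel(dst, pitch, x, y, rgb):
    """WritePixel stand-in: store one RGB pixel."""
    off = y * pitch + x * BPP
    dst[off:off + BPP] = rgb

# Mark texel (10, 0) red so the copy is visible.
write_pixel(texture, TEX_W * BPP, 10, 0, b'\xff\x00\x00')

def draw_column(col, tex_x, wall_top, wall_height):
    """One iteration of the raycasting loop: copy a scaled vertical
    texture slice into the buffer at screen column `col`."""
    for screen_y in range(wall_top, wall_top + wall_height):
        tex_y = (screen_y - wall_top) * TEX_H // wall_height
        rgb = read_pixel(texture, TEX_W * BPP, tex_x, tex_y)
        write_pixel(buffer, SCREEN_W * BPP, col, screen_y, rgb)

draw_column(0, 10, 50, 100)
```

Each of those per-pixel read/write calls is where the cost accumulates, which is what the rest of the thread digs into.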

I'm assuming that using pixmaps is my bottleneck, as from what I've read on the forums they can be slow. Which is why I'm here: does anyone have any ideas on how I can optimize this process?

I've had the following thoughts so far ...

- Cache the pixmap pixel data into an array before entering the raycasting loop, and then use this instead of ReadPixel when needing to determine the pixel color at a particular offset. Would this be faster than using ReadPixel?

- Instead of writing to another pixmap, write the pixels directly to the screen. I understand that Plot can be used for this, but it can be slow because it calls glBegin and glEnd for each call. What I thought was to instead implement my own version: call glBegin, then loop through an array to get the x and y values, calling glColor and plotting the point with glVertex2f, and finally call glEnd once the loop has finished. This would then send all the commands to the GFX card in one batch, rather than individually. Is my train of thought correct on this? Would this be a worthwhile exercise?

Does anyone else have any thoughts on how best to optimize this process? I understand I can get a pointer to each pixel in a pixmap instead of using ReadPixel, which is apparently faster, but I'm not sure how to do this.

All help appreciated.

Thanks

-Mic


matibee(Posted 2009) [#2]
I'm not sure about the internals of a pixmap, but it sounds to me like you want your textures in RAM, accessed with a raw lookup.

Also have a destination (render target) in ram and write pixels with a raw lookup. Copy the entire render target ram to video memory in one operation at the end of generating the frame.
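As a rough illustration of that flow (a Python sketch, not BlitzMax; `upload_to_video_memory` is a hypothetical stand-in for the once-per-frame copy to the card):

```python
W, H, BPP = 320, 200, 3
render_target = bytearray(W * H * BPP)   # lives in system RAM

def upload_to_video_memory(frame):
    # Stand-in for the single copy to video memory at the end of the
    # frame (DrawPixmap or a dynamic texture update in the real engine).
    return bytes(frame)

def render_frame():
    # ... per-column raycasting would write into render_target here;
    # as a dummy, fill every red channel byte ...
    for i in range(0, len(render_target), BPP):
        render_target[i] = 32
    # One bulk transfer per frame, instead of per-pixel uploads.
    return upload_to_video_memory(render_target)

frame = render_frame()
```

The point is that all per-pixel work happens against plain memory, and the video card only sees one transfer per frame.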

Work in 256 colors too :), or at most, 16 bit.


mic_pringle(Posted 2009) [#3]
Thanks 'matibee' ... any ideas on how this technique can be implemented?

Thanks

-Mic


matibee(Posted 2009) [#4]
Sorry Mic, I can't help with the bmax specifics, I'm still finding my feet here.

I enjoyed reading this article recently, it should give you plenty of pointers while you work out the bmax way to do it.

http://www.tophatarcade.com/dev/articles/2DSoftwareRenderer.php

Cheers
Matt


mic_pringle(Posted 2009) [#5]
Thanks 'matibee' ... I have read through the article, but I don't think it applies well to what I'm trying to achieve.

Thanks anyway.

-Mic


Arowx(Posted 2009) [#6]
My understanding is that pixmaps are very slow, and yes, the best way to access them is using a pointer.

So you could replace your ReadPixel and WritePixel calls with pointer-based access.

Example here

Definitely look into writing your own DX or OpenGL routine for this, as you could potentially save a lot of time. But have a search first; someone has probably already written a faster bitmap plotter.

You are processing 64000 raycasts at 320 x 200 resolution, possible optimisations could be...

1. Don't recalculate when nothing moves
2. Only recalculate moved items e.g. monsters
3. Look into Axis Aligned Bounding Box 'AABB' optimisations used in raytracing.
4. Figure out a way for the GPU to do the work for you, so all calculations and pixel lookups occur on the graphics card?

But I am wondering: why not just go the 3D route? Even basic hardware could run way above this resolution and provide that blocky Wolfenstein 3D look as well.


mic_pringle(Posted 2009) [#7]
Hi Merx,

Thanks for your post.

You are processing 64000 raycasts at 320 x 200 resolution


This is not quite accurate. I think you're thinking of ray tracing, where you cast a ray for each pixel. In ray casting you only cast one ray for each pixel along the x-axis (i.e. the screen width), in this case 320 rays. Which is why it was incredibly fast pre-textures, and why I associate the slowdown with the use of pixmaps and ReadPixel/WritePixel.
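The arithmetic behind that correction, as a quick sketch:

```python
width, height = 320, 200

# Ray casting: one ray per screen column.
rays_raycasting = width            # 320 rays per frame

# Ray tracing: one ray per screen pixel.
rays_raytracing = width * height   # 64000 rays per frame
```

So pre-textures the engine was only doing a few hundred intersection tests per frame, which is why texturing (tens of thousands of per-pixel reads and writes) dominates.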

Figure out a way for the GPU to do the work for you, so all calculations and pixel lookups occur on the graphics card?


I'm extremely interested in this suggestion. Do you have any links where I can get more information?

But I am wondering why not just go the 3d route, even basic hardware could run way above this resolution and provide that blocky wolfenstein 3d look as well?


It started out as a bit of fun, but I have since become obsessed with it.

Plus I hate 3D maths.

Thanks

-Mic


ImaginaryHuman(Posted 2009) [#8]
Don't use ReadPixel or WritePixel; learn how to use Int pointers to get and set pixels. Keep the base pointers of the pixmaps in local variables, and use as many Locals as you can, defined outside of the loop, to avoid garbage collection.

Not sure why you are trying to do raycasting when the hardware can do Z-buffered texturing for you much faster, unless you're going for nostalgia?


slenkar(Posted 2009) [#9]
You don't need 3D maths to use miniB3D.


Arowx(Posted 2009) [#10]
@mic_pringle If you only cast 320 rays and then work out which textures to display, why not keep their distance or z value, scale the images accordingly, and draw them in z order? Why do you need to do all the texture/pixel copying?

@mic_pringle Well, it's kind of back to front, as the 3D your average GPU is used for provides nice z-buffered hi-res graphics: it takes in textures and geometry and generates a pixel buffer.

So from this you know that it's what a GPU is made to do, but to get it to raycast in the style you need would probably involve writing in a GPU-specific language, depending on hardware (ATI/Nvidia), or OpenML...

Just google for 'GPU Raycasting'; it appears to be used quite a lot in medical visualisation.

Regards

Merx


matibee(Posted 2009) [#11]
Sorry Mic, that article is not as relevant as I remembered it.

You can see the TPixmap source code (in the folder "mod\brl.mod\pixmap.mod"). In there you'll see that functions like ReadPixel and WritePixel have a lot of conditional code to handle many different pixel formats. If you're working with one format throughout, there's no conversion or checking to do, and you can do it with raw pointer access quite easily.

This example has a pixmap for the back buffer, and another for a simple sprite, it accesses both by byte pointer.

I'm not sure what bounds checking comes with the Ptr types, though, so be careful (I managed a BSOD yesterday :) )
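A minimal sketch of the fixed-format idea (Python as a stand-in for the Byte Ptr arithmetic; the dimensions and pitch value are hypothetical). With one known format there is no per-pixel format dispatch: the pixel address is just base + y * pitch + x * bytes-per-pixel.

```python
WIDTH, HEIGHT, BPP = 8, 4, 3
PITCH = 32   # bytes per row; may exceed WIDTH * BPP due to row padding
pixels = bytearray(PITCH * HEIGHT)   # stand-in for the pixmap's Byte Ptr

def pixel_offset(x, y):
    """Byte offset of pixel (x, y) in a known, fixed RGB format."""
    return y * PITCH + x * BPP

# Write pixel (5, 2) as raw RGB bytes, with no format checking at all.
off = pixel_offset(5, 2)
pixels[off:off + BPP] = b'\x10\x20\x30'
```

Note the pitch: a pixmap's row stride can be wider than width * bytes-per-pixel, which is why the offset uses pitch rather than width. And as the post says, raw access means no bounds checking, so an off-by-one here scribbles over neighbouring memory.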


mic_pringle(Posted 2009) [#12]
@matibee

Thanks for this, I will have a proper look at it when I get home from work.

Just out of curiosity, did you implement the same program using the standard ReadPixel/WritePixel as well?

I was wondering if you'd done any comparisons between the two methods to measure the speed difference, if any?

Thanks

-Mic


matibee(Posted 2009) [#13]
Haha Mic, that comes under the often-used phrase: 'such exercises are left for the reader' :)

Just look at the pixmap source and you'll see how lengthy the ReadPixel/WritePixel functions have to be. It's always going to be faster to work with a bunch of knowns (pixel format, texture sizes, etc.) than to write code for every case.


mic_pringle(Posted 2009) [#14]
Just found this example of using vertex arrays ... it may be worth looking at as well?

http://www.blitzbasic.com/Community/posts.php?topic=66184

Thanks

-Mic


matibee(Posted 2009) [#15]
Vertex arrays won't help with pixel plotting, as they're used for drawing triangles (meshes, quads in 2D, etc). The idea is to store geometry on the video card.


mic_pringle(Posted 2009) [#16]
I actually have to disagree, as you can use GL_POINTS as the type, which is exactly what 'Plot' uses to draw single pixels to the screen.


matibee(Posted 2009) [#17]
You're way off target with GL_POINTS.

You don't want to draw single pixels to the screen. You want to draw to a buffer in RAM, then upload it in one chunk. The frame rate wouldn't be hit too hard by simply making a new texture from your RAM buffer and drawing it with the 3D API; it only has to be done once per frame.

Even if you had a vertex buffer of GL_POINTS on the video card, with one GL_POINT for each pixel, there's no way you could get the color data to them every frame in a way that's faster than the dynamic texture.

If you want the ultimate in performance, do it all on the video card:
http://www.gamedev.net/community/forums/topic.asp?topic_id=346775&whichpage=1&#2291866


mic_pringle(Posted 2009) [#18]
@Matibee

I'm having no luck using the methods you've described, so I was wondering if you'd be prepared to have a look at my source?

You can get it here

I've included the version I have with no optimizations so you can see how slow it is.

Thanks

-Mic


matibee(Posted 2009) [#19]
Nice one Mic. I'll have a look tomorrow.


mic_pringle(Posted 2009) [#20]
@Matibee

Fantastic, I appreciate it. Let me know how you get on.

Thanks

-Mic


matibee(Posted 2009) [#21]
Hmm. That was an interesting learning exercise.

Here's a list of major changes (some may be more questionable than others, and most are easy to comment out to see the difference):

- using a 24-bit RGB pixel format throughout
- using raw pixel access instead of ReadPixel/WritePixel
- drawing TWO identical vertical lines for every ray cast
- the timing code now checks how long the frame took to render and draw, ignoring the flip time
- it only raycasts (rebuilds the pixmap) on movement; this shows the time it takes to draw the pixmap (on my system, about 7ms)
- changed the movement code so it's time-bound; slow frame rates meant it felt like you were running through mud, so this made a big psychological difference when it didn't need to

For testing, I simply started the program, held the right arrow key down for a couple of full turns, and noted the highest ms reading.

Results:
RGB pixel format:-
Read/Write pixel: 20ms
Memcopy: 17ms
neither: 11ms (commented out the for/next vertical line loop)

So memcopy adds 6ms and Read/WritePixel adds 9ms over the 11ms baseline. Remember, drawing the pixmap takes around 7ms on my system, so the scan code and logic took the other 4ms.

RGBA pixel format:-
Read/Write pixel: 24ms
Memcopy: 18ms
neither: 13ms

There's obviously some conversion of pixel formats going on here. The memcopy time has gone down to 5ms, probably due to only ever accessing memory on 32-bit boundaries, while everything else takes longer.

These tests were done at 640 x 480, windowed, with the double line drawing.

So where do you go from here?
Continue with raycasting?
There are some micro-optimisations to be had, such as forcing textures to be 128 pixels in width/height and replacing all '* texturewidth' code with Shl 7. (A profiler would be the next stop, though; I wonder if something like AMD's CodeAnalyst is still around and would work on a bmax debug exe.)
Move to an 8-bit palettized texture format (retrieving a pixel color from a texture MIGHT be quicker, but placing it won't be affected unless you go for a 16-bit screen mode).
Stop drawing the pixmap and make a dynamic texture; but drawing the pixmap is already pretty quick, so it wouldn't surprise me if this is already going on under the hood.
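The '* texturewidth' replacement mentioned above, sketched in Python (BlitzMax's Shl 7 corresponds to << 7 here; the function names are made up for illustration):

```python
TEX_W = 128   # texture width forced to a power of two

def index_mul(x, y):
    """Texel index via multiply."""
    return y * TEX_W + x

def index_shl(x, y):
    """Same index via shift: shifting left by 7 is multiplying by 128."""
    return (y << 7) + x
```

Whether this actually wins anything on a modern CPU is exactly why the profiler is suggested first; compilers often do this strength reduction for you when the width is a known constant.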

Move to 3d?
Even basic 3D abilities would beat this raycaster hands down, especially as the resolution increases. Even moving this simple level onto the 3D card will make it lightning fast. You can store the geometry and textures in video RAM, and all you're doing is updating the camera position. There's the added benefit of bilinear/trilinear filtering of the textures too, so you don't see those horrible big pixels when you're up close to a wall. It just depends on your needs and expectations.

Source is here.

Cheers
Matt


mic_pringle(Posted 2009) [#22]
@Matibee

Thanks for this, it is greatly appreciated. I now intend to go through the changes you made with a fine-tooth comb :-)

Continue with raycasting?
There are some micro-optimisations to be had, such as forcing textures to be 128 pixels in width/height and replacing all '* texturewidth' code with Shl 7. (A profiler would be the next stop, though; I wonder if something like AMD's CodeAnalyst is still around and would work on a bmax debug exe.)
Move to an 8-bit palettized texture format (retrieving a pixel color from a texture MIGHT be quicker, but placing it won't be affected unless you go for a 16-bit screen mode).
Stop drawing the pixmap and make a dynamic texture; but drawing the pixmap is already pretty quick, so it wouldn't surprise me if this is already going on under the hood.

The whole raycasting thing came about as a bit of a learning exercise after re-discovering Rise of the Triad, Doom & Duke Nukem through DOSBox. I figured that if those guys could make such an engine run so sweetly on 386s and 486s, I could probably do something similar using modern technology. Since then it's become a bit of an obsession, so I'm not sure where I'm going to go with it ... I do however think there are some more optimizations to be had, such as tidying up some of the ray calculations, using Cos and Sin instead of Sqrt and caching them as there's only a finite number of possibilities, reusing variables instead of recreating them on each pass, etc.

Move to 3d?
Even basic 3D abilities would beat this raycaster hands down, especially as the resolution increases. Even moving this simple level onto the 3D card will make it lightning fast. You can store the geometry and textures in video RAM, and all you're doing is updating the camera position. There's the added benefit of bilinear/trilinear filtering of the textures too, so you don't see those horrible big pixels when you're up close to a wall. It just depends on your needs and expectations.

Perhaps that's the next move after this, I'm not sure. I don't really know much about 3D, so I wouldn't know where to get started, especially using a language designed mainly for 2D games. I know miniB3D exists, but if I went that way I'd want to start from scratch so I'd know exactly what was happening.

Thanks again for this, it's greatly appreciated.

-Mic


mic_pringle(Posted 2009) [#23]
@Matibee

There are some micro-optimisations to be had, such as forcing textures to be 128 pixels in width/height and replacing all '* texturewidth' code with Shl 7. (A profiler would be the next stop, though; I wonder if something like AMD's CodeAnalyst is still around and would work on a bmax debug exe.)
Move to an 8-bit palettized texture format (retrieving a pixel color from a texture MIGHT be quicker, but placing it won't be affected unless you go for a 16-bit screen mode).


Would you mind at all expanding on these comments? I know a bit about bit shifting, but not a lot about palettes.

Thanks

-Mic


matibee(Posted 2009) [#24]
It was a bit different in the days of Doom. Small resolutions meant there were fewer pixels to shift, and programmers had direct access to video memory: so where we're writing pixels to an intermediate buffer, they could stick them straight into the display. They still made some very impressive optimisations to make those games work, but not all of them are relevant today.

Implementing a BSP tree system would be crucial here too, and to be honest is probably more difficult than moving on to 3D and leaving ray casting behind :)

(It can be really easy to manage objects in 3d space for a 2d game, where X=posX, Y=0, Z=posY just kinda leaves them 'on the ground')

Palettes: at the moment (for every pixel) we are reading 3 bytes from the texture and writing 3 bytes to the video buffer. Suppose we had a palette of 256 predefined 16-bit colors (2 bytes each), textures were a single byte per pixel acting as indices into that array, and the final video format was 16 bit.

We could now read 1 byte per pixel (instead of 3) and write 2 bytes per pixel instead of 3, but we've added the overhead of a lookup. I guess we could drop the lookup and have 16-bit textures too; but paletting was used heavily on old hardware, and programmers would mess with the palette to create graphical effects, so if you're going old skool it's got to be palettised :)
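A minimal sketch of that palettised scheme in Python (all values hypothetical): textures store 1-byte indices, a 256-entry table holds predefined 16-bit colours, and each destination write is 2 bytes.

```python
palette = [0] * 256            # 256 predefined 16-bit colours
palette[7] = 0xF800            # e.g. index 7 = pure red in RGB565

texture = bytearray([7] * 16)              # 1 byte per texel: palette index
screen16 = bytearray(len(texture) * 2)     # 16-bit destination buffer

for i, idx in enumerate(texture):
    c = palette[idx]                 # the added lookup
    screen16[2 * i] = c & 0xFF       # low byte (little-endian)
    screen16[2 * i + 1] = c >> 8     # high byte
```

So reads drop from 3 bytes to 1 and writes from 3 bytes to 2, at the cost of one table lookup per pixel.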

All of this is moot though, as I've noticed bmax doesn't seem to have any 16-bit pixel formats defined, so I doubt we could get it to play nice in 16-bit modes. I'll have another look.


mic_pringle(Posted 2009) [#25]
Okay, so I've been looking for some of the alternatives to ray-casting as mentioned here, and I've found ...THIS...

Is this the kind of thing you meant when you said go to true 3D?

-Mic


matibee(Posted 2009) [#26]
Yes, that's the easiest transition from where you are and what you're trying to achieve.