Speed improvement : MASKBLEND or ALPHABLEND ?

Armitage 1982(Posted 2009) [#1]
Hi

I'm using a single large tilemap for my OpenGL platform game.
There are 3 layers of tiles, rendered multiple times in every player's "camera".

I often switch between ALPHABLEND, LIGHTBLEND and MASKBLEND during each cycle.

I've always thought that MASKBLEND was quicker than ALPHABLEND, especially if my tilemap had no transparent pixels, although I have no evidence for this.

I could continue this way, but I find that some of my tiles could be enhanced if a bit of transparency were applied.

Theoretically, what would improve performance:

1) Choose ALPHABLEND, add some transparency to my graphics and only change the blending mode to LIGHTBLEND when needed.

2) Stay in MASKBLEND for the tilemap and forget transparency in my tilemap.

3) Stay in MASKBLEND for the tilemap and try to reduce the use of SetBlend.

4) ... ?

Note that I tried every solution and didn't notice big changes, but I would like some info on this: whether there are performance differences between the blending modes, and whether calling SetBlend often is a bad idea.

Thanks :)


ImaginaryHuman(Posted 2009) [#2]
Hi Armitage1982. There are a few things which occur when you use blending.

In general, anything other than SOLIDBLEND will use the blending hardware of the GPU, i.e. if in some way you want to combine pixels from a source image (a sprite, an image, etc.) with the existing pixels in the backbuffer, it will require additional processing.

There are two stages to this process. The first stage is a `test` which is performed on each visible source image pixel. So if your whole sprite is going to be visible on-screen then every pixel in the sprite image will be tested individually. If some part of the sprite is off-screen, the off-screen pixels will be found to be outside of the screen and will be discarded.

OpenGL uses what it calls an `alpha test function` to determine if pixels in the sprite will continue on to the next stage. When you call the commands SetBlend MASKBLEND or SetBlend ALPHABLEND, it sets up the alpha test function. Note that when you use SOLIDBLEND this test is not performed and neither is the next stage - all pixels are drawn as-is, so solidblend should be fastest.

For MASKBLEND, the alpha test function basically compares the alpha value of each sprite pixel with a given set value. You define this with SetAlpha if you want it to be anything other than the default, which is 0.5 if I remember correctly. The value ranges from 0 to 1.0, rather than 0 to 255. The alpha values in your pixel are converted from a 0..255 range to a 0..1 range for testing purposes on-the-fly, so 255 means 1.0 and 0 means 0. 128 mid-alpha would be about 0.5. Then when the alpha test is performed, if the alpha value of a pixel is >= 0.5 the pixel `passes` the test and can move on to the next step, otherwise it is discarded. Now, the more pixels that can be discarded in the test, the less work the next stage is going to have to do because there are fewer pixels to process.
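To make the test concrete, here is a minimal sketch in Python (not BlitzMax, and not how the GPU is actually programmed) of a >= 0.5 alpha test applied to one row of pixels. The function name `alpha_test` and the sample alpha values are made up for illustration.

```python
def alpha_test(alpha_byte, threshold=0.5):
    """Return True if a pixel with this 0..255 alpha passes a '>= threshold' test."""
    alpha = alpha_byte / 255.0  # convert the 0..255 byte to the 0..1 range
    return alpha >= threshold

# One row of sprite alpha values (hypothetical data).
row = [0, 64, 127, 128, 200, 255]

# Only the pixels that pass the test move on to be drawn.
kept = [a for a in row if alpha_test(a)]
print(kept)
```

With a threshold of 0.5, an alpha of 127 falls just below the cut and 128 just above it, which matches the ">= 128 means include" rule of thumb.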

For ALPHABLEND, a similar test occurs except the alpha value is usually set to 0. In other words, if a pixel's alpha value is 0 the pixel will not be considered in the next stage and gets discarded. Again discarded pixels save processing time in the next stage. Note, however, that even alpha values of 1 (in a range of 0..255) will pass the test and make it to the next stage whereas maskblend would take values of 128 or more.

Depending on which pixels pass the alpha test, which depends on whether you're using ALPHABLEND (alpha>0), MASKBLEND (alpha>=128) or SOLIDBLEND (alpha isn't tested at all), the next stage now may perform a blending operation between the source sprite pixel and the existing background pixel.

Note here that SOLIDBLEND disables the blending hardware and does not attempt to consider the existing background pixel. It just writes all pixels to the background as-is and ignores any alpha channel. That's why SOLIDBLEND is the fastest.

In MASKBLEND mode, the 2nd stage of blending is disabled - basically once a pixel makes it past the alpha *test* to decide if it will be included, it is now written straight out to the background, overwriting whatever is there. It's similar to SOLIDBLEND except some pixels are discarded initially based on the alpha channel. In MASKBLEND the alpha channel is just a `mask` which basically says whether a pixel is `on` or `off`. Values >=128 mean `include`, <128 mean `exclude`.

Because MASKBLEND has to perform the alpha test whereas SOLIDBLEND does not, MASKBLEND initially will be slower than SOLIDBLEND. However, it depends then on how much overhead is needed to do the alpha testing, and how many pixels will be discarded and how many pixels end up being drawn. The combination of that gives you the `overall speed` of the rendering and that *could* be faster than SOLIDBLEND for sprites containing fewer passed pixels. But usually because each pixel has to be tested the testing overhead is more than just drawing in solid. It's hard to gauge it.

ALPHABLEND is a different animal in the 2nd stage. After the pixels are tested for alpha>0 and pass onto the 2nd stage, the source sprite pixel's RGB values now have to be combined with the existing destination background pixel's RGB values. Each color component is handled separately. In ALPHABLEND mode, the blending hardware is enabled and every pixel now has to have math performed on it in order to basically cross-fade between the sprite pixel and the background pixel based on the sprite's alpha value.

The sprite's alpha channel value now represents, on a scale of 0 to 1.0, how much the source sprite's color components will be multiplied by in order to scale them. So say your sprite pixel is 255,100,12 and your alpha component is 0.8 (let's say it's 204), you will multiply 0.8 by each component and get 204,80,10 (the 10 is actually 9.6 but it rounds to an integer).

Next, the blending hardware calculates the inverse of the sprite's alpha value. If the alpha was 204, it will be 255-204 = 51. 51 when converted to a range of 0..1 is 0.2. This 0.2 is then used to multiply each of the existing background pixel's values individually, to scale them. Once they are scaled, the scaled background pixel values are now added to the scaled sprite pixel values - note that basically this will never produce values greater than 255 because the bigger the value the source sprite pixels are multiplied by, the smaller the value the destination pixel is multiplied by. The source pixel's alpha + the inverse alpha = 1.0 (255).

The result of that calculation - (SourceAlpha*SourceComponent)+((1.0-SourceAlpha)*DestComponent) - is now written out to the backbuffer. This is done for every pixel that passed the alpha test. So you can see that ALPHABLEND is likely to be slower than SOLIDBLEND or MASKBLEND.
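The arithmetic above can be tried out with a small Python sketch. It only models the formula per color component; the function name `alpha_blend` and the background color are assumptions chosen for the example, and real hardware may round differently.

```python
def alpha_blend(src_rgb, src_alpha_byte, dst_rgb):
    """Cross-fade a source pixel over a destination pixel using the source alpha."""
    a = src_alpha_byte / 255.0  # 204 becomes 0.8
    # (SourceAlpha * SourceComponent) + ((1.0 - SourceAlpha) * DestComponent)
    return tuple(round(a * s + (1.0 - a) * d) for s, d in zip(src_rgb, dst_rgb))

sprite = (255, 100, 12)     # the sprite pixel used in the example above
background = (0, 50, 200)   # hypothetical background pixel

result = alpha_blend(sprite, 204, background)
print(result)
```

The sprite contributes 204, 80 and 9.6, the background contributes 0, 10 and 40, and the two are summed per component before rounding.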

In the case of LIGHTBLEND and SHADEBLEND they operate similar to ALPHABLEND. Pixels are included unless the sprite alpha is 0. Then the sprite pixel is involved in a mathematical formula which includes the background pixel, producing a result. Lightblend is basically a way of adding the source pixel to the destination pixel - so this may be a little faster than ALPHABLEND since there is no multiplication involved. If you have a black background and you want to draw in ALPHABLEND onto it, use LIGHTBLEND, it will have the same effect but may be faster. SHADEBLEND will be a bit slower probably because it's basically a multiplication of source pixel * destination pixel - but without the add. If you are wanting to do alphablending onto a white background, use SHADEBLEND, it will have the same result but might be faster. Both LIGHTBLEND and SHADEBLEND should technically be a little faster than ALPHABLEND. I think overall ALPHABLEND has the most work to do.
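As a rough illustration of the two modes as described here (add for LIGHTBLEND, multiply for SHADEBLEND), the following Python sketch models them per color component. This takes the description literally; the actual blend functions BlitzMax sets up may differ, for instance by also weighting the source by its alpha. The function names are made up.

```python
def light_blend(src_rgb, dst_rgb):
    """Additive blend: source + destination, clamped to 255."""
    return tuple(min(255, s + d) for s, d in zip(src_rgb, dst_rgb))

def shade_blend(src_rgb, dst_rgb):
    """Multiply blend: source * destination, with both treated as 0..1 values."""
    return tuple(round(s * d / 255) for s, d in zip(src_rgb, dst_rgb))

sprite = (255, 100, 12)

# Adding onto black leaves the sprite unchanged.
on_black = light_blend(sprite, (0, 0, 0))

# Multiplying with white leaves the sprite unchanged.
on_white = shade_blend(sprite, (255, 255, 255))

print(on_black, on_white)
```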

Please note also that the speed of each operation may depend on how it is implemented in hardware on any given graphics card, some cards might be optimized for faster alpha-blending, for example, while others might be the same speed to do all blending regardless of the math.

So in order of speed from fastest to slowest, my estimation is:

1) SOLIDBLEND
2) MASKBLEND
3) LIGHTBLEND
4) SHADEBLEND
5) ALPHABLEND

You also mentioned that you call SetBlend often. If you can `batch` together the objects that use the same blend mode and only call SetBlend as few times as possible that will help a bit - it saves function calls, but I don't think it's going to be a huge impact. I think you might get more of an impact when you change a) whether or not there should be an alpha test, and b) whether or not the blending hardware needs to transition between source and destination pixels. So when you go from SOLIDBLEND to any other mode or back there may be a small performance hit. And when you go from SOLIDBLEND or MASKBLEND to any of the other blend modes there may be a small performance hit. I'm not sure if Mark has it optimized but it could be that changing ANY mode has the same performance hit.
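A small Python sketch of the batching idea: count how many times the blend mode would have to change for a draw list, before and after grouping by mode. The draw list and the function name `count_blend_switches` are hypothetical, and this only counts state changes; it does not measure what each change costs.

```python
def count_blend_switches(draw_list):
    """Count how many times the blend mode changes while walking the draw list."""
    switches = 0
    current = None
    for _name, blend in draw_list:
        if blend != current:
            switches += 1
            current = blend
    return switches

# Hypothetical draw list: (object name, blend mode it needs).
draws = [
    ("tile_a", "MASKBLEND"),
    ("glow_a", "LIGHTBLEND"),
    ("tile_b", "MASKBLEND"),
    ("glow_b", "LIGHTBLEND"),
    ("tile_c", "MASKBLEND"),
]

# Group objects that share a blend mode (sorted() is stable, so the
# relative order inside each group is preserved).
batched = sorted(draws, key=lambda d: d[1])

print(count_blend_switches(draws), count_blend_switches(batched))
```

Keep in mind that reordering draws changes what ends up on top of what, so grouping like this is only safe within a layer where the draw order does not affect the result.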

If your tilemap has no transparent pixels, use SOLIDBLEND. You only need MASKBLEND if you have a tile with a variegated edge to it where you want to show some background through. I would be very surprised if ALPHABLEND does anything other than make things slower.

Since you have `overdraw`, where you are drawing background layers and then drawing other stuff on top of them, you might speed things up also if you can remove some of the overdraw. e.g. if you have some mountains in the background which scroll slowly but you can only see at most 50% of the mountain image, don't draw the parts you will not see. There are various ways you could optimize your drawing by reducing how much needs to be drawn to get the effect you want.


Armitage 1982(Posted 2009) [#3]
Thanks for this extensive and good explanation.

Generally I try to stay with 32x32 tiles so it's easier.
Every object in my game is batch rendered and uses a simple but effective culling system.

I can probably reduce my SetBlend calls, try to use SOLIDBLEND when possible (on rare occasions) and test the performance hit when using ALPHABLEND rather than MASKBLEND.


ImaginaryHuman(Posted 2009) [#4]
Do you have multiple tiles on the same image, or separate images for each tile?


JoshK(Posted 2009) [#5]
Masking is no-cost.


Armitage 1982(Posted 2009) [#6]
For each of the 3 layers I have multiple tiles on the same image (two of the layers even share the same texture in a batch).
And since my game comes with a full skin selector, each object obviously comes with its own sprite.
I think this "sector" is optimized enough.


ImaginaryHuman(Posted 2009) [#7]
It sounds like you're doing things fairly efficiently already. What kind of framerate do you get on what graphics card?

In the old days of optimizing tile engines, there were a few things that would help. For example, sometimes to scroll smoothly a game would have to use bigger tiles - the overhead from smaller tiles was too great. You might consider larger tiles of 64x64 or even 256x256 for large background chunks where possible. It also reduces the amount of geometry/quad data.

Also in the old days we did not do a full screen refresh of all tiles every frame. We had a bit more control over the backbuffer and could do things like dirty rectangles or drawing only the newest strip of tiles which were coming into view, preserving the contents of the backbuffer and using it again in the next frame. That's not so easy to implement in a modern graphics system and you'd have to get down to using custom OpenGL code to handle renderbuffers.

Another approach is to use the stencil buffer to cut down the overdraw - draw your foreground first and add its pixels to the stencil buffer, except for transparency pixels, and then when you draw the next layer you only draw if the stencil pixel is not set, whilst adding new pixels to the stencil. Continue that way, it will only output pixels that are `visible`. But then you have to get into GL code to access the stencil buffer and I'm not sure that it would be worth it in terms of speed gain.
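The stencil idea can be modelled in a few lines of Python on a one-dimensional "screen". This is only a simulation of the logic, not OpenGL stencil code; the layer contents and the function name `draw_front_to_back` are invented for the example.

```python
def draw_front_to_back(layers, width):
    """Draw layers front-most first; a pixel is written only if nothing covers it yet.

    Each layer is a list of pixel values, with None meaning a transparent pixel.
    """
    output = [None] * width
    stencil = [False] * width  # True once a visible pixel has been written
    writes = 0
    for layer in layers:
        for x in range(width):
            pixel = layer[x]
            if pixel is None or stencil[x]:
                continue  # transparent, or already covered by a nearer layer
            output[x] = pixel
            stencil[x] = True
            writes += 1
    return output, writes

foreground = [None, "F", "F", None]
middle = ["M", "M", None, None]
background = ["B", "B", "B", "B"]

output, writes = draw_front_to_back([foreground, middle, background], 4)
print(output, writes)
```

Drawing the same three layers back to front would write every non-transparent pixel (8 writes here), while the stencil version writes each screen position once. Whether that saving outweighs the cost of the stencil test on real hardware is a separate question.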