Slow images

Eikon(Posted 2005) [#1]
I am writing a 2D side scroller in Max and have been constantly running into speed issues at every step of its development. All of them seem to be linked to BMax's poor performance with large images. Here's a breakdown of my game's frames per second as different portions of the rendering code are added in.

640x480x16

Blank Screen: 6,200 FPS

Render Map: 2 layers of 16 x 16 1.5 Scaled + Filtered Tiles

Down to 2,600 FPS O_o

Render 320x240 2x Scaled + Filtered Background Image: 1,500FPS

So here, with only a tilemap and one background image showing, frames per second are already down to 1,500 on my PC, and are around 200 on my less powerful laptop. How am I supposed to add in anything else when something this basic brings a Radeon 7500 / P4 2.0GHz laptop to its knees? I have the latest drivers on all my systems, and after much testing I am very disappointed with OpenGL performance in general.

I was planning on having two parallax scrolling backgrounds but this doesn't seem possible. Does anyone have any speed tips they can share?


TomToad(Posted 2005) [#2]
Maybe you could try using DirectX instead of OpenGL. I've been working on a scrolling game and decided to do some tests. I found that with the OpenGL drivers, I was getting about 500 fps when the most was being drawn on the screen, and about 1000 with the least.
Using DirectX drivers, I got about 650 fps with the most drawing and about 1200 with the least.
CPU and video card are an AMD XP 2400+ (2GHz) and an nVidia FX5200.

BTW, rendering a blank screen on my system only achieved approx. 2000 fps. Are you sure you were calculating the fps correctly? Or were you really rendering a blank screen and not just flipping the backbuffer and counting how many times you could do that per second?


Kuron(Posted 2005) [#3]
You are not alone, I am having similar issues. The problem is BlitzMax does not support 2D, its 2D is "emulated" via 3D methods (whether using OpenGL or D3D7 drivers). Hence, this is really hard on a graphics card, and when you have a lot of stuff going on at once, it will completely choke.

For example, even on a very well-specced system here, the IGlass demos absolutely crawl because of the effects being used in them.

Things like scaling, alpha blending, shadows are absolute FPS killers. I get far worse performance with the DX7 drivers than with the OpenGL drivers.

I LOVE the core language of BMax, unfortunately using it for anything related to graphics, 2D, sprites, games has been disappointing due to the performance issues, not to mention the sound issues.

Hopefully the next update will fix some of the issues.


Ferminho(Posted 2005) [#4]
I disagree that using 3D in the background is something negative... I think pure 2D operations like alpha, scaling and rotating are far slower, and having 3D acceleration for those is great.

I've seen some Blitz3D/Basic apps where realtime rotation/scaling was soo slow...


MadMax(Posted 2005) [#5]
Yes, I've found this too. BMax is nice and its OOP stuff is interesting to use, but once you start wanting to do something serious it starts crawling. Sound is a nightmare.

This OOP is very nice from a programmer's point of view, but maybe good old "spaghetti" code is faster. Let's hope future updates sort out these issues, as BMax has a lot of potential.


TartanTangerine (was Indiepath)(Posted 2005) [#6]
BMAX is 2d in 3d, each image is made up from a textured quad. If you have a large image then you are going to suffer from the same old fill-rate problem you would get with any game written in any language.

Things like blendmodes and rotation and scaling do not kill FPS; large images and large numbers of different images will. You still have to play by the same rules you did with Blitz3D to keep frame rates up. Keep the number of surfaces to a minimum: BMAX creates a new surface for each image you load, so try to pack many images into a single image, as this requires only one surface to draw. Try to keep image (texture) dimensions at powers of two; BMAX will resize the image if it does not conform, and a lot of extra video memory gets used.
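
To put a rough number on the power-of-two point, here is a minimal Python sketch (not BlitzMax code). It assumes a non-conforming image is padded up to the next power of two on each axis and stored at 4 bytes per pixel; how BMAX actually resizes is not stated in this thread, so treat both as assumptions.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_texture_bytes(width: int, height: int, bytes_per_pixel: int = 4):
    """Return (padded_w, padded_h, used_bytes, allocated_bytes).

    Assumption: each axis is rounded up to a power of two independently.
    """
    pw, ph = next_pow2(width), next_pow2(height)
    used = width * height * bytes_per_pixel
    allocated = pw * ph * bytes_per_pixel
    return pw, ph, used, allocated


# The 320x240 background from the first post, as a hypothetical example.
pw, ph, used, allocated = padded_texture_bytes(320, 240)
print(f"{pw}x{ph}: {used} bytes of pixels in {allocated} bytes of texture")
```

Under those assumptions a 320x240 image ends up in a 512x256 texture, so roughly 40% of the allocated memory holds no image data.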

Conclusion: Treat the BMAX graphics engine like a Blitz3d single surface system.


Kuron(Posted 2005) [#7]
Things like blendmodes and rotation and scaling do not kill FPS,
Cute theory, but not true. In the example code, simply disabling shadows and alpha blending gives a huge FPS boost, but this is par for the course and will be the same with any 3D game.


TartanTangerine (was Indiepath)(Posted 2005) [#8]
Rotation and scaling will not have an extra impact, since BMAX calculates each vertex position every time you draw an image anyway. Having said that, scaling an image so it fills a large proportion of the screen will impact performance, but only because of fill-rate issues.

Blendmodes are handled by the 3d Hardware not software, disabling these could give an FPS boost on older 3D GFX hardware.

So this all points to a GFX Hardware problem and not an issue with BMAX. I suppose you could re-write the drawing function to ignore scaling and rotation, not sure if you'd get much of a boost though.


Kuron(Posted 2005) [#9]
Blendmodes are handled by the 3d Hardware not software, disabling these could give an FPS boost on older 3D GFX hardware.
It will give a boost on newer cards, too. 3D effects come with a price and that price has always been and will always be performance :c/ BMax is currently using dated versions of OpenGL and D3D and cannot even take "full" advantage of newer graphics cards. Granted, in Eikon's case he is using a Radeon 7500, which came out back in 2001 and was a budget card. However, with the dated versions of OpenGL & D3D being used by BMax (which are as old as, if not older than, his video card), he should not be having any major performance hits.

OGL under B+ still blows away BMax in performance.


Dreamora(Posted 2005) [#10]
You all know that 2D means no alpha, don't you?
This is the reason why it was used, as actual 2D games without alpha aren't really doable :-) (and languages that tried it are really crap at it, like PureBasic's FX stuff or SDL).

The reason this whole stuff gets that slow is because BM resends the whole vertex data with every image drawing instead of just changing their transformation. This is the bottleneck as it just "overloads" your graphic cards.
Even the newest hardware can't handle it that well, and pre-shader hardware with its unoptimized pipe will break far earlier than shader hardware.

Hopefully we will see an extended Max2D when Max3D arrives, using VBOs and similar stuff instead of creating every quad again with "glBegin, glEnd" every time ...


Paradox7(Posted 2005) [#11]
Will there ever be a true 2D in BlitzMax :( and not this 2D in 3D world we have


Kuron(Posted 2005) [#12]
You all know that 2D means no alpha, don't you?
Depends on who is doing it ;c) I have a DD 2D engine lying around that has very fast alpha blending. Generally, getting decent alpha blending means brushing up on your ASM skills and doing it the old-fashioned way. However, alpha blending is slow as can be in IBPro when used on numerous sprites. It's OK in PB, which uses D3D for alpha blending (but then that's not true 2D).

Will there ever be a true 2D in BlitzMax :( and not this 2D in 3D world we have
No, because MS yanked DirectDraw when DX8 was released. Even worse is DX10 will break backwards compatibility (as MS has been promising for over 5 years) and will run anything pre DX10 via software emulation which will be much slower.

However, DirectDraw has never been needed for 2D under Windows. I have seen many 2D engines done in C++ that are strictly GDI and are extremely fast. Much, much faster than the "software driver" in B+ and great performance even on systems with mobo video chips.


altitudems(Posted 2005) [#13]
I'm not sure why you guys are having trouble with speed. My engine runs perfect with over 2000 tiles, perpixel lighting, weather particles, etc. being drawn every frame.

You might want to check whether your issues aren't solved by better memory/object management.

All I worry about is making sure that someone with a 16-32MB video card and 1GHz processor gets at least 75fps. If I can achieve this, I just use delta timing as a backup, and I'm set.
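
For readers who have not used delta timing, here is a minimal Python sketch of the idea (not BlitzMax code, and not the engine mentioned above). Movement is scaled by the time each frame took, so an object covers the same distance per second whatever the frame rate. The function names and the fixed frame times are hypothetical, chosen only to keep the example deterministic.

```python
def step(position: float, speed_px_per_sec: float, dt_seconds: float) -> float:
    """Advance a position by speed * elapsed time instead of a fixed per-frame amount."""
    return position + speed_px_per_sec * dt_seconds


def simulate(fps: int, seconds: int, speed_px_per_sec: float) -> float:
    """Run a fake game loop at a constant frame rate and return the final position."""
    dt = 1.0 / fps          # in a real loop this would be measured each frame
    x = 0.0
    for _ in range(fps * seconds):
        x = step(x, speed_px_per_sec, dt)
    return x


# The same 100 px/s object on a slow and a fast machine.
print(simulate(75, 2, 100.0))
print(simulate(200, 2, 100.0))
```

Both runs end at about 200 pixels after two simulated seconds, which is the whole point: the frame rate changes how smooth the motion looks, not how fast the game plays.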


Eikon(Posted 2005) [#14]
Seems like I struck a vein here. Thanks everyone for your replies that let me know I am not alone.

The reason this whole stuff gets that slow is because BM resends the whole vertex data with every image drawing instead of just changing their transformation. This is the bottleneck as it just "overloads" your graphic cards.
Even the newest hardware can't handle it that well, and pre-shader hardware with its unoptimized pipe will break far earlier than shader hardware.


I have read about the problem you're talking about, Dreamora, and I seem to remember Simon saying he had a fix for that, but he never showed back up with the code. Here's hoping he can find the time to address this serious issue.

http://blitzbasic.com/Community/posts.php?topic=48587&hl=drawimagefast

@TomToad: I've tested with Direct3D and it seems slower in every case except on my laptop, where it performs about equal with OpenGL. In my experience D3D will only exceed OpenGL speed on outdated or integrated hardware. I'm also sure 6,200FPS is a correct reading.


TartanTangerine (was Indiepath)(Posted 2005) [#15]
http://developer.nvidia.com/object/Top_Things_Kill_DX7.html


Dreamora(Posted 2005) [#16]
These drawimagefast things and the other optimizations brought up (like batch rendering) are only of little use. The gain is next to 0 with most "actual" GPUs, but the CPU power needed to calculate these optimizations in realtime is more than nothing. So it's more of a loss than a win.

The only real optimization would be a switch to vertex objects etc. which are stored in VRAM. But this would have a bad impact on compatibility, especially with office graphics hardware with SiS, Intel and other onboard chipsets that wouldn't work anymore. So I think we don't have that good a chance of seeing a Max2D enhancement in that direction ...


Gabriel(Posted 2005) [#17]
These drawimagefast things and the other optimizations brought up (like batch rendering) are only of little use. The gain is next to 0 with most "actual" GPUs, but the CPU power needed to calculate these optimizations in realtime is more than nothing. So it's more of a loss than a win.


Firstly, you're wrong to group my drawimagefast suggestion together with batch rendering. There is zero overhead with my drawimagefast suggestion. It may or may not be a big gain, but it's all gain.

Secondly, your comments about batch rendering are only true in theory. If the CPU was as out of date as the video card, there might be some truth in what you say. Probably not much, because CPUs are insanely fast these days, but maybe some. In practice, though, it's not like that. When you go into Fry's or BestBuy and buy a new PC, you can't buy a five-year-old CPU. The most underpowered CPU you can still buy is still insanely fast and more than capable of a little image sorting. The same is not true of video cards. While the chipsets are often newish, they're often about as capable as cards that are five years old. So even if there is a small CPU overhead in batch rendering, it's actually a good thing because it goes some way to redressing the imbalance between CPUs and GPUs in vaguely modern PCs.


ImaginaryHuman(Posted 2005) [#18]
If you switch off filtering you should get a framerate improvement. Filtering usually costs quite a bit of extra effort.

I think also you would be better off having a tilemap made of 64x64 tiles rather than 16x16, as that's a whole lot more overhead to process the individual tiles, switching textures, etc.
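
A quick way to see the difference in per-tile overhead is to count how many tiles cover the screen. The Python sketch below (not BlitzMax code) uses the 640x480 resolution and 1.5x scale from the first post and assumes the map is aligned to the screen edge; a scrolling map would typically need one extra row and column.

```python
import math


def tiles_on_screen(screen_w: int, screen_h: int, tile_px: int, scale: float) -> int:
    """Number of tiles needed to cover the screen for one layer, with no scroll offset."""
    drawn = tile_px * scale                      # on-screen size of one tile
    cols = math.ceil(screen_w / drawn)
    rows = math.ceil(screen_h / drawn)
    return cols * rows


small = tiles_on_screen(640, 480, 16, 1.5)       # 16x16 tiles drawn at 24x24
large = tiles_on_screen(640, 480, 64, 1.5)       # 64x64 tiles drawn at 96x96
print(small, "draws per layer with 16x16 tiles")
print(large, "draws per layer with 64x64 tiles")
```

The pixel area filled is about the same in both cases; what changes is the number of individual draw operations per layer.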

Also you can use some OpenGL buffer tricks, maybe using the stencil or alpha buffer, so that you don't have to draw texels that have already been drawn, i.e. don't waste time drawing on pixels that are already done.

I think BlitzMax is plenty fast enough with OpenGL. Maybe on the PC OpenGL is slower sometimes. If you look at the Platypus demo that was released in the showcase forum recently, it has LOTS of parallax scrolling and screen action with no slowdown. It's entirely possible. Maybe you are not thinking in a way that is cooperative for the way a graphics card works. There are different rules.


Dreamora(Posted 2005) [#19]
Syb: yours doesn't exist. The link above runs into an error. That's why I assumed it was like other "fast draw" implementations that check for simple rebindings, which doesn't have much effect as BM does not rebind if not needed either.

I know that CPUs are extremely fast, although I only have an "old" P-M 1.6GHz.
But on systems that take advantage of batch rendering in 2D etc. the CPUs won't be that fast, and I think spending all their power on enjoyable gameplay instead of reorganisation of data is the better choice, if you had to choose. But I think you agree on that.
If there is no choice that has to be made, then optimizations like that can help :-)


Gabriel(Posted 2005) [#20]
yours doesn't exist. The link above runs into an error. That's why I assumed it was like other "fast draw" implementations that check for simple rebindings, which doesn't have much effect as BM does not rebind if not needed either.


It errors for me too, but since I suggested a command called DrawImageFast, I assumed it was mine. If it wasn't, then none of what I said applies. If it was, then my suggestion was specifically *not* to check anything. It was to give us a command which draws without doing the checking that BMax usually does, and forgoes the glBegin..glEnd pair. In other words, it lets us force fast drawing when we want to draw many copies of the same image (particles?) and does no checking. It's not error safe, but as I pointed out in my request, neither was WritePixelFast in old versions of Blitz. You can't have everything, and since it's entirely optional, I think it's worth letting people do something silly if they can get a speed boost from it. You only need a warning in the docs.


Nelvin(Posted 2005) [#21]
Batching vertices using the same renderstates/textures would be by far the BIGGEST possible performance boost. A single call to GL or D3D does have HUGE overhead, has to walk through several abstraction layers, has to do lots of additional tests to be as robust as possible and as fast as the given level of execution allows.
So batching the vertices each frame and sending them together to the graphics card is minimal overhead compared with sending each single blit.
It of course depends a lot on triangle size: 100 fullscreen triangles won't get much faster by sending them batched to the graphics card, but thousands of polys used by tilemaps, particle systems etc. will get a lot faster.
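
The batching idea can be illustrated without any graphics API. The Python sketch below (not BlitzMax, GL or D3D code) merges consecutive sprites that share a texture into one batch while preserving draw order, then compares the number of submissions. The texture names and the draw list are hypothetical, and it assumes all tiles live in one shared tileset texture.

```python
from itertools import groupby


def batch_consecutive(sprites):
    """Group runs of consecutive sprites that use the same texture.

    sprites is a list of (texture_name, x, y) tuples in draw order.
    Draw order is preserved, so alpha-blended layers still stack correctly.
    """
    batches = []
    for texture, run in groupby(sprites, key=lambda s: s[0]):
        batches.append((texture, [(x, y) for _, x, y in run]))
    return batches


# One 27 x 20 layer of 24-pixel tiles from a shared tileset, then the player sprite.
tiles = [("tileset", col * 24, row * 24) for row in range(20) for col in range(27)]
sprites = tiles + [("player", 100, 200)]

batches = batch_consecutive(sprites)
print(len(sprites), "individual draws ->", len(batches), "batched submissions")
```

The grouping itself is a single linear pass over the draw list, which is the small CPU cost being weighed against the per-call overhead described above.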

Onboard vertex buffers aren't of much help for typical 2D needs; they are mainly used for instancing and/or vertex shaders (skeletal animations etc.)

A very thin but portable batching layer would be the best and most useful extension for Max2D.


Dubious Drewski(Posted 2005) [#22]
Eikon, an initial framerate hit of 6200 to 2600 isn't something
to worry about. The size of the
performance hits will reduce as more of the game is added.
Eventually, when your game is completed, you'll still get
framerates of 100 - 200, and that's nothing to complain about.
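
The reasoning becomes clearer when the FPS readings are converted to milliseconds per frame, since FPS is not a linear scale. The Python sketch below uses the desktop readings quoted in the first post; the stage labels are just shorthand.

```python
def frame_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def stage_costs(readings):
    """For each (label, fps) reading, return (label, total_ms, added_ms)."""
    costs = []
    previous_ms = 0.0
    for label, fps in readings:
        total_ms = frame_ms(fps)
        costs.append((label, total_ms, total_ms - previous_ms))
        previous_ms = total_ms
    return costs


readings = [("blank screen", 6200), ("+ tilemap", 2600), ("+ background", 1500)]
for label, total_ms, added_ms in stage_costs(readings):
    print(f"{label:14s} {total_ms:6.3f} ms/frame (+{added_ms:.3f} ms)")

print(f"budget at 60 fps: {frame_ms(60):.2f} ms; laptop at 200 fps: {frame_ms(200):.1f} ms")
```

On the desktop the tilemap adds roughly 0.22 ms per frame and the background roughly 0.28 ms, against a budget of about 16.7 ms at 60 fps. The laptop figure of 200 FPS corresponds to 5 ms per frame, which is a much larger share of that budget.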


Gabriel(Posted 2005) [#23]
Eventually, when your game is completed, you'll still get
framerates of 100 - 200, and that's nothing to complain about.


You obviously didn't read carefully. He's down to 200 on the laptop now, with nothing but a tilemap and a background. Frankly, I'm not surprised. Tilemaps are *EXACTLY* what I had in mind when I suggested DrawImageFast or even better an optimized renderer. And was sadly ignored.


Dubious Drewski(Posted 2005) [#24]
I see your point, Sybixsus. But, the first 3 lines in my post above are still true.


tonyg(Posted 2005) [#25]

Conclusion: Treat the BMAX graphics engine like a Blitz3d single surface system.


I agree but how do you do this?
LoadAnimImage will create separate images for each frame.
If you LoadImage, how do you reference only a portion of that image? DrawImageRect works differently.
The DrawImageBlock and DrawImageRect in the code archives create a new image on the fly (I think), which might help, but I haven't got either to work...
DrawImageRect
DrawImageBlock


Eikon(Posted 2005) [#26]
tonyg: If you loadimage, how do you reference only a portion of that image? Drawimagerect works differently.

I use this function here

but all my testing has shown that using an AnimImage is faster than a single image and any DrawImageRect method I could find.


TartanTangerine (was Indiepath)(Posted 2005) [#27]
I agree but how do you do this?
LoadAnimImage will create separate images for each frame.
If you LoadImage, how do you reference only a portion of that image? DrawImageRect works differently.
The DrawImageBlock and DrawImageRect in the code archives create a new image on the fly (I think), which might help, but I haven't got either to work...


Are you sure LoadAnimImage makes separate images?

Draw parts of an image using the SetUV command that I posted on this forum (it's somewhere, no idea how to find it).


TartanTangerine (was Indiepath)(Posted 2005) [#28]
Are you sure LoadAnimImage makes separate images?

OMG it's true, there is a separate PixMap for each frame.

Now I know what I'll be doing today.


TartanTangerine (was Indiepath)(Posted 2005) [#29]
I've written a new Type called TAnimImage. You can load your multiframe images into it very much as you would with LoadAnimImage, BUT it only uses one texture and not several like the BRL command.

I got massive memory savings and speed increases : http://www.blitzbasic.com/Community/posts.php?topic=51647
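
For anyone wondering how frames can be drawn from a single texture, the usual approach is to compute texture coordinates for the sub-rectangle of each frame. The Python sketch below shows only that arithmetic; it is not the TAnimImage or SetUV code from this forum, and the function name and the row-major frame layout are assumptions.

```python
def frame_uv(frame: int, frame_w: int, frame_h: int, atlas_w: int, atlas_h: int):
    """Return (u0, v0, u1, v1) in the 0..1 range for one frame of a sheet.

    Assumption: frames are laid out left to right, top to bottom, with no padding.
    """
    cols = atlas_w // frame_w                    # frames per row
    col, row = frame % cols, frame // cols
    u0 = col * frame_w / atlas_w
    v0 = row * frame_h / atlas_h
    u1 = (col + 1) * frame_w / atlas_w
    v1 = (row + 1) * frame_h / atlas_h
    return u0, v0, u1, v1


# A 256x256 sheet of 16x16 frames holds 16 frames per row.
print(frame_uv(5, 16, 16, 256, 256))
print(frame_uv(17, 16, 16, 256, 256))
```

Every frame then references the same texture, so switching frames changes only four numbers rather than the bound texture.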


Kuron(Posted 2005) [#30]
Thanks dude. Every little bit helps.


Dubious Drewski(Posted 2005) [#31]
Videogame Takeaway, that Type is great. Well done, mate.


ImaginaryHuman(Posted 2005) [#32]
You guys are spoilt. Back in the old days, it was hard enough to get several objects to move around on a screen where the background wasn't even updated every frame, at 320x256 or similar resolution. Here you're playing with 800x600 resolutions, 32-bit color, and expecting it to just fly along. It's a lot of work to operate with these higher resolutions and color qualities. So how about some gratitude that things are as fast as they are?

Here I only have a GeForce 4MX with 64MB RAM, and supposedly about 1 billion texels per second. It seems plenty fast enough: the recently released Platypus demo runs perfectly smoothly, with 60fps scrolling, lots of overlapping masked parallax layers, plenty of enemies and on-screen movement, particle effects etc. As far as I'm concerned that is good performance, at 640x480 in 32-bit color (unless it's 16-bit? but still). Actually seeing that demo run is inspirational and brings a smile to my face, knowing that I can therefore likely implement the kind of ideas I had in mind.
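
As a back-of-the-envelope check on that fill-rate figure, the Python sketch below divides the quoted texel rate by the pixels one full-screen layer needs per second. It treats one texel as one drawn pixel and ignores filtering, blending, bandwidth and per-draw overhead, so the result is a theoretical ceiling rather than a prediction.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels written per second by one full-screen layer at the given frame rate."""
    return width * height * fps


def fullscreen_layers(fill_rate: float, width: int, height: int, fps: int) -> float:
    """How many full-screen layers the quoted fill rate could cover per frame, in theory."""
    return fill_rate / pixels_per_second(width, height, fps)


rate = 1_000_000_000                             # "about 1 billion texels per second"
print(pixels_per_second(640, 480, 60), "pixels/s for one 640x480 layer at 60 fps")
print(f"{fullscreen_layers(rate, 640, 480, 60):.1f} layers of overdraw at most")
```

Real throughput will be well below that ceiling, but it suggests that a few parallax layers are within the raw fill rate of such a card.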

If you are seriously getting major slowdown with your tilemap engine, maybe you need to rethink how you are implementing your game and how you are drawing your screen. I hope you aren't trying to draw thousands of individual 16x16 images, for example? When you're working with OpenGL and hardware-accelerated graphics, you really have to think differently than you might have in the past with software renderers and so on. You have to change your thinking and your approach, not just throw some old program into a new paradigm and expect it to work well.