Drawing background is slowing my game down.

BlitzMax Forums/BlitzMax Beginners Area/Drawing background is slowing my game down.

Rico(Posted 2008) [#1]
Hi, I have just added a full bitmap background to my game. My game runs in 1024,768,32 mode. It is really slowing my game down. The framerate drops from 54 FPS to 43 FPS. However, when I put a time measurement around the appropriate DrawImage statement like this

stt=MilliSecs()
DrawImage Back_Im,0,0
stt2=MilliSecs()-stt
DrawText "Time taken  "+stt2,100,200


The time taken is zero. I was wondering would displaying this image - slow down other parts of the code somehow?

My laptop isn't amazingly fast (1000 MGhz) but it has a graphics card, an ATI Mobility Radeon, and it should be able to display a static 2D screen very easily (I'd imagine).

Does anyone have any ideas of why this slows my program down?


Gabriel(Posted 2008) [#2]
Are you calling CLS in your main game loop? If you are, you don't need to. Also, make sure that blending is disabled for this background.


MGE(Posted 2008) [#3]
"1024,768,32" is a fairly high resolution for a full background redraw. Then you have sprites, score, HUD, etc. on top of it. This will put a hit on lower spec systems. 43fps is not that bad at that high a resolution. It is even less noticeable if you're using a good fixed logic/render-whenever timing structure.


Derron(Posted 2008) [#4]
Depending on your graphics driver (DirectX, OpenGL), the time-consuming part isn't directly measurable around the DrawImage commands.

Instead, measure the time your "Flip" needs. Think of "DrawImage" as only adding a draw command to a list which is worked through when flipping.

But remember: it depends on the graphics engine you use.


bye MB

edit: as some hints:

SetBlend(SOLIDBLEND)
DrawImage BGimage, 0, 0
SetBlend(yourOldBlendMode)

Then you should also try to split the image into two parts:
1024x512 and 1024x256 (one big part at the top and the smaller one at the bottom). This will save you VRAM of 1024x256xbitdepth-of-your-image (1 MB at 32 bits per pixel, 0.25 MB at 8 bits). You can surely "type"-ifier it by yourself using LockImage, TPixmap etc.


xlsior(Posted 2008) [#5]
"The time taken is zero. I was wondering would displaying this image"


The time taken to send the draw command to the video card is near zero -- but the work (and slowdown) doesn't actually happen until you do the flip.
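A minimal sketch of how the two timings could be compared, assuming a stand-in background grabbed from a filled rectangle (the colours and text positions are made up for illustration). It times the DrawImage call and the Flip separately; the Flip figure also includes the two DrawText calls, so treat the numbers as rough.

```blitzmax
SuperStrict

Graphics 1024, 768

' Stand-in background (assumption): a filled rectangle grabbed into an image.
SetColor 40, 80, 120
DrawRect 0, 0, 1024, 768
Local back:TImage = CreateImage(1024, 768)
GrabImage back, 0, 0
SetColor 255, 255, 255

Local drawMs:Int = 0
Local flipMs:Int = 0

While Not KeyHit(KEY_ESCAPE) And Not AppTerminate()
	Local t0:Int = MilliSecs()
	DrawImage back, 0, 0              ' usually returns almost immediately
	Local t1:Int = MilliSecs()

	' Show the timings measured during the previous frame.
	DrawText "DrawImage ms: " + drawMs, 100, 200
	DrawText "Flip ms:      " + flipMs, 100, 220

	Flip 0                            ' no vsync wait, so the wait is the driver's own work
	Local t2:Int = MilliSecs()

	drawMs = t1 - t0
	flipMs = t2 - t1
Wend
```

On hardware where the background is the bottleneck, the second number is the one expected to grow, not the first.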


deps(Posted 2008) [#6]
Also, I'm not completely sure you can trust millisecs().


Derron(Posted 2008) [#7]
if you don't have an atomic clock built in your head or computer, you will have to trust the timers/clocks your mainboard spreads to the OS.

bye MB


Rico(Posted 2008) [#8]
Oh right - so that's why it came up as zero. I see now.

What I am doing in my game is using Box2D for the game physics and collisions. This system uses 2D polygons - so all shapes are drawn (and filled) using the inbuilt DrawPoly commands etc in BlitzBasic. This is very fast.
It is only when I use DrawImage to draw proper sprites (rather than just polygon outlines) it slows down. This seems strange because I would expect it (but I'm probably wrong) to be faster to draw a pre-made sprite rather than to actually draw lines and fill the space in. I have lots of objects in my game and there is no slow down until I actually add the sprites or the background image. (normally the background is drawn using the DrawPoly commands too.)
I can have about 120 objects on screen all with full physics before it starts to drop frames.

I like the idea of splitting the image into 2 halves. What do you mean by Type - ifier it Michael B?

I am going to buy a new laptop very soon anyway - I am using a friend's at the moment. If I got a fast one with a decent gfx card - would it easily be able to draw what I require?

BTW my laptop is 1Ghz (made a bit of typo there :))

Thank you very much for all the kind help.


deps(Posted 2008) [#9]
"if you don't have an atomic clock built in your head or computer, you will have to trust the timers/clocks your mainboard spreads to the OS."

Yes, but there's a difference between high precision timers and normal ones. The normal ones might only get updated every 10 millisecs or so. I'm not sure millisecs() is a high precision one. There's code in the archive that shows you how to get access to the high precision ones on various platforms.

For example here (win only): http://blitzmax.com/codearcs/codearcs.php?code=2059

There is also code in the Retroremake framework that supports more platforms: http://code.google.com/p/retroremakes-framework/
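One way to check the granularity on a given machine is to spin until MilliSecs() changes and record the smallest step seen. This is only a sketch; TimerStep is a made-up helper name, and the result depends on the OS and on what else is running.

```blitzmax
SuperStrict

' Returns the smallest non-zero step observed between consecutive
' MilliSecs() readings over the given number of samples.
Function TimerStep:Int(samples:Int = 20)
	Local smallest:Int = 1000000
	Local last:Int = MilliSecs()
	For Local i:Int = 1 To samples
		Local now:Int = MilliSecs()
		' Busy-wait until the timer value actually changes.
		While now = last
			now = MilliSecs()
		Wend
		If now - last < smallest Then smallest = now - last
		last = now
	Next
	Return smallest
End Function

Print "Smallest MilliSecs() step seen: " + TimerStep() + " ms"
```

A result of 1 suggests the timer really ticks every millisecond; a result around 10 or 15 would mean single DrawImage calls cannot be timed meaningfully with it.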


Derron(Posted 2008) [#10]
What I thought about when typing "type-ifier": if you want to use this way of displaying an image more than once (the background), you may work out a "type" like the "TBigImage" which can be found here in the forums/code archives.

If you want to code it by yourself:
One class/type holding the sliced parts of a big image: eg. TBigImageParts
One class/type holding the whole image and a list of the parts: TBigImage

Make a creation function which takes an image as a parameter, then lock this image and copy the desired parts of the image's TPixmap, creating new TBigImageParts.
Add those parts to the list and unlock the image.

Make a Draw-method for TBigImage which runs through the list of parts and draw them.
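A rough sketch of those two types, under a few assumptions: it takes a TPixmap directly instead of locking a TImage, it uses one fixed maximum tile size rather than choosing sizes per part, and the field names are invented. It relies on the standard PixmapWindow, CopyPixmap and LoadImage functions.

```blitzmax
SuperStrict

' One slice of the big image plus its position inside the whole image.
Type TBigImagePart
	Field img:TImage
	Field x:Int, y:Int
End Type

Type TBigImage
	Field parts:TList = New TList

	' Cuts the source pixmap into tiles of at most tileW x tileH pixels.
	Function Create:TBigImage(src:TPixmap, tileW:Int, tileH:Int)
		Local big:TBigImage = New TBigImage
		Local y:Int = 0
		While y < PixmapHeight(src)
			Local h:Int = Min(tileH, PixmapHeight(src) - y)
			Local x:Int = 0
			While x < PixmapWidth(src)
				Local w:Int = Min(tileW, PixmapWidth(src) - x)
				Local part:TBigImagePart = New TBigImagePart
				' Copy the window so each part owns its own pixel data.
				part.img = LoadImage(CopyPixmap(PixmapWindow(src, x, y, w, h)))
				part.x = x
				part.y = y
				big.parts.AddLast(part)
				x :+ w
			Wend
			y :+ h
		Wend
		Return big
	End Function

	' Draws all parts at their stored positions, shifted by an optional offset.
	Method Draw(offsetX:Int = 0, offsetY:Int = 0)
		For Local part:TBigImagePart = EachIn parts
			DrawImage part.img, offsetX + part.x, offsetY + part.y
		Next
	End Method
End Type
```

Usage would be something like `Local bg:TBigImage = TBigImage.Create(LoadPixmap("background.png"), 512, 512)` after Graphics has been set up, followed by `bg.Draw()` each frame; the file name is just a placeholder.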

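You can define the width/height of each part yourself, so you can split e.g. an 800x600 background image into these texture sizes:
top: 512x512, 256x512, 32x512
bottom: 512x128, 256x128, 32x128 (the bottom row only holds 600-512 = 88 pixels of image height; 128 is the next power of two)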

To make the sizes get generated automatically you just code a loop which may look like this (it's neither tested nor checked for correct syntax):
  Local maxWidth:Int = 512
  Local minWidth:Int = 32
  Local spaceLeft:Int = image.width
  Repeat
    'find the largest power of two that still fits (start over for each part)
    Local partWidth:Int = minWidth
    While partWidth * 2 <= spaceLeft And partWidth * 2 <= maxWidth
      partWidth :* 2
    Wend
    'creating the new part-Image
    spaceLeft :- partWidth
    ...
    Print "maximum partwidth fitting into the image:" + partWidth
  Until minWidth >= spaceLeft
  'creating the last part
  Print "last part reached: creating part with rest of the image"


You can - I'm sure, shorten the loop to also include the last part ("frame") and to loop through the height of the image too - but my brain is kind of slow in the mornings.


Using these methods you minimize the unneeded usage of VRAM. Drawing a 1024x768 image as a "block" will internally blow its dimensions up to 1024x1024, producing 1024x256xdepth of unneeded bits. An 800x600 image is blown up to 1024x1024 as well, which makes the methods mentioned above even more useful there than in your application.
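The padding arithmetic can be checked with a few lines. This sketch assumes each side is rounded up to the next power of two independently and that textures are stored at 4 bytes per pixel; NextPow2 and PaddedBytes are made-up helper names.

```blitzmax
SuperStrict

' Smallest power of two that is >= n (assumes n >= 1).
Function NextPow2:Int(n:Int)
	Local p:Int = 1
	While p < n
		p :* 2
	Wend
	Return p
End Function

' Bytes a w x h image occupies once both sides are padded to powers of two.
Function PaddedBytes:Int(w:Int, h:Int, bytesPerPixel:Int = 4)
	Return NextPow2(w) * NextPow2(h) * bytesPerPixel
End Function

Print "1024x768 as one image: " + PaddedBytes(1024, 768) + " bytes"
Print "1024x512 + 1024x256:   " + (PaddedBytes(1024, 512) + PaddedBytes(1024, 256)) + " bytes"
Print "800x600 as one image:  " + PaddedBytes(800, 600) + " bytes"
```

Under these assumptions the single 1024x768 image costs 4 MB while the two-part split costs 3 MB, which is the 1 MB saving at 32 bits per pixel.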

---

To your problem with the fps: do your sprites all use the same TImage, or does each of them have its own instance or a random TImage?
If they share the same base image (they may also use "frames" of an image), it's much faster to draw them straight one after another than doing something like:
  for local obj:mytype = eachin self.list
    DrawImage(obj.img, obj.x, obj.y, obj.frame)
    DrawImage(Overlay, obj.x, obj.y)
  next


optimized:
  for local obj:mytype = eachin self.list
    DrawImage(obj.img, obj.x, obj.y, obj.frame)
  next
  for local obj:mytype = eachin self.list
    DrawImage(Overlay, obj.x, obj.y)
  next


Using the last method you will obtain a measurable increase in your fps because internally there is no need to "switch" between images.
As you sometimes need to draw an "overlay" over a sprite which is then partly covered by another, you may include something like "layers" or "z" values and run the two loops above once per layer (loop until "z" changes and so on).

Short: avoid switching between images as often as you can. Place "small" images into one imagestrip (eg. all tiles for one theme of a level-set).

Splitting it this way may speed up your app even further: when you can be sure that one type doesn't need alpha blending, you can set the blend mode to solid before a whole list of sprites/images is drawn and then set it back to alpha again.
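As a sketch of that last point, the blend mode can be set once per list instead of once per sprite. TSprite and DrawBatch are hypothetical names for illustration; only SetBlend, SOLIDBLEND, ALPHABLEND and DrawImage come from Max2D.

```blitzmax
SuperStrict

' Hypothetical sprite type for illustration.
Type TSprite
	Field img:TImage
	Field x:Float, y:Float
End Type

' Draws every sprite in the list with one blend mode that is set once up front.
Function DrawBatch(sprites:TList, blend:Int)
	SetBlend blend
	For Local s:TSprite = EachIn sprites
		DrawImage s.img, s.x, s.y
	Next
End Function

' Inside the render loop (opaqueSprites and alphaSprites are assumed TLists):
' DrawBatch(opaqueSprites, SOLIDBLEND)
' DrawBatch(alphaSprites, ALPHABLEND)
```

Keeping sprites that share an image next to each other inside each list combines this with the image-switching advice above.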


Ok enough, tired ;D

bye MB


Rico(Posted 2008) [#11]
Thanks MichaelB - That's a lot of help there! :) - good tips on speeding up sprite images - I do have a lot of similar bullets and crates which I can use that technique on. Thank you.

Yes I will try to split the big image up like you suggest into types. Hopefully that will speed it up :)

BTW Since I am getting a faster laptop (soon) with a much better gfx card (medium to high spec) will I need to use this technique then? or will a more modern gfx card with more memory be able to cope easily?

I appreciate the help :) Thank you very much


deps(Posted 2008) [#12]
It will be able to render it faster, I guess, but it's always a nice idea to optimize your code if you can.
But wait with it until the game is nearly done.
(Except for the excellent suggestions in MichaelBs post, you might want to do that from the beginning since it could be a bit hard to do it later)


Who was John Galt?(Posted 2008) [#13]
If your background image is larger than the screen (and it doesn't look as if it is), try just drawing the relevant image rect instead of the whole image.


Rico(Posted 2008) [#14]
I have split the 1024x768 background into 16 squares. However, at the moment it is rendering it slower than before - fps drops to 20, whereas with the full screen background fps drops to 43. Maybe my graphics card is just rubbish. - though it shouldn't be that rubbish :)

I know I shouldn't be complaining but its only a 2D game - maybe gfx cards these days are rubbish at 2D graphics (no blitters?) - though I thought BlitzMax might treat sprites like textures?


tonyg(Posted 2008) [#15]
"I thought BlitzMax might treat sprites like textures?"

It does.
You might want to post an example which shows the slowdown.


MGE(Posted 2008) [#16]
"I have split the 1024x768 background into 16 squares." There is no need to do that. Use a full 1024x768 image and just draw it. BMax will convert it to 1024x1024 so you'll lose some VRAM space, but that shouldn't be that great of an issue.

1024x768 is not a low resolution for 2D games. Consider moving to 800x600 if you want better rendering speed and you have a lot of objects moving around. Otherwise, if it's a slower paced game you can get away with 20 or 30fps as long as you're using time based or fixed logic.
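For reference, "fixed logic" is usually done with a time accumulator, so the game logic runs at a constant rate even when rendering drops to 20 or 30fps. The sketch below is one common way to do it, not necessarily the structure meant above; LOGIC_MS, StepsDue, UpdateGame and RenderGame are assumed names.

```blitzmax
SuperStrict

Const LOGIC_MS:Int = 20   ' assumed logic step: 50 updates per second

' Adds the elapsed time to the accumulator and returns how many fixed
' logic steps are due; the remainder stays in the accumulator.
Function StepsDue:Int(accumulator:Int Var, elapsedMs:Int)
	accumulator :+ elapsedMs
	Local steps:Int = accumulator / LOGIC_MS
	accumulator = accumulator Mod LOGIC_MS
	Return steps
End Function

' Inside the main loop (UpdateGame and RenderGame are hypothetical):
' Local now:Int = MilliSecs()
' Local steps:Int = StepsDue(acc, now - lastTime)
' lastTime = now
' For Local i:Int = 1 To steps
' 	UpdateGame()
' Next
' RenderGame()
' Flip
```

With this in place a slow frame simply runs more than one logic update before the next render, so objects keep their speed.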


ImaginaryHuman(Posted 2008) [#17]
Switching between more images slows things down; you want as large a single image as possible.

What graphics card do you have? Check online for how much texturing texels-per-second it has. Anything below, let's say, 1 billion/second, I would consider low-end.


Rico(Posted 2008) [#18]
Well I can't really post an example because it is part of a huge program - though I might write a separate program to just draw the background (both ways) - and test the speed with that.

My graphics card is an ATI Radeon Mobility 16MB. It says DAC 350Mhz (came with my Dell laptop). I don't really know anything else - this is just from doing dxdiag. It should have a model number but it doesn't display it.

Thanks guys


MGE(Posted 2008) [#19]
"16MB" - There's your problem. Move to a lower resolution and you should see a speed increase. 1024x768x32 (x2 for flip) is eating up most of the memory by itself. ;)
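A back-of-the-envelope version of that budget, assuming two colour buffers at the screen resolution and the background stored as a padded 1024x1024 texture at 32 bits. What the driver really allocates will differ, so this is only a rough estimate.

```blitzmax
SuperStrict

Const MB:Int = 1024 * 1024

' Bytes needed for a w x h surface at the given colour depth.
Function SurfaceBytes:Int(w:Int, h:Int, bitsPerPixel:Int)
	Return w * h * (bitsPerPixel / 8)
End Function

Local frameBuffers:Int = 2 * SurfaceBytes(1024, 768, 32)   ' front + back buffer
Local background:Int = SurfaceBytes(1024, 1024, 32)        ' 1024x768 padded to 1024x1024

Print "Front + back buffer: " + (frameBuffers / MB) + " MB"
Print "Background texture:  " + (background / MB) + " MB"
Print "Total:               " + ((frameBuffers + background) / MB) + " MB of 16 MB"
```

That comes to 6 MB for the buffers plus 4 MB for the background, leaving roughly 6 MB for every other sprite and texture in the game.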


Derron(Posted 2008) [#20]
I didn't say to split the big image into as many smaller images as possible (because then you end up drawing e.g. 40 smaller parts, each costing a "texture switch"). When it comes to lower-end GPUs (16 MB) you have to save VRAM, and that is what the split-bigger-images method is for.

That's why the method above ends up using the "maximum" possible dimensions (2^n) and avoids fixed tiles like 256x256, which would result in 12 sprites having to be drawn at 1024x768.


Ok... paused this posting for some minutes and mocked up some code:

Graphics 1024,768

'vars
Global doCls:Int=0;flipType:Int=0;drawType:Int=2;doAlphaBlend:Int=0
Global fps:Int=0;frames:Int=0
Global second:Int = MilliSecs() + 1000
Global texts:String[5]
texts[0]="false";texts[1]="true"
texts[2]="one big image";texts[3]="4 optimized images";texts[4]="16 256x256 images"

'image creation - one big, 4 optimized, 16 static 256x256
Global TestImage:Timage = CreateImage(1024, 768)
Global SmallImages:Timage[16]
Global ImageTopA:Timage = CreateImage(512, 512); ImageTopB:Timage = CreateImage(512, 512)
Global ImageBottomA:Timage = CreateImage(512, 256); ImageBottomB:Timage = CreateImage(512, 256)
For Local i:Int = 0 To 15
 	SetColor Rand(100),Rand(100),Rand(100);DrawRect((i Mod 5)*256, Floor(i/5)*256, 256,256)
	SmallImages[i] = CreateImage(256, 256)
 	GrabImage(SmallImages[i], (i Mod 5)*256, Floor(i/5)*256)
Next
GrabImage(ImageTopA,0,0);GrabImage(ImageTopB,512,0)
GrabImage(ImageBottomA,0,512);GrabImage(ImageBottomB,512,512)
GrabImage(TestImage,0,0) ; Cls;SetColor 255,255,255 

While Not KeyHit(KEY_ESCAPE)
	frames:+1
	If KeyHit(Key_A) Then doAlphaBlend = 1 - doAlphaBlend;If doAlphaBlend=1 Then SetBlend(alphaBlend) Else SetBlend(solidBlend)
	If KeyHit(Key_C) Then doCls = 1 - doCls
	If KeyHit(Key_F) Then flipType:+1;If flipType>=2 Then flipType = -1
	If KeyHit(Key_D) Then drawType:+1;If drawType>=5 Then drawType = 2
	If doCls Then Cls
	If second < MilliSecs() Then second = MilliSecs()+1000; fps = frames; frames = 0
	If drawType=2 Then DrawImage(testImage,0,0)
	If drawType=3 Then DrawImage( ImageTopA,0,0);DrawImage( ImageTopB, 512,0);DrawImage( ImageBottomA, 0,512);DrawImage(ImageBottomB,512,512)
	If drawType=4 Then	For i = 0 To 15
		 	DrawImage(SmallImages[i], (i Mod 5)*256, Floor(i/5)*256)
		Next
	DrawText(fps, 50,20)
	DrawText("do (c)ls before draw: "+texts[doCls], 50,50)
	DrawText("used (d)rawtype: "+texts[drawType], 50,65)
	DrawText("used (f)liptype: "+flipType, 50,80)
	DrawText("use (a)lphablend: "+texts[doAlphaBlend], 50,95)
	Flip flipType
Wend


I dunno if I did something wrong, on my gpu (geForce 8600 GT) there is nearly no difference to measure - may be because i'm only drawing one image instead "20 small parts of one image" which would show much more differences between the methods above.

If you add
'short test
Local start:Int = MilliSecs()
For i = 0 To 1000
	DrawImage(testImage,0,0)
	Flip 0
Next
Print "Big Image:"+ ((MilliSecs()-start)/1000)+" ms per frame"

start:Int = MilliSecs()
For i = 0 To 1000
	DrawImage( ImageTopA,0,0);DrawImage( ImageTopB, 512,0);DrawImage( ImageBottomA, 0,512);DrawImage(ImageBottomB,512,512)
	Flip 0
Next
Print "4 images:"+ ((MilliSecs()-start)/1000)+" ms per frame"

start:Int = MilliSecs()
For Local j:Int = 0 To 1000
	For i = 0 To 15
	 	DrawImage(SmallImages[i], (i Mod 5)*256, Floor(i/5)*256)
	Next
	Flip 0
Next
Print "16 images:"+ ((MilliSecs()-start)/1000)+" ms per frame"


You will see some stats at the end of running this little app. Every one of those 3 possibilities ends up with 4ms on my computer/GPU setup, so the differences may only be visible on older machines.


bye MB


Rico(Posted 2008) [#21]
Hi MB - I got the same for every test with your program, but then I modified it to use the background image from my game and I got vastly different results for each test. So maybe it is your choice of background that is the problem? Perhaps it's very easy for the gfx card to display.
My game background is basically a black background with various platforms on top and a solid border all round the screen.
Here are the results I got: 111 fps for a single fullscreen image, 61 fps for 4 optimized images and 82 fps for 16 256x256 images. These results were constant every time.

It seems that my card is better at drawing just 1 fullscreen image, but strangely prefers drawing 16 images to 4 optimised images. Can anyone explain why? I would be interested to know.

Thanks


MGE(Posted 2008) [#22]
What sizes are those optimized images? Remember, blitzmax internally will make any non power of 2 texture the next size up. So a 1024x768 image is actually 1024x1024 internally.


Rico(Posted 2008) [#23]
There are 4 optimised images - two 512x512 images and two 512x256 images.

The 16 images are 256x256 each


Derron(Posted 2008) [#24]
It shouldn't make a difference if you grab an image or load an image with the same flags (dynamicimage etc.) - but somehow it did.

First I thought: "Maybe you should optimize the 512x256 images into 4x 256x256, I dunno how Blitz behaves when width and height are different", but then I tested it and it doesn't seem to make a big difference.


Hmm tested something more odd...

first: SetViewport(0,0,512,512)
second: measured time the big image(1024x768) is drawn
third: measured time only one image(512x512) is drawn

guess who won the race - the big image got drawn 1000 times in a smaller amount of time than the less memory consuming smaller sprite - and yes I commented out the other sprites so it was a race between 1024x768 and 512x512.


I don't know if something got changed between 1.24 and 1.30, because my experience is with the older version, which I used for months.


bye MB


MGE(Posted 2008) [#25]
"512x256" hmmm... I need to look at the source and see if those are becoming 512x512 internally.


tonyg(Posted 2008) [#26]
"'512x256' hmmm... I need to look at the source and see if those are becoming 512x512 internally"
They're not.