Tilemap Performace problem

Monkey Forums/Monkey Programming/Tilemap Performace problem

Shinkiro1(Posted 2016) [#1]
So I have multiple layers, ground, top, etc.
My ground layer is special in that it doesn't just draw the tile, but splits it in 4 parts and then draws them (necessary for autotiling). This means that it will make 4 times as many DrawImage() calls.
My Map is 30x18 tiles = 30x18x4 = 2160 DrawImage() per frame, and this results in poor performance ~ 3-5 ms.

I draw them all from the same texture though (several images created with New(image))
Is this expected behavior?
My normal tile layers take around 0.5 ms.

This is how I make frames.
Function MakeFrames:Image[](image:Image, frameCountX:Int, frameCountY:Int, frameSizeX:Int, frameSizeY:Int)
	If Not image
        Return []
    End
	Local frames:Image[] = New Image[frameCountX * frameCountY]
    For Local y:Int = 0 Until frameCountY
        For Local x:Int = 0 Until frameCountX
            Local index:Int = y * frameCountX + x
            frames[index] = New Image(image, x * frameSizeX, y * frameSizeY, frameSizeX, frameSizeY)
            frames[index].SetHandle(0, 0)
        End
    End
    Return frames
End


This is the Autotile classes I create.
Class Autotile
	Field minitiles:Image[]
	Field island:Image[4]
	Field TL:Image[5]
	Field TR:Image[5]
	Field BL:Image[5]
	Field BR:Image[5]

        Method New(img:Image, tilesize:Int)
		minitiles = MakeFrames(img, 4, 6, tilesize/2, tilesize/2)

		island[0] = minitiles[0]
		island[1] = minitiles[1]
		island[2] = minitiles[4]
		island[3] = minitiles[5]

		TL[0] = minitiles[8]
		TL[1] = minitiles[10]
		TL[2] = minitiles[16]
		TL[3] = minitiles[2]
		TL[4] = minitiles[18]

		TR[0] = minitiles[11]
		TR[1] = minitiles[19]
		TR[2] = minitiles[9]
		TR[3] = minitiles[3]
		TR[4] = minitiles[17]

		BL[0] = minitiles[20]
		BL[1] = minitiles[12]
		BL[2] = minitiles[22]
		BL[3] = minitiles[6]
		BL[4] = minitiles[14]

		BR[0] = minitiles[23]
		BR[1] = minitiles[21]
		BR[2] = minitiles[15]
		BR[3] = minitiles[7]
		BR[4] = minitiles[13]
	End
End


Then in a loop for each tile I call:
(auto is the current Autotile
tl, tr, bl, br are indices)
canvas.DrawImage(auto.TL[tl], 0, 0)
canvas.DrawImage(auto.TR[tr], tilesize/2, 0)
canvas.DrawImage(auto.BL[bl], 0, tilesize/2)
canvas.DrawImage(auto.BR[br], tilesize/2, tilesize/2)



Shinkiro1(Posted 2016) [#2]
Ok, so I now have improved the previous version a bit by making 2 special cases:
1) full tile
2) island tile

These are full tiles, so not separated into 4 tiles.
Then when drawing I check if it's the case and if so draw 1 tile instead of 4.
This obviously only works for the 2 cases, so now I get wild ranging frame time between 1.5 ms - 5ms.
Not sure if that's an improvement ...

edit: it is an improvement, just tested when viewing the whole map (65 x 39).
before it was ~15-16 ms
now 7-10 ms
still interested on how to improve this


muddy_shoes(Posted 2016) [#3]
You don't say what target/hardware you're testing with so it would be difficult for anyone to say if your performance figures are good or bad. Without a running example it's also rather difficult to offer specific guidance on where your bottlenecks might be or how to best address them.

In general you would want to be getting to code with a minimum number of rendering calls at all levels of the API. With mojo1 that often meant writing new methods to accept bulk rendering that skipped the automated batching. With mojo2 it looks like you might be able to use DrawLists to do much the same without having to alter the mojo library itself so I'd look at doing that if I were you.


Shinkiro1(Posted 2016) [#4]
I am on OSX. Using mojo2.
Testtarget was html5.
I tested on Glfw3 with nearly the same results (a bit surprised it's not faster).

Thanks for the pointers. Maybe I can kind of compile the autotile data into an array once, then call DrawIndexedPrimitives ...

I have profiled the bottleneck to be DrawImage(), I just thought mojo2 would batch it when it sees it's from the same texture.


Goodlookinguy(Posted 2016) [#5]
Do tile map chunking and cache tile map chunks into single images and draw that (I cache 4x4 or 8x8 sections). It's complicated to do tile map chunking, but well worth the effort as I get 60FPS even from large scale tile maps with autotiles. I can't really go into detail or give you any code as I'm working on how to optimize animated tilesets myself.


Shinkiro1(Posted 2016) [#6]
When do you create the chunks? If you do it at runtime don't you get spikes when creating them?

The problem for me is specifically the autotiles though,
and I only draw what's on screen.


Goodlookinguy(Posted 2016) [#7]
When do you create the chunks? If you do it at runtime don't you get spikes when creating them?

Yes, you create chunks at runtime. You can make it "smooth" by making a loading screen cover up the caching that's going on. Otherwise, yes, it can cause a temporary spike.

The problem you describe can be solved by doing what I said because it will drastically reduce the number of Draw calls being made. Even if there were no autotiles, when you have to call Draw even in an 8x8 section, that's 64 draw calls. By caching it that turns into 1 draw call. Now try drawing the 8x8 section 15 times and then draw the single cached chunk 15 times. The cached chunk version will run at 60FPS while the other will likely drop to <20 FPS. So for a minor spike and a little memory overhead, you get reasonable speeds.

Alternatively, you could render every possibility for the the autotile and cache that into a tilemap and then reuse it based on the corner situation. I did that a while back and also got 60FPS.

Either way, it's 100% possible with a bit of ingenuity.


muddy_shoes(Posted 2016) [#8]
>I have profiled the bottleneck to be DrawImage()

Unfortunately that's kind of a meaningless thing to say. DrawImage just hands off to a number of other methods so I can't know what your profiler is actually telling you. Again, if you had a working example I, or someone else, could look at it properly.

>I just thought mojo2 would batch it when it sees it's from the same texture.

It does. Batching itself isn't free though. It's always going to be faster to avoid all the checks and intermediate steps involved in the batcher and just create the geometry yourself.


Shinkiro1(Posted 2016) [#9]
Ok guys, thanks for the answers.
Unfortunately I can't really provide a running example as it's tied to my engine, I realize you can only make estimates based on the information I provided (but they helped me).
I think I will try to cache every possibility of an autotile (48 woo) at loadtime, then compile these into 1 texture. Then draw from that.


Ferdi(Posted 2016) [#10]
My knowledge of Mojo 2 is very limited, I have only use it once during a Monkey Jam.

If I have a slowness problem say in Android, I usually profile my code in HTML5 in Chrome, it gives a rough area where it is slow. This is how I do it:

1. Compile for HTML5 to Chrome.
2. In Chrome, press F12. This brings up Chrome's "Developer tools".
3. Go to the "Profiles" tab in Chrome's "Developer tools".
4. Choose "Collect JavaScript CPU Profile".
5. Press "Start" button.
6. Play your game for a bit.
7. Press "Stop" button.

Here is a result from my bananas breakout game: https://www.dropbox.com/s/j615j6srfbjczhr/chrome-profiles.png?dl=0

So to read the result, fillRect is the slowest (or most called) and it is called from class BrickGame method DrawBrick etc etc

If you have Chrome install may be you can do a log of your engine?

Anyway, hope it helps.