V43 issues...


marksibly(Posted 2011) [#1]
Hi,

Ok, I'm seeing several complaints about V43 being considerably slower than V42, so to get some idea of what's up, could people please post what sort of results they are getting with their projects.

How many people are seeing speed increases? Decreases? How much?

Some idea of hardware/OS version would be helpful too.


therevills(Posted 2011) [#2]
Using a simple render test of 100 objects on my LG Optimus One P500 running Android 2.2.1 I saw an increase of 22FPS, from 22FPS to 44FPS.

When I compiled Pirate Solitaire using v43 I didn't see any increase, and even saw a slight decrease (1-2FPS).

Mark, could I suggest you look at the Replica Island source code and see how that app deals with the graphics on Android as it does seem to be the benchmark for good overall performance.

http://code.google.com/p/replicaisland/


AdamRedwoods(Posted 2011) [#3]
On a Galaxy Tab Wifi I saw a minor drop (no increase) in FPS, about 2-4 fps.
But when I had quite a few sprites (>100) I think I had slightly better performance, I will check again.

Was using this demo as benchmark:
http://www.monkeycoder.co.nz/Community/post.php?topic=1120&post=9887


marksibly(Posted 2011) [#4]
Hi,

I guess we should really be using a 'standard' test app here...oh well.

Replica Island appears to be doing a lot of special case checking for certain software renderers (pixelFlinger?), phone 'id' strings etc, and doing things differently accordingly.

I can do some of this, but will never be able to do it as thoroughly as RI does unless I concentrate solely on Android. But Android has already sucked up considerable dev-time, and the results appear to be mixed at best! I'll look into this again eventually, but have other stuff to work on right now.


therevills(Posted 2011) [#5]
RI does do special checking to see if the device can use certain GL extensions. The only specific phone check it does is this:

    private void hackBrokenDevices() {
    	// Some devices are broken.  Fix them here.  This is pretty much the only
    	// device-specific code in the whole project.  Ugh.
        ContextParameters params = BaseObject.sSystemRegistry.contextParameters;

       
    	if (Build.PRODUCT.contains("morrison")) {
    		// This is the Motorola Cliq.  This device LIES and says it supports
    		// VBOs, which it actually does not (or, more likely, the extensions string
    		// is correct and the GL JNI glue is broken).
    		params.supportsVBOs = false;
    		// TODO: if Motorola fixes this, I should switch to using the fingerprint
    		// (blur/morrison/morrison/morrison:1.5/CUPCAKE/091007:user/ota-rel-keys,release-keys)
    		// instead of the product name so that newer versions use VBOs.
    	}
    }


> But Android has already sucked up considerable dev-time


True, but it must be one of the main targets of Monkey people want (based on the number of posts per target - currently Android has 607, then iOS with 273).


marksibly(Posted 2011) [#6]
Hi,

Well, on my Samsung Galaxy S, Android performance is now far better than it was - but that doesn't seem to have been the typical user experience!

I suspect most of the issues are to do with older unaccelerated (or unaccelerated VBOs at least) hardware - well, it's a theory. The GL code is as modern/tight as I can make it, makes the fewest API calls etc, so *should* work well on newer hardware at least.

I'd certainly like to get a better idea of what the problems are, and on what hardware etc before trying any more time consuming 'blind fixes', as they don't seem to have helped with compatibility so far - ie: I'm worried about yet again screwing up one thing by fixing another.

Perhaps a decent 'benchmark' app would be a start?
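
As one possible building block for such a benchmark, here is a minimal sketch of a frame-rate meter that averages over a fixed number of frames, so that numbers posted from different devices are at least measured the same way. The class name `FpsMeter` and its methods are hypothetical and not part of mojo; the caller is assumed to pass a monotonic timestamp such as `System.nanoTime()` once per rendered frame.

```java
/** Hypothetical helper (not part of mojo): averages FPS over a window of frames. */
final class FpsMeter {
    private final int windowFrames;
    private int frames;
    private long windowStartNanos = -1;
    private double lastFps;

    FpsMeter(int windowFrames) {
        if (windowFrames <= 0) {
            throw new IllegalArgumentException("windowFrames must be positive");
        }
        this.windowFrames = windowFrames;
    }

    /** Call once per rendered frame with a monotonic timestamp in nanoseconds. */
    void onFrame(long nowNanos) {
        if (windowStartNanos < 0) {
            windowStartNanos = nowNanos; // first frame only starts the clock
            return;
        }
        frames++;
        if (frames == windowFrames) {
            long elapsed = nowNanos - windowStartNanos;
            lastFps = elapsed > 0 ? frames * 1e9 / elapsed : 0.0;
            frames = 0;
            windowStartNanos = nowNanos;
        }
    }

    /** Average FPS of the last completed window (0 until one window has passed). */
    double fps() {
        return lastFps;
    }

    public static void main(String[] args) {
        FpsMeter meter = new FpsMeter(40);
        // Simulated frames 25 ms apart instead of a real render loop.
        for (int i = 0; i <= 40; i++) {
            meter.onFrame(i * 25_000_000L);
        }
        System.out.println("fps = " + meter.fps());
    }
}
```

Averaging over a window rather than reporting per-frame values smooths out one-off spikes such as a garbage collection pause.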

> True, but it must be one of the main targets of Monkey people want

Definitely, but there are also plenty of non-graphical things that need work - most of which will affect Android too.

I am also looking forward to implementing some of the mojo improvements in GLFW and iOS, which I think will be well worth the effort as these targets have excellent GL support.

Other things I could be doing graphically:

* Molehill driver for Flash.

* Unified C++ GL driver for glfw/ios/android native.

* Common opengl module for raw opengl apps.

I think these are all worth pursuing, and would be more productive in the long run. But yes, you have to weigh that up against working on compatibility.

Gah! Who would have thought Android would turn out to be such a sucky mess on the GL front!


therevills(Posted 2011) [#7]
> Android would turn out to be such a sucky mess on the GL front


VBOs are definitely the way forward, but because there are so many different hardware devices it is an issue - I think that's why Replica Island checks the GL extensions.

Here is the Google I/O Writing real-time games for Android redux:

http://www.google.com/events/io/2010/sessions/writing-real-time-games-android.html

This is the guy who wrote Replica Island.

On the slides he wrote this:

Performance Best Practices
• Use VBOs!
• Minimize VBO selection. (and, as usual, all state change)
• Use floating point verts.
• ETC1 texture compression is most compatible.
• draw_texture is the fast path for 2D, axis-aligned texture blits.
• No point in using the NDK just to issue GL commands.
• Most WVGA devices are all fill bound. Target 30 fps.
• Design to scale between low end and high end.
– GL_EXTENSIONS is your friend.
• Simple 2D games might not need OpenGL.
• GLES2.0 is the faster path on devices that support it.



I think the main points are:
* Use VBOs!
* Most WVGA devices are all fill bound. Target 30 fps.
* GL_EXTENSIONS is your friend

> before trying any more time consuming 'blind fixes'

Maybe get some lower end devices to test on ;)


Oh, and here's this year's Google I/O (2011):

http://www.google.com/events/io/2011/sessions/building-aggressively-compatible-android-games.html


Building Aggressively Compatible Android Games
Chris Pruett
There are a lot of Android phones out there, but by abiding to a few key rules it is possible to develop a single binary that runs on all of them. This session will explain how to approach device diversity and build aggressively compatible Android games.



Just watching it now...

[edit]
I really suggest you guys watch this - very interesting :)
[/edit]


anawiki(Posted 2011) [#8]
So I am the one that complained a lot about v43 performance. Our game uses a lot of texture atlases and minimizes texture swaps. It runs at least 40 FPS on iPad. It does run around 35 FPS in v42b on Galaxy Tab 7" wifi. The same game slows down to 16 FPS on v43. If you want I can upload those 2 builds for you.

This is solitaire game:
http://www.anawiki.com/game/avalon-legends-solitaire

We managed to make it use only 3 textures of 1024x1024 on game screen and we use them without unnecessary swapping (so we draw background, then gui, then all cards from one texture).

The title screen in this game runs at 37 FPS in v42b and at only 25 FPS in v43, and it uses even fewer textures.


I haven't tried this on my HTC Wildfire yet, but due to its low specs I don't think it will be playable at all in current version of the game :D


jowli(Posted 2011) [#9]
Lost approx 20FPS on my game. Displaying large background image plus about 10 objects.

HTC Desire running Android 2.2. (I believe this has a custom GPU)


dom(Posted 2011) [#10]
Hi!

I tested our game on my HTC Desire:
42: ~ 25 FPS
43: ~ 10 FPS


What do you think of my proposal here: http://www.monkeycoder.co.nz/Community/posts.php?topic=941 using http://www.badlogicgames.com/wiki/index.php/Direct_Bulk_FloatBuffer.put_is_slow ?

I haven't profiled 43, but it (the buffer conversion) might still be the case.


Greets!
dom


devolonter(Posted 2011) [#11]
Greetings to everyone! In all tests that I've conducted v43 was faster than v42b, but tests results do not have anything in common with reality. In my port of the Impact game I have a 50% FPS loss in v43, which is very strange. Also I have noticed one more characteristic, at a given time v43 very sharply loses FPS and becomes very slow. Objects start snatching. If I press Home while this is happening and then return to the app, the FPS goes back to normal...


DGuy(Posted 2011) [#12]
Using the "mak/bouncyaliens" demo ...

NOTES:
- Run on Android
- 1024x600 screen size
v43 | w/ Alpha        | w/o Alpha
----+-----------------+-------------
100 | 31    (+10-11)  | 48    (+1)
200 | 20-21 (+8-9)    | 42    (+5-6)
300 | 15    (+7)      | 31-32 (+2-3)
400 | 12    (+6)      | 26    (+2-4)

NOTES:
- HICOLOR_TEXTURE [True|False] had no effect performance-wise
- # in parentheses represent FPS changes over v42


I'll get my current App running under v43 and will post those numbers also ...


marksibly(Posted 2011) [#13]
Hi,

> * Use VBOs!

Well, this was the big change in V43!

I had another look at Replica Island, and there appears to be some source code missing (where is the RenderElement class?). Has anyone compiled it? I can't find any use of VBOs in there (just checks for support), but it does use a GL extension called glDrawTex - but it can't rotate, only scale (so we'd have to use glMatrix for rot which is SLOW) and can't draw 'subrects' so your textures have to all be separate images. Yes, I can add special-case checks for all this, and it *might* work, but before blowing another week on this stuff I'd want to know more first.

> Maybe get some lower end devices to test on ;)

I would if I had some idea of where the problem lies - is it certain GPUs? Software renderers (which I NEVER really intended to support!)? Devices with only GL10 and perhaps emulated VBOs? Otherwise, I'll just end up fixing things on a device-by-device basis which could end up being a massive waste of time.

> http://www.badlogicgames.com/wiki/index.php/Direct_Bulk_FloatBuffer.put_is_slow ?

We've already tried that hack - a little faster for some, a little slower for others. I have a feeling it was relevant about a year ago, not so much now.

> In all tests that I've conducted v43 was faster than v42b, but tests results do not have anything in common with reality.

Well, they should, otherwise the tests are flawed!

> Objects start snatching.

What does this mean?

> Also I have noticed one more characteristic, at a given time v43 very sharply loses FPS and becomes very slow.

This has GOT to be some kind of bug in the GL driver (perhaps VBO related?) Or perhaps you're running out of texture memory somehow?

Anawiki, could you (and anyone else suffering from V43 slowdown) email me the project, along with your phone specs and the FPS drop you're getting?

If enough people do this, perhaps we can identify a common element amongst phones and take it from there.


therevills(Posted 2011) [#14]
> Has anyone compiled it?


Yep... very easy to set up too.

Check out the source from SVN, then in Eclipse create a new Android Project from existing source (point it to the RI code), click Finish. Then compile - Done :)

According to the presentation he uses draw_textures for the sprites and backgrounds and VBOs for the tilemaps.


DGuy(Posted 2011) [#15]
Numbers for my current app ...
v41  : ~38 fps
v42b : ~40 fps
v43  : ~25 fps   :(



therevills(Posted 2011) [#16]
In Replica Island the VBO code is in Grid.java, in the method generateHardwareBuffers, which is called from TileVertexGrid.draw.

So this generates hardware buffers (VBOs) if the device supports it:
TileVertexGrid:
public void draw(float x, float y, float scrollOriginX, float scrollOriginY) {
...
            Grid grid = generateGrid((int)mWorldPixelWidth, (int)mWorldPixelHeight, 0, 0);
            mTileMap = grid;
            mGenerated = true;
            if (grid != null) {
                bufferLibrary.add(grid);
                if (sSystemRegistry.contextParameters.supportsVBOs) {
                	grid.generateHardwareBuffers(gl);
                }
            }
...


Grid:
    
    public void draw(GL10 gl, boolean useTexture) {
        if (!mUseHardwareBuffers) {
            gl.glVertexPointer(3, mCoordinateType, 0, mVertexBuffer);
    
            if (useTexture) {
                gl.glTexCoordPointer(2, mCoordinateType, 0, mTexCoordBuffer);
            } 
    
            gl.glDrawElements(GL10.GL_TRIANGLES, mIndexCount,
                    GL10.GL_UNSIGNED_SHORT, mIndexBuffer);
        } else {
            GL11 gl11 = (GL11)gl;
            // draw using hardware buffers
            gl11.glBindBuffer(GL11.GL_ARRAY_BUFFER, mVertBufferIndex);
            gl11.glVertexPointer(3, mCoordinateType, 0, 0);
            
            gl11.glBindBuffer(GL11.GL_ARRAY_BUFFER, mTextureCoordBufferIndex);
            gl11.glTexCoordPointer(2, mCoordinateType, 0, 0);
            
            gl11.glBindBuffer(GL11.GL_ELEMENT_ARRAY_BUFFER, mIndexBufferIndex);
            gl11.glDrawElements(GL11.GL_TRIANGLES, mIndexCount,
                    GL11.GL_UNSIGNED_SHORT, 0);
            
            gl11.glBindBuffer(GL11.GL_ARRAY_BUFFER, 0);
            gl11.glBindBuffer(GL11.GL_ELEMENT_ARRAY_BUFFER, 0);


        }
    }



therevills(Posted 2011) [#17]
Also just found this:

http://code.google.com/p/apps-for-android/source/browse/trunk/SpriteMethodTest/

It compares the relative speeds of various 2D drawing methods on Android:
* Canvas
* OpenGL ES - Basic Vert Quads
* OpenGL ES - Draw Texture Extension
* OpenGL ES - VBO Extension

+---------+---------+---------+---------+---------+
| Objects | Canvas  | Quads   | DrawTex | VBO     |
+---------+---------+---------+---------+---------+
| 10      | 66      | 58      | 62      | 62      |
+---------+---------+---------+---------+---------+
| 100     | 58      | 38      | 55      | 52      |
+---------+---------+---------+---------+---------+
| 200     | 37      | 21      | 33      | 30      |
+---------+---------+---------+---------+---------+
| 300     | 25      | 15      | 21      | 21      |
+---------+---------+---------+---------+---------+



degac(Posted 2011) [#18]
Hi

After reading all the posts (I own an LG Optimus One), I suspect a 'definitive solution' is not available at the moment: different hardware + different OEM drivers + different Android OS versions = mess.
A possible solution (not the best one, I must admit):
we choose to develop for a single hardware platform (e.g. Tegra2) and Mark creates a 'Tegra2-mojo-driver' (and so on for every possible hardware), and the game works ONLY on (or is only 'certified' for) that device/platform.
(Some games don't run on my device because they are 'hardware-locked', e.g. to a screen resolution of 800x600, so this is not entirely new.)
So there would be a PirateSolitaire for Tegra devices and a 'standard edition'.
For Mark this could be a solution: he can add new hardware-optimized mojo drivers and the programmer chooses which one to develop for.
Better than trying to look for a solution for 'everyone'.

Of course, 'write once and run everywhere' would have to be forgotten.

Sigh!
(It feels like going back to the days of 3dfx Glide vs the rest of the world.)


therevills(Posted 2011) [#19]
@degac - go watch Building Aggressively Compatible Android Games

:)


devolonter(Posted 2011) [#20]
Hmm... Interesting fact. Setting MAX_VERTICES = 128 in mojo.android.java gave a small performance improvement in the Impact game, but it worsened FPS in the tests. That is strange...


pantson(Posted 2011) [#21]
Hi Mark.
Not much help to you, but I tested my latest game on v43 and there is definitely a slowdown in FPS. There is a slowdown even on the title screen, which has no garbage collection going on and doesn't colorize any images (it does draw coloured rectangles).
I don't have any definite stats for you. Are there any test apps that can be run so that you can collect data from my device?

spec: Sony X8, resolution 320*480, android 2.1


AdamRedwoods(Posted 2011) [#22]
Hi, hate to chime in again, but I've tested my current work-in-progress and experienced heavy slowdown with V43, and that includes the V43+ fix.

It seems these little 'benchmark' examples don't reflect what I'm doing in the actual game.

In benchmark examples 100 objects draw at about 20 fps.

With my game (Galaxy Tab wifi (P1010) Android 2.2.1):
With v43+fix I was at about 14 fps.
With v40 I was up at 30 fps.


Wow, what a difference. It seems the small benchmark examples don't give a big difference for me, but my game which does a lot more calculations per frame, gives a major fps difference between the two mojo versions. I also use a lot of animation, I wonder if that's an issue? I don't know.

I will see if AngelFont makes a better benchmark for the different versions.


AdamRedwoods(Posted 2011) [#23]
Ya, beaker/AngelFont is taxing on the Android system:

(debug mode, Galaxy Tab wifi)
v40 2 fps
v43 4 fps

Without debug I get 2 fps more on both versions.


therevills(Posted 2011) [#24]
Yeah AngelFont and DrawText is pretty taxing on Android, due to the fact that each character drawn is a new image being drawn.

I ended up "hard" drawing my text into images - for example the word "SCORE" has 5 characters, so using AngelFont/DrawText would require 5 calls to DrawImage, whereas if you have one image with the word "SCORE" in it, it is of course only one call to DrawImage.


AdamRedwoods(Posted 2011) [#25]
Right, and this is why I think a "create Bitmap" is needed, so we can create dynamic text as a single image.... but that is a different topic.

I also tested AngelFont with lower quality png8 images, no alpha, etc, and had no change in my fps, so I don't think I'm texture bound with AngelFont.

THEN, I took out the draw routine from 'char.monkey' so I could see if it's the mojo drawing routine and, lo-and-behold, I got 5 fps.

So the problem is AngelFont is terribly slow, NOT the drawing part.

-----
ANYWAYS,
that led me to think, I wonder if it's NOT gxtkGraphics that's slowing my actual game (see above) down, so I took the guts of v43 mojo.native gxtkGraphics and replaced the v40 version, to see if the slowdown was independent of gxtkGraphics.

NO, it made the fps even WORSE. I was down to 9 fps at one point.


AdamRedwoods(Posted 2011) [#26]
Back to my in-progress game, I managed to use a sprite sheet and frames for the individual drawings.. since the sprites all draw at the same time in the same bunch, it's using the same texture. I thought for sure I'd see a speed-up.

Turns out, no speedup at all.

I'm starting to think it's not the graphics, yet v40 gives such a performance boost. Seems strange that VBO would give such poor performance. I'll check the GPU to see if it allows VBOs.

--------------

If i did this right in mojo:
boolean CheckVBO() {
	String extensions = gl.glGetString(GL11.GL_EXTENSIONS);
	return (extensions.contains("vertex_buffer_object"));
}


no VBO on the galaxy tab wifi... so strange. :(
But it does have vertex_array_objects, and map_buffers...


therevills(Posted 2011) [#27]
Yeah, Replica Island does this check as well, and if VBOs can't be used it falls back to plain vertex arrays (no hardware buffers). I'm thinking Monkey should do the same...


marksibly(Posted 2011) [#28]
Hi,

> no VBO on the galaxy tab wifi... so strange. :(

That's because VBOs are part of GL11, so there is no 'extension'. If you check the GL10 extensions, it'll probably be in there.

Your hardware is a Galaxy Tab which should definitely have good VBO support...
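
To make the version-versus-extension point concrete, here is a minimal sketch of a capability check that treats buffer objects as core from OpenGL ES 1.1 upwards and only consults the extension string on 1.0. The class `VboSupport` and its method are hypothetical and take the two strings as plain parameters; on a device they are assumed to come from `glGetString(GL_VERSION)` and `glGetString(GL_EXTENSIONS)`. The substring `vertex_buffer_object` is the one used in the check posted earlier in this thread, and the version pattern assumes the usual "OpenGL ES-CM 1.1" / "OpenGL ES 2.0 ..." string layout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decides VBO availability from the GL version and extension strings. */
final class VboSupport {
    // Assumed layout: "OpenGL ES-CM 1.1", "OpenGL ES-CL 1.0", "OpenGL ES 2.0 <vendor info>".
    private static final Pattern VERSION =
            Pattern.compile("OpenGL ES(?:-C[ML])?\\s+(\\d+)\\.(\\d+)");

    private VboSupport() {}

    static boolean supportsVbos(String glVersion, String glExtensions) {
        if (glVersion != null) {
            Matcher m = VERSION.matcher(glVersion);
            if (m.find()) {
                int major = Integer.parseInt(m.group(1));
                int minor = Integer.parseInt(m.group(2));
                // Buffer objects are core from ES 1.1, so no extension is listed for them.
                if (major > 1 || (major == 1 && minor >= 1)) {
                    return true;
                }
            }
        }
        // ES 1.0 or unparseable version: fall back to the extension string.
        return glExtensions != null && glExtensions.contains("vertex_buffer_object");
    }

    public static void main(String[] args) {
        System.out.println(supportsVbos("OpenGL ES-CM 1.1", "GL_OES_draw_texture"));
        System.out.println(supportsVbos("OpenGL ES-CM 1.0", "GL_OES_draw_texture"));
    }
}
```

Note that this only answers "are VBOs advertised?". As the Motorola Cliq workaround quoted earlier shows, a device can report support and still be broken or slow, so a check like this does not replace measuring.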

Can you email me the monkey source+media?

[edit]
Also, are you sure it isn't a 'pixel density' issue? One of the updates turned on 'retina mode' for Android phones a while back, so perhaps you're drawing 4x more than before?
[/edit]


dave.h(Posted 2011) [#29]
I'm getting the same bad framerates compared to v40 on my Galaxy Tab and also on my Xperia X10. It seems I've bought the only two Android systems which experience this problem. I've noticed that I'm getting much lower framerates, even when I'm only using a few sprites in game. The difference is as much as a 40% drop in framerate. I recompiled my demo towers in enferno on v43 and on later levels it became unplayable, as opposed to around 30 fps on v40. That was on my X10. The Galaxy Tab dropped from 60 to around 36 fps. I don't know if there's any common link between the two Android devices; I'm not clever enough to figure that out. VBOs, pixel density and retina mode still confuse me.


AdamRedwoods(Posted 2011) [#30]

> That's because VBOs are part of GL11, so there is no 'extension'. If you check the GL10 extensions, it'll probably be in there.

True, I reviewed Replica Island's source and sure enough, they only check if it's GL 1.1. I didn't see VBO specifically in the GL10 extensions. No biggie.

> Also, are you sure it isn't a 'pixel density' issue? One of the updates turned on 'retina mode' for Android phones a while back, so perhaps you're drawing 4x more than before?

I'm still on stock Android 2.2.1 with the Galaxy Tab, and I think the update is for Android 3 (2.3).
http://developer.android.com/guide/practices/screen-compat-mode.html
Even so, it wouldn't explain the odd difference between monkey v40 and v43.

> Can you email me the monkey source+media?

Will do. Sent.

It's such a strange problem, and I've been ripping apart the mojo.graphics code, replacing chunks between v40 and v43 trying to find what specifically it could be, but to no avail. I will continue to hack away.


marksibly(Posted 2011) [#31]
Hi,

OK, I've had a look at a few projects now and the issue seems to be with state changes being more expensive when using VBOs for small 'batch sizes' on (at least) Samsung PowerVR devices. Which isn't actually too surprising...

A batch size is the number of images (or frames) that can be drawn consecutively without changing texture (ie: image or image atlas) or blend mode.

As the batch size increases, VBOs become much faster. The 'tipping point' will depend very much on app, image size etc.

In the case of one 'tilemap' based project above, sorting all the tilemap rendering 'by image' (ie: drawing ALL tile1 images, then ALL tile2 images etc instead of just drawing tile1, tile2, tile1, tile2...) resulted in a major speed up.

A better solution would be to make more use of image atlases - eg: pack all grass/dirt/tilemap images onto the same image atlas. If you do this, you won't need to sort tile rendering 'by image' as it's all the same image!

In general, I'd advise people to:

* Pack all your 'tilemap' images onto a single atlas image. This means no state changes when drawing the tilemap which will give you max VBO performance.

* Draw all your 'actors' sorted by image, eg: draw all 'baddy1' images in one go, all 'baddy2' images in one go etc.

Yes, all this means you need to 'help out' the engine a bit, but I think it's worth it for the increased performance you (theoretically) get when you do it right.
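
The effect of sorting 'by image' on batch size can be sketched in plain Java. This is an illustration only, not mojo's implementation: `BatchDemo`, `Draw` and `textureId` are hypothetical names, and a "batch" is counted exactly as defined above, as a run of consecutive draws that share a texture.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: counts batches before and after sorting draws by texture. */
final class BatchDemo {
    /** One queued draw; textureId stands for an image or image atlas. */
    static final class Draw {
        final int textureId;
        final float x;
        final float y;

        Draw(int textureId, float x, float y) {
            this.textureId = textureId;
            this.x = x;
            this.y = y;
        }
    }

    /** Number of batches = runs of consecutive draws that share a texture. */
    static int countBatches(List<Draw> draws) {
        int batches = 0;
        Draw previous = null;
        for (Draw d : draws) {
            if (previous == null || previous.textureId != d.textureId) {
                batches++;
            }
            previous = d;
        }
        return batches;
    }

    /** Returns a copy ordered by texture; List.sort is stable, so order within a texture is kept. */
    static List<Draw> sortedByTexture(List<Draw> draws) {
        List<Draw> sorted = new ArrayList<>(draws);
        sorted.sort(Comparator.comparingInt((Draw d) -> d.textureId));
        return sorted;
    }

    public static void main(String[] args) {
        List<Draw> frame = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            // tile1, tile2, tile1, tile2, ... as in the interleaved tilemap case
            frame.add(new Draw(i % 2 == 0 ? 1 : 2, i * 32f, 0f));
        }
        System.out.println("interleaved batches: " + countBatches(frame));
        System.out.println("sorted batches:      " + countBatches(sortedByTexture(frame)));
    }
}
```

With the six interleaved tiles above, the unsorted frame needs six batches and the sorted one needs two. Sorting changes the draw order, so it is only safe where sprites from different images do not need to overlap in a particular order; packing them onto one atlas avoids that trade-off.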


AdamRedwoods(Posted 2011) [#32]
Interesting, and this would explain why single-sprite benchmarks don't change that much for me between v40 and v43.

Also a similar VBO explanation from AndEngine (another android engine):
http://www.anddev.org/android-2d-3d-graphics-opengl-problems-f55/best-2d-rendering-method-t15064.html
more explanation:
http://www.mail-archive.com/android-developers@...

Thanks Mark! I will try putting everything into atlas textures and see what happens next.


anawiki(Posted 2011) [#33]
Texture atlases could explain it, but my game already uses them and does what you advise. On level one there are only about 5 texture changes, including bg and fonts. And I still see slowdown.

You should have my source already.


therevills(Posted 2011) [#34]
I've been reading "Beginning Android Games" by Mario Zechner, and found this:

P340

What’s Making My OpenGL ES Rendering So Slow?

That the Hero is slower than the second-generation devices is no big surprise.
However, the PowerVR chip in the Droid is slightly faster than the Adreno chip
in the Nexus One, so the preceding results are a little bit strange at first sight.
On further inspection we can probably attribute the difference not to the GPU
power but to the fact that we call many OpenGL ES methods each frame,
which are costly Java Native Interface methods. This means that they actually
call into C code, which costs more than calling a Java method on Dalvik. The
Nexus One has a JIT compiler and can optimize a little bit there. So let’s just
assume that the difference stems from the JIT compiler (which is probably not
entirely correct).

Now let’s examine what’s bad for OpenGL ES:

* Changing states a lot per frame (e.g., blending, enabling/disabling
texture mapping, etc.)
* Changing matrices a lot per frame.
* Binding textures a lot per frame.
* Changing the vertex, color, and texture coordinate pointers a lot per
frame.
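
One common way to cut down on the state changes listed above is to remember the last state that was set and skip calls that would not change anything. The sketch below does this for texture binds only. `TextureBindCache` and `Binder` are hypothetical names; `Binder` stands in for the real bind call (on Android, presumably something like `glBindTexture`), which keeps the example runnable without a GL context.

```java
/** Hypothetical wrapper that filters out redundant texture binds. */
final class TextureBindCache {
    /** Stand-in for the real (JNI-backed) bind call. */
    interface Binder {
        void bind(int textureId);
    }

    private final Binder binder;
    private boolean hasBound;
    private int boundTexture;
    private int realBinds;

    TextureBindCache(Binder binder) {
        this.binder = binder;
    }

    /** Forwards to the real bind only when the texture actually changes. */
    void bind(int textureId) {
        if (hasBound && boundTexture == textureId) {
            return; // already bound: skip the native call
        }
        binder.bind(textureId);
        hasBound = true;
        boundTexture = textureId;
        realBinds++;
    }

    int realBinds() {
        return realBinds;
    }

    public static void main(String[] args) {
        TextureBindCache cache = new TextureBindCache(id -> System.out.println("bind " + id));
        int[] requested = {1, 1, 1, 2, 2, 1};
        for (int id : requested) {
            cache.bind(id);
        }
        System.out.println(cache.realBinds() + " real binds for " + requested.length + " requests");
    }
}
```

A cache like this is only valid while nothing else binds textures behind its back, so it would need resetting whenever the GL context is recreated.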



Also a bug in FloatBuffer:

P420
The method boils down
to calling FloatBuffer.put(float[]), and that’s the culprit of our performance hit here.
While desktop Java implements that FloatBuffer method via a real bulk memory move,
the Harmony version calls FloatBuffer.put(float) for each element in the array. And
that’s extremely unfortunate, as that method is a JNI method, which has a lot of
overhead (much like the OpenGL ES methods, which are also JNI methods).


There are a couple of solutions. IntBuffer.put(int[]) does not suffer from this
problem, for example. We could replace the FloatBuffer in our Vertices class with an
IntBuffer and modify Vertices.setVertices() so that it first transfers the floats from
the float array to a temporary int array and then copies the contents of that int array to
the IntBuffer.
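
The IntBuffer workaround described in that passage can be sketched as follows. This is not the book's `Vertices` class; `VertexUpload` and its methods are hypothetical names. The idea is to copy the raw bit patterns of the floats into a reusable `int[]` and then do a single bulk `IntBuffer.put(int[], int, int)`, so the bytes that end up in the direct buffer are the same as if the floats had been written directly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

/** Hypothetical sketch of the float[] -> int[] -> IntBuffer upload path. */
final class VertexUpload {
    private final ByteBuffer bytes; // direct, native-order storage
    private final IntBuffer ints;   // int view used for the bulk put
    private final int[] scratch;    // reused to avoid per-frame allocation

    VertexUpload(int maxFloats) {
        bytes = ByteBuffer.allocateDirect(maxFloats * 4).order(ByteOrder.nativeOrder());
        ints = bytes.asIntBuffer();
        scratch = new int[maxFloats];
    }

    /** Copies length floats starting at offset into the buffer via one bulk int put. */
    void setVertices(float[] src, int offset, int length) {
        for (int i = 0; i < length; i++) {
            // Same 32 bits, reinterpreted as an int; no numeric conversion happens here.
            scratch[i] = Float.floatToRawIntBits(src[offset + i]);
        }
        ints.clear();
        ints.put(scratch, 0, length);
        ints.flip();
    }

    /** Buffer to hand to the GL pointer call (assumed to accept any java.nio.Buffer). */
    IntBuffer buffer() {
        return ints;
    }

    /** Float view over the same bytes, handy for checking what was written. */
    FloatBuffer floatView() {
        return bytes.asFloatBuffer();
    }

    public static void main(String[] args) {
        VertexUpload upload = new VertexUpload(8);
        upload.setVertices(new float[] {0f, 1.5f, -2.25f, 3f}, 0, 4);
        System.out.println("ints ready: " + upload.buffer().remaining());
        System.out.println("second float: " + upload.floatView().get(1));
    }
}
```

Whether this is actually faster depends on the device and Android version. As noted earlier in the thread, the same hack was already tried in mojo and came out a little faster for some and a little slower for others, so it is worth measuring before adopting it.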



I'm trying to go through the book and see where Monkey can be improved within the architecture it currently has - but I'm an OpenGL noob!


anawiki(Posted 2011) [#35]
How much is too much for:

* Changing states a lot per frame (e.g., blending, enabling/disabling
texture mapping, etc.)


Can 10 texture swaps make a Galaxy Tab choke?


AdamRedwoods(Posted 2011) [#36]
After much work re-organizing the image loading, I've managed to get the frame rates up again.

I have about 13 texture changes so far, and I can reduce that further.
My FPS is around 30 (the max) to start with, then drops to 15-18 once the action picks up.

I did as Mark suggested (thank you) and am using atlas textures.

I will release my texture atlas module later, after I tweak it more, but there are already some good ones available.

ONE MORE THING:
There should be a way to load or combine bitmaps into one large bitmap/texture in mojo. This is most helpful with interfaces and numbers displaying quickly. Without it, we'll be suffering from frame rate issues for a while, or creating text by hand which is cumbersome. (edit: I see why that's difficult now: http://www.badlogicgames.com/wordpress/?p=1073 )

I'm very tired now.


AdamRedwoods(Posted 2011) [#37]
Well.....

after aggressively moving almost all textures I can to atlases, V40 is STILL faster than V43. Not by much, about 3-4 fps.