Something wrong with the iOS GC?

Monkey Targets Forums / iOS

dmaz(Posted 2011) [#1]
Here is a quick test I used to confirm the source of a slowdown I was having. The following code pre-allocates 6000 "Ball" objects, each with 10 "Fill" objects. *After* the objects are allocated (with no screen touches or mousedown), my iPhone 4 drops to ~5 fps, while on Mac and PC, in either GLFW or HTML5, it stays at 60 fps.

Instruments attributed all of the processing to gc_mark and gc_mark_q for the Ball and Fill objects. Should the GC be doing this every frame?



therevills(Posted 2011) [#2]
The only thing I can see in the code that might slow it down is the EachIn statements in Ball.UpdateAll and Ball.RenderAll: every time you use EachIn, you create a new Enumerator.

Also, 6000 is a lot of objects for a mobile device.


dmaz(Posted 2011) [#3]
Thanks, but no, it's not that: creating 2 objects per frame isn't going to slow down any phone. Every frame your phone creates objects whether you like it or not... You do need to keep it to a minimum, though, so you should always pool objects as much as possible.
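A minimal sketch of the pooling idea, written in C++ (the language Monkey translates to for its iOS target). `Pool` and `Ball` are hypothetical names, not part of Monkey's API: every object is created once up front and handed out from a free list, so steady-state gameplay does no `new`. Note that under a tracing GC, pooled objects stay reachable, so a pool removes allocation cost but not marking cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical game object, used only for illustration.
struct Ball {
    float x = 0, y = 0, dx = 0, dy = 0;
};

// Minimal fixed-capacity pool: every object is created once, up front,
// and then recycled through a free list instead of being newed per frame.
template <typename T>
class Pool {
public:
    explicit Pool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (T& obj : storage_) free_.push_back(&obj);
    }

    // Returns a recycled object reset to its default state,
    // or nullptr if every object is currently in use.
    T* acquire() {
        if (free_.empty()) return nullptr;
        T* obj = free_.back();
        free_.pop_back();
        *obj = T{};
        return obj;
    }

    // Gives an object back for later reuse. The assert catches obvious
    // misuse, such as releasing more objects than the pool owns.
    void release(T* obj) {
        assert(obj != nullptr && free_.size() < storage_.size());
        free_.push_back(obj);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> storage_;  // owns all objects; never resized, so pointers stay valid
    std::vector<T*> free_;    // objects not currently handed out
};
```

Because the capacity is fixed, running out is reported as `nullptr` rather than silently allocating, which keeps allocation cost predictable per frame.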

Actually, the example above is 60,000 objects, as each ball has 10 Fill objects. This was just a test to illustrate what happens when all I did was allocate the objects without processing them at all. I did this because, according to Instruments, Mark's GC routines seemed to be taking more processing time than the actual drawing routines like DrawOval...

It seems that Mark's iOS GC performance is not linear? On my iPhone 4 it takes about 1000 Balls for the GC routines to overtake DrawOval in processing time.


marksibly(Posted 2011) [#4]
Hi,

The GC must visit all objects to work out which ones are dead, so it's visiting 60,000 objects per update/render.

The gc could be delayed until 'N' bytes are allocated, but then you'd get big 'lumps' when gc kicked in - all part of the fun of trying to make gc work in a realtime environment. And 60,000 objs is a lot for a mobile device to handle.
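To make that cost model concrete, here is a toy stop-the-world mark-sweep collector in C++. It is an illustration, not Monkey's actual GC; `ToyHeap`, `Obj`, and the allocation-count `threshold` (standing in for "N bytes") are assumptions, and roots are registered by hand where a real collector would scan globals and the stack. The mark phase visits every object reachable from the roots, so its cost grows with the number of live objects, and preallocated objects are re-visited on every collection even when nothing has changed. Raising the threshold makes collections rarer, but each one still does a full pass.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy heap object: a mark bit plus outgoing references.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;
};

// Toy stop-the-world mark-sweep collector (NOT Monkey's real GC).
class ToyHeap {
public:
    // Collect automatically after `threshold` allocations
    // (a stand-in for "wait until N bytes are allocated").
    explicit ToyHeap(std::size_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Note: an automatic collection here frees any earlier object that
    // is not yet reachable from a registered root.
    Obj* alloc() {
        if (++sinceLastGc_ >= threshold_) collect();
        objects_.push_back(std::make_unique<Obj>());
        return objects_.back().get();
    }

    void addRoot(Obj* o) { roots_.push_back(o); }
    void clearRoots() { roots_.clear(); }

    // Mark everything reachable from the roots, then free the rest.
    // Returns how many objects the mark phase visited.
    std::size_t collect() {
        sinceLastGc_ = 0;
        std::size_t visited = 0;
        std::vector<Obj*> stack(roots_.begin(), roots_.end());
        while (!stack.empty()) {
            Obj* o = stack.back();
            stack.pop_back();
            if (o->marked) continue;
            o->marked = true;
            ++visited;
            for (Obj* r : o->refs) stack.push_back(r);
        }
        // Sweep: keep marked objects (clearing their bits), free the rest.
        std::vector<std::unique_ptr<Obj>> survivors;
        for (auto& o : objects_) {
            if (o->marked) {
                o->marked = false;
                survivors.push_back(std::move(o));
            }
        }
        objects_ = std::move(survivors);
        return visited;
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::size_t threshold_;
    std::size_t sinceLastGc_ = 0;
    std::vector<Obj*> roots_;
    std::vector<std::unique_ptr<Obj>> objects_;
};
```

With one Ball holding 10 Fills, every `collect()` visits all 11 live objects again, which is the per-update cost Mark describes scaled down from 60,000.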

In this case, preallocating objects is not helping and you're probably better off using 'new' (if this is what the preallocation is trying to 'solve').

Is slow new even a problem on C++ targets? I know it is on XNA/Android, but the C++ GC was written under the assumption that people wouldn't need to use preallocation, so 'new' was a good indication of 'live' objects.

[edit]
Another thing - make sure to build a 'release' version! This is kind of non-obvious in Xcode 4.2, but it significantly speeds up the GC - your code above runs at 60 fps in release mode on my iPad (just). Also note this issue has kind of created a situation where you should avoid 'new' on XNA/Android, but avoid pre-allocation on C++ targets...thinking...!
[/edit]


dmaz(Posted 2011) [#5]
Yeah, 60,000 is A LOT, but that was just to emphasize the issue for me by bringing down the fps while doing seemingly nothing at all. And maybe it's not really as much as we might think. For instance, my scene graph nodes each have a position vector object, a world position vector, a size object, a handle vector, a scale vector, a color object, a background color object, a float array for the current matrix, and even angle and blend objects, since all of these can be animated.

So that's 10 objects for every node, and extended nodes add more. So when I started to profile, I found that the GC was the biggest CPU user even with just a hundred objects.

Yes, new is slow on the iPhone, and measurably so... a whole lot slower than on desktops. Even on desktops I'd pool things like particle engines, though.

Does the GC have to visit every object every frame? What about only processing a third or so each frame?
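One standard answer to "process a third each frame" is incremental marking: keep a worklist and do a bounded amount of marking per frame, sweeping only once the worklist is empty. The C++ sketch uses a hypothetical `IncrementalMarker` and shows only the budgeting. It assumes references are not changed while a cycle is in progress; a real incremental collector needs a write barrier to catch references stored into already-marked objects mid-cycle, and that barrier is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    bool marked = false;
    std::vector<Node*> refs;
};

// Hypothetical incremental marker: the mark phase is spread over frames,
// with each step() marking at most `budget` objects.
// ASSUMPTION: refs are not modified while a cycle is running. A real
// incremental GC needs a write barrier for that case; it is omitted here.
class IncrementalMarker {
public:
    // Start a cycle. Mark bits are assumed to have been cleared by the
    // previous sweep.
    void begin(const std::vector<Node*>& roots) {
        work_ = roots;
        marked_ = 0;
    }

    // Do one frame's worth of marking. Returns true once the cycle is done.
    bool step(std::size_t budget) {
        assert(budget > 0);
        std::size_t done = 0;
        while (!work_.empty() && done < budget) {
            Node* n = work_.back();
            work_.pop_back();
            if (n->marked) continue;  // already reached by another path
            n->marked = true;
            ++done;
            ++marked_;
            for (Node* child : n->refs) work_.push_back(child);
        }
        return work_.empty();
    }

    std::size_t marked() const { return marked_; }

private:
    std::vector<Node*> work_;  // pending objects (may contain duplicates)
    std::size_t marked_ = 0;
};
```

The trade-off is that the total marking work is unchanged; it is only spread out, and the barrier a correct version needs adds a small cost to every pointer store.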

Better yet, how about a command that hides an object instance from GC processing? There are always instances that you want to keep for the whole game, or that you know exactly when to destroy, like at the end of a level. Recycling objects on mobile devices is very important, and so is pooling... I agree that you don't have to pool all objects, but if you are going to create more than a few objects in a frame, those should be pooled or you'll get a 'lump'.

My current testing on an iPhone 4 shows that preallocation is important on that device for things like particle engines... but the GC is taking 20-30% of my processing every loop on the iPhone 4.

When I was using cocos2d on my iPhone 3, I pretty much had to pool everything. I haven't profiled this issue on that phone yet, but I'll maybe do that tonight.


siread(Posted 2011) [#6]
60,000 objects may be a lot for an action game, but for something like a football manager game or an RPG I would say it's not unusual at all, particularly since we only have SaveState for storing data and cannot whack it in temporary files.


dmaz(Posted 2011) [#7]
I've modified the code... it currently doesn't preallocate anything, so I can profile the build-up of objects. It does recycle, though, so you would have to do a new run to get back to 0. A touch adds objects and a second touch removes them (but recycles them).

These last tests were on an iPhone 3G running iOS 4.2.1 in release mode, profiled using Xcode 4.2. I had to set the update rate to 30 to display a decent number of rects (500 is stable). I expected this, as I ran all my old cocos2d stuff at 30 as well.

I really think adding something like an "alloc" and "destroy/delete", to be used -only- when the programmer doesn't want the GC managing the object, would be the best of both worlds!

I ended up here with a total of ~25% cpu going to the GC.




dmaz(Posted 2011) [#8]
In other words... I think the GC should still know about these objects even with alloc, so it can destroy them at application close if needed. It just wouldn't mark them... but it would obviously collect them on destroy. That would be awesome and a very good addition to a language for games.
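A toy sketch of the alloc/destroy proposal, in C++ with hypothetical names (`Heap`, `gcNew`, `alloc`, `destroy`); none of this exists in Monkey. The heap still owns `alloc()`ed objects, so they are freed when the heap is destroyed at shutdown, but the mark phase skips them and only an explicit `destroy()` frees them. One consequence worth spelling out: because their references are never traced, an unmanaged object must not hold the only reference to a GC-managed object, or that object would be collected while still in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Obj {
    bool marked = false;
    bool unmanaged = false;  // true for objects created with alloc()
    std::vector<Obj*> refs;
};

// Toy heap illustrating the proposed alloc/destroy (hypothetical API).
class Heap {
public:
    // Ordinary GC-managed allocation, like Monkey's New.
    Obj* gcNew() {
        managed_.push_back(std::make_unique<Obj>());
        return managed_.back().get();
    }

    // Proposed "alloc": the heap still owns the object, so it is freed
    // when the Heap itself is destroyed (application close), but the
    // mark phase never visits it.
    Obj* alloc() {
        auto obj = std::make_unique<Obj>();
        obj->unmanaged = true;
        unmanaged_.push_back(std::move(obj));
        return unmanaged_.back().get();
    }

    // Proposed explicit "destroy" for objects created with alloc().
    void destroy(Obj* o) {
        auto it = std::find_if(unmanaged_.begin(), unmanaged_.end(),
                               [o](const std::unique_ptr<Obj>& p) { return p.get() == o; });
        assert(it != unmanaged_.end() && "destroy() on an object not from alloc()");
        if (it != unmanaged_.end()) unmanaged_.erase(it);
    }

    void addRoot(Obj* o) { roots_.push_back(o); }

    // Mark-sweep over managed objects only; returns objects visited.
    // Refs held BY unmanaged objects are never traced, so they must not
    // be the only thing keeping a managed object alive.
    std::size_t collect() {
        std::size_t visited = 0;
        std::vector<Obj*> stack(roots_.begin(), roots_.end());
        while (!stack.empty()) {
            Obj* o = stack.back();
            stack.pop_back();
            if (o->unmanaged || o->marked) continue;  // alloc()'d objects are skipped
            o->marked = true;
            ++visited;
            for (Obj* r : o->refs) stack.push_back(r);
        }
        std::vector<std::unique_ptr<Obj>> survivors;
        for (auto& o : managed_) {
            if (o->marked) {
                o->marked = false;
                survivors.push_back(std::move(o));
            }
        }
        managed_ = std::move(survivors);
        return visited;
    }

    std::size_t managedCount() const { return managed_.size(); }
    std::size_t unmanagedCount() const { return unmanaged_.size(); }

private:
    std::vector<Obj*> roots_;
    std::vector<std::unique_ptr<Obj>> managed_;
    std::vector<std::unique_ptr<Obj>> unmanaged_;
};
```

Here a level holding 100 `alloc()`ed objects adds nothing to each collection; only the managed objects reachable from roots are visited.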


dmaz(Posted 2011) [#9]
Mark, I missed your [edit] above... Under Xcode 4.2, the 'Profile' setting does run in release (I will confirm when I get home). Are you running on an iPad 1 or 2? I didn't test on mine, but I'll do that as well.

That said, I think the last set of tests I did was more conclusive, and those tests were not pre-allocating lots of objects.


dmaz(Posted 2011) [#10]
> Another thing - make sure to build 'release' version! This is kind of non-obvious on xcode4.2, but significantly speeds up the GC - your code above runs at 60fps in release mode on my iPad (just). Also note this issue has kind of created a situation where you should avoid 'new' on xna/android, but avoid pre-allocation on c++ targets...thinking...!
Yeah, the original run from my first post was Xcode's "Run", which meant "debug" as you thought, sorry 'bout that! The tests 2 posts above were all done in release, though, under Profile and a "fixed" (release) run.

Just for completeness, rerunning the original code in release gives:
iPad 1: 60 fps
iPhone 4: choppy 35-50 fps
iPhone 3G: 4 fps

I'm thinking about ways to minimize the number of objects I use... It's crossed my mind to rewrite my nodes to use plain datatypes, but that really defeats the purpose of objects in the first place, and I can't bring myself to do it.

I just want to reiterate that the GC still takes up the bulk of the CPU even at normal/low object counts on all three of my devices, so I think this is still a problem. But I also don't want to lose the GC. As I mentioned above (not knowing how difficult it would be to implement), letting the programmer exempt specific objects from being visited, through an explicit alloc/delete, would be a great solution, I think.