Threaded Performance and GC

BLaBZ(Posted 2014) [#1]
I know this topic has been visited before; I was just curious whether there's a workaround.

Problem:
The threaded Garbage Collector dramatically slows the game down upon collection.

Is there a way to force garbage collection, or to free the memory of particular objects or variables? How would you reduce the amount of garbage before the garbage collector runs automatically?


Brucey(Posted 2014) [#2]
You could garbage collect more often so that there aren't so many things to collect in one go?
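
A minimal sketch of that idea, assuming the manual mode of GCSetMode (mode 2 in the BRL.Blitz docs) behaves the same in a threaded build. The frame loop is only a placeholder and the 600-frame count is arbitrary:

```blitzmax
SuperStrict

' Assumption: mode 2 is the manual mode, where a collection only
' happens when GCCollect() is called explicitly.
GCSetMode 2

For Local frame:Int = 1 To 600
	' ... update and render one frame here (placeholder) ...

	' Collect a little every frame instead of a lot every few seconds.
	GCCollect()
Next
```

Whether many small collections really come out cheaper than one big one depends on how the collector scans memory, so this would need profiling.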


BLaBZ(Posted 2014) [#3]
Would this lock other threads? Does this have to be done on the main thread?

I wonder if it would make sense to have a concurrent thread that garbage collects every ~5 ms.

Thanks for your input Brucey :)


Yasha(Posted 2014) [#4]
I believe standard practice is to try as far as possible not to put yourself in a position where GC kicks in while you're being interactive:

-- enter the loading screen
-- GCCollect
-- allocate all resources for the area in advance
-- GCCollect again
(-- GCSuspend if you need to)
-- end load screen and enter level
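
The sequence above could be sketched roughly like this. LoadLevel and PlayLevel are hypothetical placeholders for your own code; only the GC calls are real BRL.Blitz functions:

```blitzmax
SuperStrict

' Hypothetical placeholder: allocate everything the area needs here.
Function LoadLevel()
End Function

' Hypothetical placeholder: the interactive loop, which tries not to allocate.
Function PlayLevel()
End Function

' -- loading screen --
GCCollect()     ' clear out whatever the previous area left behind
LoadLevel()
GCCollect()     ' clear temporaries created while loading
GCSuspend()     ' optional: no automatic collections during play

PlayLevel()

' -- next pause point --
GCResume()      ' balance the earlier GCSuspend
GCCollect()
```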

During the level, try not to allocate any objects. Create all the things you intend to use in advance and pool them, hide them if they're graphical. When you need more visible interactive things, pull them from the pool and manually return them to it later.
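
As a rough illustration of pooling (TBullet and TBulletPool are made-up names, and a fixed-size pool is just one possible design), everything is allocated once up front and then only handed out and returned:

```blitzmax
SuperStrict

Type TBullet
	Field x:Float, y:Float
	Field active:Int
End Type

Type TBulletPool
	Field items:TBullet[]   ' slots 0 .. freeCount-1 hold the free bullets
	Field freeCount:Int

	Function Create:TBulletPool(size:Int)
		Local p:TBulletPool = New TBulletPool
		p.items = New TBullet[size]
		For Local i:Int = 0 Until size
			p.items[i] = New TBullet   ' the only place bullets are allocated
		Next
		p.freeCount = size
		Return p
	End Function

	' Returns Null when the pool is exhausted rather than allocating.
	Method Acquire:TBullet()
		If freeCount = 0 Then Return Null
		freeCount :- 1
		Local b:TBullet = items[freeCount]
		b.active = True
		Return b
	End Method

	' Each acquired bullet must be released exactly once.
	Method Release(b:TBullet)
		b.active = False
		items[freeCount] = b
		freeCount :+ 1
	End Method
End Type

' Usage: create the pool at load time, then recycle during play.
Local pool:TBulletPool = TBulletPool.Create(256)
Local b:TBullet = pool.Acquire()
If b <> Null
	b.x = 10
	b.y = 20
	pool.Release(b)
EndIf
```

Returning Null on exhaustion keeps the "no allocation during the level" rule strict; the caller then has to decide whether to skip the effect or size the pool larger.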

The downside is that some library functions may be allocating small objects behind the scenes so you may be allocating objects anyway without noticing it: one possible hack is to turn off the GC altogether once you have all your stuff in place, and hope that the small objects allocated in the background don't add up to too much until the next suitable pause point where you can GCResume and force another collection.

(When I say "hope", what you should actually do is profile memory usage and see how often you need to insert "slow points" throughout a level where you can turn the GC on again, to keep the background buildup below an acceptable level.)

As long as you keep allocation and collection to "slow points" and loading screens, it won't have a noticeable effect for the end user and hopefully shouldn't matter if all threads lock.
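
A sketch of such a "slow point", assuming GCMemAlloced keeps counting while the GC is suspended (worth verifying on your build). GARBAGE_LIMIT, BeginInteractive and SlowPoint are hypothetical names, and the 8 MB budget is a made-up figure to be replaced by whatever profiling suggests:

```blitzmax
SuperStrict

Const GARBAGE_LIMIT:Int = 8 * 1024 * 1024   ' hypothetical budget in bytes

Global baseline:Int

' Call once when the level starts, after everything is loaded.
Function BeginInteractive()
	GCCollect()
	baseline = GCMemAlloced()
	GCSuspend()
End Function

' Call at a natural pause in the action.
' Returns True if a collection was actually run.
Function SlowPoint:Int()
	If GCMemAlloced() - baseline < GARBAGE_LIMIT Then Return False
	GCResume()
	GCCollect()
	baseline = GCMemAlloced()
	GCSuspend()
	Return True
End Function
```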

Obviously this only works if your game can be structured around loading and clearing points, if you can predict roughly (order of magnitude) how many "things" are active during interactive periods, and if you are allocating a small-medium number of largish objects instead of a huge number of tiny objects. (If you are allocating a huge number of tiny objects, you could also investigate some kind of secondary pooling technique or a flyweight-style pattern to reduce the number of things passing through the GC in total.)


"I wonder if it would make sense to have a concurrent thread that Garbage Collects every 5~ms."

If a concurrent thread is force-collecting from all threads I would assume it does so by locking everything up, no? (I don't know offhand how the threaded GC does work, but it's neither incremental nor generational which I think are prerequisites for that to happen without stopping everything.) Can't hurt to try, of course - whatever profiles well, works.


BLaBZ(Posted 2014) [#5]
Hmm, that's what I was thinking and afraid of. I have a lot of code right now, about 60-70k lines altogether, so there's a lot to sift through to make all local objects and variables pooled.

How is the garbage collector activated with or without threading? Is it random? For some reason I was convinced that without threading it was activated upon "Cls".


Brucey(Posted 2014) [#6]
The garbage collector is activated on the creation of a new object.
It then decides if it needs to process any old objects for collection.

So, if you are pooling objects, and not creating any for a while, the GC won't be doing any work.
Of course, there's code *YOU* are in control of and the framework BlitzMax code which may be creating objects as you go - it all depends what stuff you are using.


BLaBZ(Posted 2014) [#7]
Is the GC activated for primitive data types?


Yasha(Posted 2014) [#8]
No: by definition a primitive is something that isn't an object and doesn't need to be allocated by itself. The only primitives in BlitzMax are the various kinds of numbers and bytepointers. Strings and arrays are objects, not primitives. Any type you can "New" is GC'd; anything where "New" makes no sense (you can't create a new integer) is not, and takes up no space on its own.

Is this what you were thinking of?


fishy(Posted 2014) [#9]
Hi,

I know it's a much-discussed topic here, so I wanted to ask whether there are any alternatives for running threaded apps other than those already discussed?

I've tried to manage memory a bit more, but it seems there is always going to be some added memory usage for displaying strings or other things you still have to do every frame. I even did a test that only renders the mouse (a 2D plane using irrlicht), and even just rendering that seems to add something to memory each frame?

I'm not an expert with this, but I'm working on a turn-based strategy game and would really like the option of threading for the AI turns, and even for precalculating moves while the player still has their go.

As BMax stands at the moment, in threaded mode I get a delay of over 100ms on garbage collection right at the start of my game, which is an annoying and far too noticeable pause in rendering. The pause comes just from the stuff I have in memory: I tested by removing everything from the main loop (i.e. no rendering or updates whatsoever) and it still pauses for over 100ms, without clearing anything from memory since nothing changed.

Is it possible to run an external C function in a separate thread and pass it all the data you want calculated? Or could you realistically create a game (turn-based would be easier, I guess) where no memory is added at all unless moves or changes actually happen? At the moment it seems that just adding strings together and drawing them to the screen, or other minimal things, adds to memory usage that needs to be collected within 10-20 secs at best.

I've put up a simple BMX test file (you can check out my project there too :) in which you can see the issue if you compile threaded. You can also see that if you turn off the GC, memory usage keeps increasing, which I'm guessing is from the string usage? Maybe there is a way I can manage it better to minimize the issue?

Thanks to anyone who can have a look and provide any sort of feedback, it would be greatly appreciated!
http://kwflgames.blogspot.com.au/p/downloads.html


Brucey(Posted 2014) [#10]
Just because something is threaded doesn't mean it has to affect the performance of your application.

Rather than creating/destroying lots of objects constantly, you may be able to reuse objects by a means of pooling. For example, if you create lots of objects for things like particle effects, create a pool of them that you can take out and put back as necessary. By not creating those objects, you don't generate anything for the GC.

There are all kinds of ways to avoid unnecessary object creation.


fishy(Posted 2014) [#11]
Thanks for the reply Brucey. Object pooling is something I was investigating and had started to try, but I thought I should first make sure it will resolve or negate the issue, so that it's worth modifying most of the code I've already written.

What I noticed is that even if I don't create or destroy any objects each frame, the GC always takes over 100ms with my current game. I'm guessing this is because it searches through all the current memory to confirm whether anything is in use, but I'm not sure?

The example of this in the BMX test file I put up is: load a bunch of stuff into memory, don't make any changes, and run the GC every few frames. You'll see it still takes 100ms - 200ms even though it doesn't need to clear anything from memory.

The way I could imagine pooling still helping is if you can get it so that no extra data is added to memory each frame. If you could get this perfect, then I could call the GC only when I'm processing turns or something, which would be acceptable. The reason this doesn't look like it would work is that even in a basic example (as in the test file too) memory usage still increases each frame, and has to be cleared, even when nothing is happening except rendering text to the screen. So it appears that the rendering function alone still adds data that needs to be collected?

I hope that makes sense; let me know if any more clarification would be useful.

Thanks,

http://kwflgames.blogspot.com.au/


ziggy(Posted 2014) [#12]
If you do not create new objects, the GC won't fire. I can assure you of this. Take into account that there are small actions (like iterating a list, concatenating strings, resizing arrays, etc.) that do generate garbage for the GC.


fishy(Posted 2014) [#13]
Ah, I wasn't sure about iterating a list or things like that, so that explains a bit, thanks. Is that because any object that gets assigned to a local variable will have to be collected, while Ints, Bytes, etc. don't need to be?

For example:
"For Local S:Ship = EachIn ShipList"
"Local S:Ship"

Would both of these require collection? Sorry, just trying to fully understand how it works so I know the limits.

Also, just curious why the code below would still be increasing memory usage, as it isn't using any strings or objects?



Thanks for the info so far!


Brucey(Posted 2014) [#14]
"just curious why the below would still be increasing memory usage"

DrawText is probably creating something for the GC to do. GCMemAlloced() is a number, which is converted to a String for DrawText.


ziggy(Posted 2014) [#15]
When you iterate a list with EachIn, an iterator object is created. If you iterate using the list nodes (prev, next, etc) you will avoid this. Also most string operations do generate string objects, and that's quite difficult to avoid if you happen to require text to be shown on screen.
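
A small sketch of the link-based loop, using TList and TLink from BRL.LinkedList. The Ship type and ShipList name are borrowed from the question above; the x field is made up for illustration:

```blitzmax
SuperStrict

Import BRL.LinkedList

Type Ship
	Field x:Float
End Type

Local ShipList:TList = New TList
ShipList.AddLast(New Ship)
ShipList.AddLast(New Ship)

' Walk the links directly, so no enumerator object is created per loop.
Local link:TLink = ShipList.FirstLink()
While link <> Null
	Local s:Ship = Ship(link.Value())   ' cast the stored Object back to Ship
	s.x :+ 1
	link = link.NextLink()
Wend
```

Note that "Local s:Ship" only declares a reference; it is the hidden enumerator in the EachIn version that gets allocated, not the variable.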


fishy(Posted 2014) [#16]
Thanks guys,

It makes a lot more sense now, and it brings me to the conclusion that I don't think I can use MT for my game because of the slowdown. Even if I clean up the code and pool a lot of resources, the GC would probably still need to run every few seconds just from rendering strings, iterating lists and the other memory usage that is required in my case.

But just investigating this, I have picked up quite a few optimization tricks, and it helped me understand some of the bad memory management in the irrlicht module that I can work around or even fix.


Yasha(Posted 2014) [#17]
EachIn is convenient but not a gamebreaker by itself: any code using it should convert to a "dumb" integer- or TLink-based loop with only a small amount of messing about.

For strings, it might be worth looking into creating a DrawText implementation that reads from a mutable array of integers instead of a string (or in fact a mutable C string). That way, for very small, frequently-changing pieces of text like the GCMemAlloced display, you can modify the "string" in-place and not involve object allocation. Larger strings would need to be preallocated.
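
The first half of that idea, turning a number into characters without creating a String, might look something like the sketch below. IntToChars is a hypothetical helper, not part of any BlitzMax module, and the drawing side (walking the buffer and drawing one glyph per character code) is left out:

```blitzmax
SuperStrict

' Hypothetical helper: writes the decimal character codes of value into buf
' and returns how many were written. No String objects are created.
' buf needs room for at least 11 entries (sign plus 10 digits).
' Caveat: the most negative Int overflows on negation and is not handled.
Function IntToChars:Int(value:Int, buf:Int[])
	Local n:Int = 0
	Local negative:Int = False
	If value < 0
		negative = True
		value = -value
	EndIf

	' Emit digits least significant first...
	Repeat
		buf[n] = 48 + (value Mod 10)   ' 48 is the character code of "0"
		value :/ 10
		n :+ 1
	Until value = 0

	If negative
		buf[n] = 45                    ' 45 is the character code of "-"
		n :+ 1
	EndIf

	' ...then reverse in place so they read left to right.
	Local i:Int = 0
	Local j:Int = n - 1
	While i < j
		Local t:Int = buf[i]
		buf[i] = buf[j]
		buf[j] = t
		i :+ 1
		j :- 1
	Wend
	Return n
End Function

' Allocate the buffer once at load time, then refill it every frame.
Local buf:Int[] = New Int[12]
Local count:Int = IntToChars(GCMemAlloced(), buf)
```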

This is also perhaps a stupid question, but you are on the latest version of BlitzMax, right? Older versions of BlitzMax used a different GC for multithreading that really did scan the entire stack and heap, with potentially long pauses as a result; this has since been removed and it now uses something essentially similar to the single-threaded GC, which was supposed to eliminate observable pause times.


In the worst-case scenario you could always try using threads from a non-threaded build, too, but do bear in mind that doing so is horribly unsafe and you'd need to redesign your program to make sure your worker threads never used any objects, strings or arrays at all, for any purpose (i.e. integers, floats, and byte ptrs only). Threaded builds aren't about the threading, they're about making it safe.


GfK(Posted 2014) [#18]
"and brings me to the conclusion that I don't think i can use MT for my game because of the slow down"
I've just reached the same opinion. My game runs fine, mostly, but then every now and again it runs like a lame pig for a few seconds before catching up with itself. I was baffled until I read this thread earlier and I'm now convinced it's no more than a GC issue.

So I'll be ripping out all the MT stuff tomorrow and trying to figure out some other way of doing what I started using it for in the first place.


Derron(Posted 2014) [#19]
This all works as long as you do not try something like multiplayer (in the sense of network/online game).

As was written here some years ago: alt-tabbing out of a game stops the main thread. The potential solutions provided by users did not work in all cases. I don't know whether current Windows versions (7 and 8) still behave this way; on Linux this wasn't a problem at all.

So if you need your app to do something while it is in the background, you might need to do this in a separate thread.


bye
Ron


Jur(Posted 2014) [#20]
In my experience the MT garbage collector in BMax is problematic when working a lot with memory. I have an application which is perfectly stable with the normal GC but leaks memory when the threaded GC is used (without using any threading code). I tried to figure out the cause but gave up.


GfK(Posted 2014) [#21]
Same here, I'm hardly using any threading at all. It isn't the threading that's the problem, it's the iffy GC that comes with it.