Memory access crash under heavy resource load

Nobuyuki(Posted 2015) [#1]
Hello,

This thrasher was designed to test the OnLoading() method on various targets. However, I noticed that the compiled code crashed after a few clicks. I gave the source to a few others, who were unable to reproduce the bug, so instead I'll post it here along with a compiled version as "proof" of the crash.

Code:
http://pastebin.com/snvg0jac

Binary / resources:
https://s3.amazonaws.com/uploads.hipchat.com/11212/31746/g0d0EJuoCjsFIEh/untitled_1.buildv82a.rar

My system is currently Windows 8.1u1 64-bit, and my compiler setup is GCC 4.8.1 (a stripped-down MinGW standalone which I believe came with the demo version of Jungle but should otherwise be identical to this version of GCC). The build target was GLFW3. The version of Monkey used was v82a.

Questions:
Can the crash be confirmed?
Should OnLoading() execute on any Desktop targets? Under what circumstances?
Is this code generally bad practice? Is it possible this crash could occur at runtime under normal circumstances? I was under the assumption that the render loop should not continue to execute while the update loop is locked under heavy load.

Thanks for your time.


ImmutableOctet(SKNG)(Posted 2015) [#2]
Well, for one thing, you're loading a huge image into RAM, and then VRAM, 10 times over. That's a lot of I/O. Monkey's garbage collector isn't going to collect based on memory usage without manual configuration on the GLFW targets (As well as iOS, from what I remember); the same technically goes for STDCPP / C++ Tool (They all use the same standard GC). If you use manual discarding (The 'Image' class's 'Discard' method), then the memory footprint would stay more consistent. The fact is, it's crashing because of ridiculous memory usage.

That being said, 'OnLoading' doesn't work for a different reason. Monkey uses decoding functionality from STB, which isn't done in a controlled "asynchronous" manner like it can be on the HTML5 target, for example. This can be done by using the asynchronous functionality Monkey provides, however. Basically, if you want perfect cross-platform loading screens/animations/other, you'll have to write your own system where 'OnLoading' is called from 'OnRender'. This means you'll have to load asynchronously using 'LoadImageAsync', and simply call 'OnLoading' yourself with a basic check. This also means you can properly decouple any update routines you'd have. Such a system could be used as a fall-back for targets which do not have support for 'OnLoading' by default. That being said, this means you'd also have to use the preprocessor to "inject" a call to an update routine inside of 'OnLoading' (If you were to use such a thing; obviously only for targets which support 'OnLoading' properly). That also means you can't use this system for input-detection on all targets; another benefit to using your own async-calls. Basic loading screens don't really need much as far as main update routines, though. You'd basically just need to draw a loading bar (Or something similar) based on the number of assets loaded compared to the number of assets needed.
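The loading-bar idea above (assets loaded compared to assets needed) can be sketched in a few lines. This is only an illustration in C++, not Monkey code: `PendingAsset` and `LoadTracker` are hypothetical names, and `MarkLoaded` stands in for whatever completion callback an asynchronous loader would invoke.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one asset requested through an async loader.
struct PendingAsset {
    std::string path;
    bool loaded;
};

// Tracks how many of the requested assets have finished loading.
class LoadTracker {
public:
    void Request(const std::string& path) {
        assets_.push_back(PendingAsset{path, false});
    }

    // Meant to be called from the loader's completion callback.
    void MarkLoaded(const std::string& path) {
        for (auto& a : assets_) {
            if (a.path == path) a.loaded = true;
        }
    }

    std::size_t LoadedCount() const {
        std::size_t done = 0;
        for (const auto& a : assets_) {
            if (a.loaded) ++done;
        }
        return done;
    }

    // Fraction in [0, 1]; an empty request list counts as complete.
    double Progress() const {
        if (assets_.empty()) return 1.0;
        return static_cast<double>(LoadedCount()) /
               static_cast<double>(assets_.size());
    }

    // While this is false, the render routine would draw the loading screen.
    bool Done() const { return LoadedCount() == assets_.size(); }

private:
    std::vector<PendingAsset> assets_;
};
```

The render routine would check `Done()` each frame and draw a bar scaled by `Progress()` until it returns true, which is the "basic check" described above.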

Monkey doesn't use async-loading on all targets, currently, only some. Because of this, setups like what I proposed may be needed on some targets. I recommend that you manually discard your images if you plan on heavily controlling your assets. If you plan on using 'Discard', be sure that nothing else is pointing to that image/sound/other. Garbage collection is target-dependent, so it's best to cover your bases wherever possible. Here's the thing, Monkey's documentation really should reflect this behavior as target-dependent. But, since Mark was lazy with it, the only way for the average user to know this is to view the source code.

I hope this clears things up. Unfortunately, Monkey doesn't deal with 'OnLoading' very well on some targets, so you'll have to sort that out yourself. On the bright side, this is the kind of thing I'd write into my game anyway, so it's how I prefer to handle things.

There's also the fact that most desktop systems will be able to load your assets much quicker than mobile devices and the like. This makes the manual/controlled approach better for me. I'd also like to mention that optimizing uses of 'OnRender' via "state-controlled" redraws with manual loading screens is your best bet. Over optimization could be a headache, though. Maybe setting the update rate specifically for the loading screen, and making it render slower would be better? This really depends on what kind of loading-screen you're going for. For example, a simple animation in the bottom left saying "Loading" would be better done without any further optimizations; though, "decoupling" is still a good idea.


Nobuyuki(Posted 2015) [#3]
"The fact is, it's crashing because of ridiculous memory usage."


I find this unlikely. The application was compiled and tested for different targets by a few different people, and as I said, they were able to confirm the crash with my build on their machines but unable to reproduce it with their own builds. Your explanation would only make sense if the crash occurred under similar circumstances regardless of the machine it was built on, but as I said, their builds didn't crash. I presumed it may have been an issue with the allocation, but it seems more likely that an operation executed out of order at some point because it was being hammered, and that this problem is specific to the target. That's just my presumption. The crash gave a Monkey-specific error, and the debugger was able to catch it and provide uncorrupted data (hoob returns Null during DrawImage() and a Memory Access Error occurs).

It could still be the GC, since the crash seems to occur after 2 clicks on the offending build. However, I believe the default GC mode for glfw is to check for orphans every update, which again would seem to negate the notion that it's simply running out of memory: the app waits for user input, and how fast you click doesn't seem to matter as much as how many times you click.

Thanks for your other explanations. I already use an asynchronous loader for another project, although since not everything can be moved off to another thread, it kinda negates out some of the purpose of using it since loading/parsing big json/xml files can also cause a stall. I mainly was testing OnLoading() on glfw to see if I could provide a "last chance" way to forcibly blank the screen at a controlled point under a heavy resource load (since the render loop seems to run mostly independently of the update loop and my game can appear to hiccup at certain points during a transition that don't line up perfectly with when they're actually being loaded). While testing if OnLoading() could be used to deal with that, I ran into this (presumably unrelated) crash by accident.


ImmutableOctet(SKNG)(Posted 2015) [#4]
Have you not tried running a GLFW build of your example on Windows? It spikes up to 1GB, and continues to go up, because the GC can't handle it. Look in the task manager. You're loading 161MB to the CPU, then to the GPU per-click, not to mention the hard-drive/other load (Smaller files like this should be cached, though), and the decoding work. And then there's potentially heap-allocated temporary data, and of course, reference handling on Monkey's side.

It doesn't even crash for me, it just stalls, then Windows's heap attempts to rearrange itself for performance reasons (Consistency is better for keeping cache misses down). Things get moved on to the page file, so the system stalls (Thanks, Microsoft). The end result is a window which is not accepting events, and which the system can therefore assume is not responding. When Windows sees that something isn't responding for a "large" amount of time, it will assume it crashed.

Monkey's standard C++ garbage collector is call-back based ("OnBlah") when using game-targets (This can be configured). From what I remember, it doesn't clean up very much per "round". Because of this, the memory usage (RAM) will keep spiking. On top of that, the GPU will run out of memory, because "surfaces" are only collected when the GC gets to them. This means the surface stays allocated for potentially long periods of time. So, that image-data will stay on the GPU as a texture for a while. This then means that all of your VRAM will be taken up rather quickly. With your VRAM fully used, your driver may try to off-load the storage to the system's RAM (CPU side of things). Or, the more likely case is that the driver isn't returning a proper "surface" (Texture object from OpenGL) for you, so Monkey catches an exception it can't resolve properly. The end result is a generic "Memory Access Violation". Basically, your system is running out of memory.

This could either be GPU or CPU related, but at the end of the day, it's not a Monkey bug. Though, high memory usage should probably be looked into further. Systems and drivers vary, so it's not a surprise that different people get different results.


Nobuyuki(Posted 2015) [#5]
"Have you not tried running a GLFW build of your example on Windows? It spikes up to 1GB, and continues to go up, because the GC can't handle it. Look in the task manager. You're loading 161MB to the CPU, then to the GPU per-click, not to mention the hard-drive/other load (Smaller files like this should be cached, though), and the decoding work."


Of course I have. The binary included in my original posting is a GLFW3 build on Windows. However the problem was not consistent across build targets (more details at the end of this post).

How many clicks does it take for you to crash the build? For me, exactly two. It increases by the exact amount you described, and tops out at 800mb before it decides to crap out. I have 8gb of RAM and 1gb of VRAM. Do you think you can describe in more detail how "the GC can't handle it" which leads to the crash under these circumstances? I'll be willing to believe that the VRAM is being exhausted and returns a null surface, but I'd like more confirmation. If you're not totally annoyed with me by this point, could I ask you to run the test one last time and count the number of clicks it takes along with how much VRAM you have? (time the clicks slowly when the app is responsive just for extra brownie points) :)

I just had someone with 512mb test out the program and it crashed for them after "just one click". Strangely, however, their glfw2 build did not crash for them. Because of this, the crash seemed build-specific (but not PC-specific) to me. My belief was that the GC would be able to mark and clear this stuff pretty quickly under normal circumstances (before the user would have a chance to react, provided the thread hadn't locked for a significant amount of time). All of this combined led me to believe the problem was possibly something in Monkey (the GC, threads, something under the hood). I had not considered before that the GPU would've sent back a null surface unexpectedly if it ran out of memory. hmm... but I'm not entirely convinced that this is what's happening. This is why I'm asking you to test this and send back your confirmation of the results.


marksibly(Posted 2015) [#6]
Well, this is a weird one.

If I compile/build here, it works OK and doesn't crash - mem usage hits about 312M and stays there. Image loading takes about 3.5s.

But if I run your prebuilt version, it does indeed crash after second click and image loading seems to take ages.

I did a windiff on your/my main.cpp files, and they appear to be the same. Which version of g++ are you using? I'm using tdm-gcc-32 v4.8.1.


ImmutableOctet(SKNG)(Posted 2015) [#7]
I pressed it ten times, and it still hasn't crashed. The fact is, the memory usage jumps up to 1GB, then it quickly lowers to 600MB for me. After that, the more I press it and wait, the more the garbage collector handles the previously allocated objects. After those ten attempts, the memory usage goes down to about 300MB, and continues to lower. If I edit your code to use manual deallocation (Via the 'Discard' method), then overall memory usage lowers dramatically. For crying out loud, I'm running the AMD/ATI compatibility drivers, and I have 512MB of VRAM. I'm running Windows 8.1 (x64), and I have 4GB of RAM. This sounds like either a Windows heap issue, or a driver issue. Either way, that image is rather large, all things considered. Monkey's (C++) garbage collector could probably be tweaked to deal with this sort of thing, but it's not like other garbage collectors don't already pull this off similarly. If nothing else, you could always tweak it yourself, using the preprocessor. Also, compiling in debug is a pretty bad idea for testing this, but the results I was referring to were with your debug-executable.

Come to think of it, now that C++11 is formally supported by just about everyone, I wonder if using the new reference-handling functionality would be a valid option for garbage collection in Monkey. Just thought I'd throw that out there; I've been thinking this could work for a while now. It would basically just be for long-term storage to begin with (Make classes' references use 'shared_ptr'). Temporary objects could technically be encapsulated with 'unique_ptr' if the compiler detects no external uses; this would be a rather annoying thing to add, though. But you get the idea; I'm not too sure about the specifics when it comes to "re-sharing" objects that are already shared, though. My thought was to keep everything using standard pointers besides long-lasting variables, but that might not be realistic (Or standard). I'm not even sure about the overhead one way or the other. Just something I was considering looking further into.
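As a rough illustration of the 'shared_ptr' idea, the sketch below shows how reference counting releases a resource at a deterministic point: the moment the last owner lets go. `Surface` and `ReplaceImage` are hypothetical names used only for this example, and nothing here reflects how Monkey's generated C++ is actually structured.

```cpp
#include <memory>

// Hypothetical stand-in for a GPU surface. It counts live instances so the
// release point can be observed.
struct Surface {
    static int live;
    Surface() { ++live; }
    ~Surface() { --live; }
};
int Surface::live = 0;

// Replaces the image held in `slot` with a freshly created one and returns
// the number of surfaces still alive afterwards.
int ReplaceImage(std::shared_ptr<Surface>& slot) {
    // The new surface is created first; the old one is destroyed during the
    // assignment, but only if `slot` was its last owner.
    slot = std::make_shared<Surface>();
    return Surface::live;
}
```

With a single owner, repeated calls leave exactly one live surface. If a second 'shared_ptr' still points at the old surface, it stays alive until that owner is reset, which is the same caveat as making sure nothing else points to an image before discarding it.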


marksibly(Posted 2015) [#8]
Still curious about the version of g++ he's using and would not discount a c++ compiler bug - the same main.cpp is crashing depending on who compiles it.


impixi(Posted 2015) [#9]
From his first post:

"My system is currently Windows 8.1u1 64-bit, and my compiler setup is GCC 4.8.1 (a stripped-down MinGW standalone which I believe came with the demo version of Jungle but should otherwise be identical to this version of GCC). The build target was GLFW3. The version of Monkey used was v82a."



I have the same specs (and compiler) and it crashes for me too.

EDIT
"g++ --version" returns:
g++ (tdm2) 4.8.1


marksibly(Posted 2015) [#10]
"g++ --version" might not tell you what g++ Monkey is using though: check MINGW_PATH in the bin/config.winnt.txt file.

But I'm guessing we are all running 4.8.1. Very strange.


impixi(Posted 2015) [#11]
Good point. But IIRC I forced Monkey to use that version of MinGW when I installed it, because it installed into a non-default folder (my compiler came bundled with CodeLite, and I believe it differs from the one listed in Monkey's docs)... But you may be correct. I'll install Monkey Pro 83a, set that up to use the compiler and try the above code again...


impixi(Posted 2015) [#12]
Well, embarrassingly, my earlier crash was because I incorrectly copied across the "background.jpg" file. For me, the code runs without crashing, in 83a and 82b. The binary file in Nobuyuki's RAR also runs without crashing but according to Task Manager there is a big difference in memory consumption: 644mb (for the provided binary) as opposed to 160mb for my locally generated binaries.

EDIT: FFS, Nobuyuki's binary *is* crashing for me now, after a click or two.


marksibly(Posted 2015) [#13]
The binary version is loading 50 images in OnCreate - the above code only loads 10. Found this out by examining OnCreate/OnUpdate in main.cpp...

50x2748x1536 is pretty intense memory-wise (approx 844M at 4 bytes per pixel), although Monkey probably shouldn't crash here but return a null image from LoadImage, I guess. That said, you'll probably want to think about using Discard to prevent mem usage going bananas in general.
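The figures quoted in this thread can be checked with a short calculation. The sketch below assumes uncompressed images at 4 bytes per pixel and ignores any driver-side padding or extra copies; the helper names are made up for the example.

```cpp
#include <cstdint>

// Bytes needed for `count` uncompressed images of width x height pixels,
// at `bytesPerPixel` bytes each (4 for 8-bit RGBA).
constexpr std::uint64_t ImageBytes(std::uint64_t count, std::uint64_t width,
                                   std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return count * width * height * bytesPerPixel;
}

// Decimal megabytes (10^6 bytes).
constexpr double ToMB(std::uint64_t bytes) { return bytes / 1e6; }

// Binary mebibytes (2^20 bytes), which is what Task Manager style figures
// usually correspond to.
constexpr double ToMiB(std::uint64_t bytes) {
    return bytes / (1024.0 * 1024.0);
}
```

For 50 images, `ImageBytes(50, 2748, 1536, 4)` is 844,185,600 bytes, or about 844 MB. For the 10 images in the posted source it is 168,837,120 bytes, which is about 161 MiB and matches the per-click figure mentioned earlier in the thread.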


Nobuyuki(Posted 2015) [#14]
heyyyyy. Sorry for disappearing for a few days. Here's my version string:

g++ (tdm-2) 4.8.1

my config.winnt.txt has the default values there; I'm using Jungle's SDK override from their prefs section, the path points to the "JungleMonkey" portable version of MinGW that was bundled in there. I'm mainly using that version on this machine because it's portable and I remember MinGW's "stable" distro giving a very outdated version of GCC by default (one that hadn't yet fixed the template bug Mark reported a few years ago).

This is some quirky stuff!

EDIT: D'OH! How the heck did I slip up and upload different versions of the code and binary?

Mark, it seems that LoadImage() is producing a null result once resources are 'exhausted', whatever that means. I think I'm starting to finally understand. When manually discarding the images before replacing the reference, it does indeed work! But it seems like the GC (in CPP_GC_MODE=1) isn't recovering the resources from orphaned references automatically in a timely manner (if at all? Needs more testing I guess...). When I set CPP_GC_MODE=2, the problem goes away, even without manually discarding. Mystery solved, or more GC digging needed...? I'm afraid I don't know enough about how the GC's work to say for sure myself, but for future reference I'll definitely make sure to properly discard images and avoid re-loading them whenever possible from now on.


marksibly(Posted 2015) [#15]
Just use Discard.

Depending on GC to release critical resources is just not a good idea, esp. if you're using large amounts of such resources, eg: 800M of video memory.

CPP_GC_MODE=1 will not perform any GC until control returns from OnCreate/OnLoading, so it can more easily fail if you are allocating vast amounts of memory in one hit.

CPP_GC_MODE=2 will fail less often, but it can still fail as there is no guarantee of when finalizers will be called, which is when Discard will be automatically called for you.

Calling Discard yourself is the ONLY way to guarantee video memory will be released immediately - and if you are allocating a lot of video memory, this is important.
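The difference between waiting for the collector and calling Discard can be shown with a toy model. This is a simplified C++ sketch under stated assumptions, not Monkey's runtime: `VramModel` and `ReloadLoop` are hypothetical, the budget is a plain byte counter, and a failed `Load` returning -1 stands in for LoadImage returning a null image.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a fixed video-memory budget (hypothetical, for illustration).
class VramModel {
public:
    explicit VramModel(std::size_t budget) : budget_(budget), used_(0) {}

    // Returns a handle on success, or -1 when the budget is exhausted.
    int Load(std::size_t bytes) {
        if (used_ + bytes > budget_) return -1;
        used_ += bytes;
        sizes_.push_back(bytes);
        orphaned_.push_back(false);
        return static_cast<int>(sizes_.size()) - 1;
    }

    // Explicit release: the memory is returned immediately.
    void Discard(int handle) { Release(handle); }

    // Dropping the last reference only marks the surface as collectable.
    void Orphan(int handle) {
        if (Valid(handle)) orphaned_[handle] = true;
    }

    // A collection pass: frees everything orphaned so far.
    void Collect() {
        for (std::size_t i = 0; i < sizes_.size(); ++i) {
            if (orphaned_[i]) Release(static_cast<int>(i));
        }
    }

    std::size_t Used() const { return used_; }

private:
    bool Valid(int h) const {
        return h >= 0 && static_cast<std::size_t>(h) < sizes_.size();
    }
    void Release(int h) {
        if (!Valid(h)) return;
        used_ -= sizes_[h];
        sizes_[h] = 0;  // releasing twice is harmless
        orphaned_[h] = false;
    }

    std::size_t budget_;
    std::size_t used_;
    std::vector<std::size_t> sizes_;
    std::vector<bool> orphaned_;
};

// Reloads an image `reloads` times with no collection pass in between,
// and returns how many of the loads succeeded.
int ReloadLoop(VramModel& vram, std::size_t bytes, int reloads,
               bool explicitDiscard) {
    int ok = 0;
    int current = -1;
    for (int i = 0; i < reloads; ++i) {
        if (explicitDiscard) vram.Discard(current);
        else vram.Orphan(current);
        current = vram.Load(bytes);
        if (current >= 0) ++ok;
    }
    return ok;
}
```

With a budget of 1000 units and 300-unit images, five reloads without Discard succeed only three times, because the orphaned surfaces still occupy the budget until `Collect` runs. With Discard, all five succeed and usage never rises above one image.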