Edited: Compiler needs GC mode 2 :)


Peeling(Posted 2015) [#1]
UPDATE TO THE UPDATE: Seems it was a memory problem after all, as changing the GC mode has fixed it.

Have to say it did a REALLY convincing job of looking like 'number of imports' was the problem, as the same amount of code compiled if arranged in fewer files.


UPDATE: After much faffing and fiddling I discovered the crash is purely down to how many files are being imported during compilation. See bottom of this post for history of the problem.

To cut a long story short, I tried commenting out half of spine.monkey (which is just a big list of imports) to try and narrow down the problem. I ended up with a project that would compile with just the last five files commented out. I could swap any of those files with any of the others and it would still compile, but if I imported just one extra file (even one with no code in it) the compiler would die.

I then copied the contents of the last five files into one of the other spine files, and it compiles with no problems.

The spectacular irony here is that a few weeks ago, at the urging of another coder on the team, I went through our UI library and split everything into nice, readable, individual files.


=========================================================

I've been trying without success to integrate the Spine runtime module with an existing project. The Spine example project compiles ok, but if I so much as "import spine" or "import spine.spinemojo" in MY project, I get this:

Semanting...
Translating...
Building...

This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Abnormal program termination.
Exit code: 3


I tried running transcc through the VS debugger, but that was no help. For one thing, even if I recompile transcc in debug, there's no debug information for VS to latch on to. For another, if I run transcc through the debugger (no modifications; just the executable I've been using all year to compile the project) it crashes with an illegal memory access during "Semanting..." regardless of whether I'm importing spine or not.

It's a frustrating situation because Spine is an ideal solution to our project's needs, but I only have a very small window of time to get it integrated. Does anyone have any idea what could be causing the compiler to SIAD, or failing that, how to get a working, debuggable version of transcc I can pick through myself?

Thanks.
Andy


Peeling(Posted 2015) [#2]
NB: using Monkey 80F and latest Jungle.


MikeHart(Posted 2015) [#3]
Does it happen with Ted too?


GW_(Posted 2015) [#4]
Did you look to see how much memory transcc is using? Mark mentioned in the past that there is no gc running because the runtime is so small.


ziggy(Posted 2015) [#5]
@MikeHart: It's not a Jungle IDE thing; if the compiler crashes, it's a compiler thing. I suspect it's caused by Trans not using any GC at all. Try recompiling Trans with the non-Mojo-based GC.


ImmutableOctet(SKNG)(Posted 2015) [#6]
To elaborate on what Ziggy and GW_ said, use garbage collection mode 2 when rebuilding transcc. This can be done with the 'CPP_GC_MODE' preprocessor variable on targets using the C++ based garbage collector; see the documentation for details.

In addition, you could look into 'CPP_GC_TRIGGER' and 'CPP_GC_MAX_LOCALS' (documented on the same page). Assuming this fixes the problem, Mark should probably enable the garbage collector for transcc.
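
Something like this at the top of transcc's main source file should do the trick (a sketch only - the numbers in the commented-out lines are placeholders, so check the docs rather than trusting them):

    ' Incremental, "on the fly" collection for the C++-based GC:
    #CPP_GC_MODE=2

    ' Optional tuning - placeholder values, not recommended defaults:
    '#CPP_GC_TRIGGER=16777216
    '#CPP_GC_MAX_LOCALS=8192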

EDIT: By the way, have you tried debugging 'transcc' itself? If you need to do it externally (via Visual Studio), you'll need to make a debug build of transcc with the GLFW target (disable 'GLFW_USE_MINGW'). The C++ Tool (STDCPP) target does not support MSVC directly. At least going by what your output is saying, I think it might be an issue with the size of the post-translation output (or perhaps transcc is using too much memory by the time the native compiler runs).


Peeling(Posted 2015) [#7]
It's using a lot of memory, but nowhere near the 32 bit limit. I've been caught out by the GC before with a preprocessing tool, and that only fell over when it was up around 4GB.

The crucial thing is that it's purely the number of imports, not the total amount of code (input or generated). The compiler crashed even if I removed all code from several files. Conversely, I amalgamated the code from a bunch of Spine files and it's now compiling perfectly happily.

Thanks for the tips RE transcc gc and debugging. It's not something I can squeeze in right now but I'll look into it when I get the chance.


rIKmAN(Posted 2015) [#8]
Interesting!

Any idea on the number of Imports it took to cause the crash?


Peeling(Posted 2015) [#9]
I'll figure that out when I get the chance to dig deeper.


Peeling(Posted 2015) [#10]
Update: Seems like it was a memory problem after all. I don't know why moving code around and reducing the number of imports made a difference, but it did. I've recompiled with on-the-fly GC and it works fine now.

As an aside, is there any scope for speeding up the "Semanting..." phase?


ImmutableOctet(SKNG)(Posted 2015) [#11]
Speeding it up? I mean, threads are an option, but that's Monkey 2 territory. You could try upping the priority of the process, but that's about it. What you're talking about is the phase responsible for semantic evaluation of your source code. The most I can recommend is looking at what you're reflecting. There's also generics, which slow this down, but even with CRT patterns, this isn't a major time-hit. Honestly, transcc is pretty fast as it is, even if it could be faster. The real bottleneck tends to be the native compiler. I mean, have you tried the HTML5 target? That builds really quickly, even for large projects.


Peeling(Posted 2015) [#12]
AFAIK the HTML5 target handles tinting of images poorly (or it did the last time I checked), so it's impractical to run it that way.

The 'semanting' phase is currently taking substantially longer than the native build phase. I haven't touched reflection, so I can't speak to that. We do use a lot of generics.


Peeling(Posted 2015) [#13]
Actual figures for the above:

Semanting takes 49 seconds
Translating through to the game window appearing on screen takes 28 seconds.


Peeling(Posted 2015) [#14]
As a test I manually combined 84 of the source files from our custom UI module into just two. That brought the 'Semanting' time of the project down from 49 seconds to 37 (a saving of a quarter!).

The project itself has around 130 source files. Our UI module has 187. Diddy has 33, Mojo about 20, brl 30, and there are a few others, so around 400 all told.

84 is around a fifth of that total, so there's a reasonable correlation between the number of files being imported and the Semanting time, given the same total lines of code. I'm compiling on a solid state drive, and manipulating that number of files is next to instantaneous, so there's definitely some significant compiler overhead in the importing process, over and above the time it takes to read the code once it's loaded.


skid(Posted 2015) [#15]
What platform?

According to Microsoft, the C run-time libraries have a limit of 512 files open at any one time.

I would add a handle count column to whatever process monitor you're using, to watch trans and check it is closing its files correctly.


Peeling(Posted 2015) [#16]
I don't think it's a hard limit thing. I amalgamated 62 files into 1 and it saved 8 seconds, and then amalgamated another 22 into 1 and saved another 4 seconds.


Peeling(Posted 2015) [#17]
Further investigation shows that essentially all the time is being spent in app.semant. Disabling reflection (to be certain, I rebuilt the compiler with the reflection check commented out) makes no difference at all, and there is no disc activity (presumably that all happened during Parsing).

I added some crude stat tracking to the compiler and found what I think must be the smoking gun.
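
(For anyone who wants to reproduce this: the tracking is nothing clever - just globals bumped inside ScopeDecl in decl.monkey, along these lines; a simplified sketch rather than the exact patch:)

    ' Crude hit/miss counters:
    Global GetDecl_Success:Int
    Global GetDecl_Fail:Int

    ' ...then, at the point in ScopeDecl.GetDecl where the lookup result is known:
    If decl<>Null
        GetDecl_Success+=1
    Else
        GetDecl_Fail+=1
    Endif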

With the project as-is, I get these stats from ScopeDecl:

GetDecl_Success: 169091
GetDecl_Fail: 12724495
FindDecl_Count: 112098
FindModuleDecl_Count: 1801

With 62 files combined into one, I get these stats:

GetDecl_Success: 169091
GetDecl_Fail: 10818843
FindDecl_Count: 112098
FindModuleDecl_Count: 1801

With fewer files, the number of failed GetDecl attempts drops by 15%. The "Semanting..." time (now on my slightly beefier personal laptop) drops from 45 seconds to 38 - almost exactly 14.5% quicker. That's pretty interesting, as it suggests that the speed of the semanting process is governed almost entirely by how many files (and hence scopes) a project is broken down into.

I'm going to look into ways of short-cutting the search process. I'll keep you posted.


ImmutableOctet(SKNG)(Posted 2015) [#18]
The only things I can think of are Mark's heavy usage of dynamic casts, and poor paging / superfetch performance. Considering you're running an SSD, I doubt it's much of a disk bottleneck. And though dynamic casts are bad on many levels, an -O3 release build with GCC/MinGW, coupled with modern hardware, shouldn't perform terribly.

Regardless of the bottleneck, the fact is, Monkey's build system is rather problematic. It generates a single native source file, and since compilers like g++ can only rebuild incrementally on a per-file basis rather than per change, you have to recompile everything except the native frameworks every time. So, because of this, and a number of other problems, Mark's making Monkey 2.

Following the logic that Mark wants to build a better compiler, I can only assume this includes the semantic phase. As far as the compiler's design goes, it's a mess. The fact that dynamic casts are used anywhere other than corner cases is absurd; it's just not good practice at all. Other than being a specialization nightmare, they're similar to virtual calls, where the big performance hits come from cache misses. But unlike virtual calls, modern CPUs aren't anywhere near as good at handling dynamic casts as they are at branch prediction. x64 CPUs today are good enough at dealing with this kind of thing, but it wouldn't surprise me if the abundant dynamic casts had a part in the performance problems, especially when it's scaled up to something like this.

What I think needs to be understood is that Monkey's compiler is fast, but it's not exactly efficient, and it has some big design problems. Because of this, it doesn't scale as well as it should. Unfortunately, to speed up the problems I mentioned, you'd have to overhaul a lot of it. This includes the other passes.

The only other thing I can say is to try configuring the garbage collector further (max locals, etc.). Or, you know, wait for Monkey 2, but that's a while out. I guess theoretically an easier way to squeeze out that last bit of performance would be to make the translator output separate files, then let the native build recompile only the files that changed. But that's just for the sake of bringing the time down.


Peeling(Posted 2015) [#19]
It's definitely not a disk bottleneck, because there's no disc access during the semanting phase. It actually runs quicker on my personal laptop (with a slightly faster CPU) building the project from my secondary non-SSD.

Based on my tests, I think the most severe bottleneck is far more straightforward than the issues you raise: the time spent compiling a given quantity of code scales almost linearly with the number of files that code is spread over. Looking at the declaration search process, it's pretty clear why this is the case, too. I'm working on an optimisation now.


Peeling(Posted 2015) [#20]
Ok, I've implemented the following optimisation to ModuleDecl.GetDecl:

If the ModuleDecl is dirty (is new or has had decls added since it was last queried - thanks 'reflection' for that particular head-scratcher), it proceeds as follows:

First, create a Public Access List of Modules by walking the public import lists in the manner of the original GetDecl method.
Then create a Private Access List by cloning the above and walking any other Modules privately imported by this ModuleDecl.

It then scrapes the declmap of each module in these lists to create public and private Indirect Ident Maps. An Indirect Ident Map (IIM) is a stringmap of idents to SynonymLists. A SynonymList is a list of decl/moduledecl pairs that all match the same ident.

The end products are two maps of every synonym of every decl reachable from, or via an import of, this module, as well as the aforementioned accessibility lists, which will also come in handy.

Once this is done, or if the ModuleDecl is 'clean', it proceeds as follows:

If the _env.ModuleScope MATCHES this ModuleScope, it grabs the SynonymList from the Private Indirect Ident Map, and picks the first decl that satisfies the prevailing accessibility criteria (copied from the original GetDecl method). You still get an error message if more than one match is found, just like before.

If the _env.ModuleScope does NOT MATCH this ModuleScope, but the _env.ModuleScope IS in this module's public access list, it checks the _env.ModuleScope's Private IIM and this module's public IIM, which as far as I can work out duplicates the behaviour of the original code.

If the _env.ModuleScope does NOT MATCH this ModuleScope, and ISN'T in this module's public access list either, it just checks this module's public IIM.
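
To make the above a bit more concrete, the supporting types boil down to something like this (a simplified sketch - the names follow my description above rather than the exact code):

    ' One entry in a SynonymList: a decl plus the module it was found in.
    Class Synonym
        Field decl:Decl
        Field mdecl:ModuleDecl
    End

    ' Every decl reachable under a single ident.
    Class SynonymList Extends List<Synonym>
    End

    ' New fields on ModuleDecl, rebuilt whenever the module is marked dirty:
    '   Field pubAccess:List<ModuleDecl>       - Public Access List
    '   Field privAccess:List<ModuleDecl>      - Private Access List
    '   Field pubIIM:StringMap<SynonymList>    - public Indirect Ident Map
    '   Field privIIM:StringMap<SynonymList>   - private Indirect Ident Map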

I did some test runs on our project in which both the old and new GetDecl were called and the results compared, and all 169,000 calls match.

Performance-wise, it now semants our project in under 4 seconds, down from 49 :)

There's probably a bunch more fat to be cut (given the huge overlap in the accessibility lists of different modules), but at 4 seconds it's not worth me spending any more time on it.

If anyone's interested in the changes (purely to decl.monkey), let me know.

EDIT: My implementation requires a 'clone' method to be added to Map, so there's that.
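
(For the curious: shallow-cloning one of these string maps amounts to nothing more than this - sketch only, since my actual change adds Clone as a method on Map itself:)

    ' Illustrative only - copies every ident -> SynonymList entry into a fresh map.
    Function CloneIIM:StringMap<SynonymList>( src:StringMap<SynonymList> )
        Local dst:=New StringMap<SynonymList>
        For Local key:=Eachin src.Keys()
            dst.Set( key, src.Get( key ) )
        Next
        Return dst
    End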


ImmutableOctet(SKNG)(Posted 2015) [#21]
I'm actually really interested in your implementation, and if it's stable, you should make a pull request on GitHub. Or at least fork Monkey publicly from there. If you're getting this big of a performance boost, then it's definitely worth adopting.


Peeling(Posted 2015) [#22]
EDIT: Don't use this yet!

I was mulling it over on the way home and realised I'd cocked something up. Inserting new decls into modules has to invalidate all cached data, not just the cache of the affected module. Kind of surprised that didn't show up in the soak test, actually. Anyway, I'll fix it tonight at some point.

====================================================

Would you believe I've never had to use GitHub before? Think I got my head around it eventually:

https://github.com/JocelynSachs/monkey

That's got the Map change and the Decl change. Unfortunately I wasn't able to test them out in that version as I'm on the clock and we aren't using that version of the modules for our project. I can only apologise if you find they don't compile :)


Peeling(Posted 2015) [#23]
Ok, brain fart fixed. Still haven't had a chance to compile those changes against the latest modules; sorry. Feel free to try it out yourself, or else I'll try to find the time soon.


Peeling(Posted 2015) [#24]
Compiler errors now fixed courtesy of Anthony D. Much obliged :)


ImmutableOctet(SKNG)(Posted 2015) [#25]
@Peeling: I've been using this for a while without any problems. Any thoughts on making a pull request?


Peeling(Posted 2015) [#26]
I submitted a pull request yesterday.