Crash on mac and windows in memcpy with GC?
| ||
There's been a gremlin in my code for a long time that I can't ignore any longer. When a memcpy occurs (most notably when converting a pixmap's format), I sometimes get a crash. This seems to happen only with a multithreaded build: before I moved my project to MT I didn't have this problem, and nothing related to texture creation has changed. However, it doesn't just happen in child threads; the crash happens on the primary thread as well. Here's one snippet of an example crash log:

Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 libSystem.B.dylib 0xffff1250 __longcopy + 80
1 libSystem.B.dylib 0xffff0876 __memcpy + 214
2 libGLImage.dylib 0x93dcfc93 glgProcessPixelsWithProcessor + 725
3 GLEngine 0x1368cd0a gleTextureImagePut + 1433
4 GLEngine 0x1368a490 glTexImage2D_Exec + 1427
5 libGL.dylib 0x914c245f glTexImage2D + 87
...

That crash occurred while a texture was being generated from a pixmap. Similar crashes occur when converting a pixmap's format. It seems largely connected to the garbage collector: if I put a GCCollect right before the copy, it tends to crash more frequently. Additionally, when I wrap the function in which the copy happens with GCSuspend and GCResume, it tends to happen less... but doesn't stop completely (perhaps a collect is already running when the suspend is called, and it doesn't get interrupted?). I tried switching the garbage collector to manual, but then I started getting hangs...

I'm rather confused and am pretty much out of ideas. Any thoughts or suggestions?

This also seems to happen the most with pixmaps I get from Brucey's FreeImage mod, but I can't confirm that it's just those pixmaps (and once they're in BlitzMax pixmaps the source shouldn't matter anyway...) |
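For readers following along, the GCSuspend/GCResume wrapping mentioned in this post might look roughly like the sketch below. The helper name ConvertWithGCSuspended and the test pixmap are hypothetical, not the poster's code. As the post itself says, this only made the crash less frequent; it is not a fix.

```blitzmax
SuperStrict

' Hypothetical helper (not from the original post): run a pixmap
' conversion with garbage collection suspended for its duration.
Function ConvertWithGCSuspended:TPixmap(src:TPixmap, format:Int)
	GCSuspend() ' ask the collector not to run until GCResume()
	Local result:TPixmap = src.Convert(format) ' the memcpy-heavy step
	GCResume() ' allow collection again
	Return result
End Function

' Small stand-in pixmap; the real program would use a loaded picture.
Local pixm:TPixmap = CreatePixmap(256, 256, PF_RGB888)
Local converted:TPixmap = ConvertWithGCSuspended(pixm, PF_RGBA8888)
If converted.format = PF_RGBA8888 Then Print "converted"
```

If a collect is already in progress on another thread when GCSuspend() is called, this wrapper would not obviously help, which matches the poster's guess about why the crash still shows up.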
| ||
Are you doing the memcpy in the main thread? (Just an idea; I don't know if it really affects anything.) |
| ||
Doesn't matter where it happens, main or child. That crash above is specifically in the main (I figured it would be easier to manage things if it's in the main) |
| ||
I would revert the project to single-threaded if possible... It seems like a rare MT bug. It would be best if you could post simplified code that reproduces it. |
| ||
I'm working on finding the time to put together a simplified example but haven't managed it yet. It's especially difficult since this isn't an every-time type of bug, but a when-the-stars-align-and-therefore-the-memory-doesn't one...

Doing more testing on the PC, I can confirm it's exactly the same crash; specifically, it's in ConvertPixelsToStdFormat, in ConvertPixels, in Convert on a pixmap. Interestingly, running it on the Mac with the new 1.40 release and the MT debugger, I get an array out of bounds exception when it crashes. Combined with where it crashes (pixel.bmx, line 107), this appears to confirm my suspicion that under some circumstances the garbage collector (or something) shifts a memory block while it's being copied, which in turn puts the array out of whack and boom, crash. Once again, this happens on the primary thread as well as on child threads in an MT app.

I can't revert the project to single-threaded, as some things just aren't practical in a single thread and they're critical to my program (specifically background loading of pictures, which can take a long time for a single large picture, and I need to churn through LOTS while doing other things...).

I still suspect the garbage collector, since it's the most likely thing to be causing a block of memory to get shuffled about... I will try to put together a simplified example and post it in the bug reports, but until then, if anyone has any ideas I'd love to give them a shot... |
| ||
I would also suspect the GC... is it also possible that some gfx memory is being GC'd causing the GL memcopy to crash on occasion? I vaguely remember there were some issues with the GC and OGL in places... can't remember if they were fixed - or if a particular fix has a knock-on effect. |
| ||
I think the OGL connection is likely just coincidence, as I get the same crash with a straight TPixmap conversion or copy. In the posted crash it just happens to be copying the memory to OpenGL rather than to another pixmap. That said, I would be interested in the GC/OGL connection, as perhaps there's something that can be gleaned from it... |
| ||
Here's a little sample. It's not exactly the same crash I'm seeing, but I think it's probably the same root cause... This crashes on my Mac as soon as I launch it.

SuperStrict

Function ConvertPicture:Object(in:Object) ' function to be spawned in a child thread
	Local pixm:TPixmap = LoadPixmap("sample.jpg") ' load a pixmap, the larger the picture the better
	If(pixm.format = PF_RGBA8888) Then Print "already PF_RGBA8888"
	Local anotherpixm:TPixmap = pixm.Convert(PF_RGBA8888) ' do the format conversion, crash could happen in here...
	Local yetanotherpixm:TPixmap = anotherpixm.Copy() ' do a copy, this could also crash. This uses up more memory for yet more cleanup
	Return yetanotherpixm ' return value to let it be stored in ram for a bit
End Function

Local onConversion:Int = 1
Local convertThread:TThread = CreateThread(ConvertPicture, Null)
Print "Starting first conversion"

While(True) ' repeat forever
	Local aPixm:Object = ConvertPicture(Null) ' do a copy on the main thread as well for some memory retention and more ram thrashing
	Print GCMemAlloced() + " collected " + GCCollect() ' thrash the garbage collector to try to provoke a crash
	If(Not ThreadRunning(convertThread)) ' if the thread is done
		convertThread = CreateThread(ConvertPicture, Null) ' start it again
		onConversion:+1
		Print "conversion " + onConversion
	End If
Wend

Specifically, it crashes when the main thread goes to load the picture as well. Without that it ran for a while without incident, but I will comment that out and let it run longer to see if I can get the exact same crash. |
| ||
Had a power failure, which set the testing back a bit. After recovering, if I run with the main-thread ConvertPicture and GCCollect calls removed, it crashes right away in debug mode... the main thread is doing a GCResume for some reason and the child is creating a new pixmap... however, in non-debug it seems to run just fine... still very confusing.

Update: if you call GCCollect too fast, it seems like a mutex that blocks simultaneous GCCollect calls gets stuck and the app will just idle out... definitely something wacky going on with the garbage collector in MT |
| ||
With the debugger enabled I get a recursive GC collect that seems to lock up the memory system. Doesn't happen without debug on... there's definitely some issues with the MT garbage collector. |
| ||
I've opened a bug report thread at http://www.blitzbasic.com/Community/posts.php?topic=91117 in the hopes of getting some exposure to someone more intimately aware of the threading and GC systems, as from my perspective at least they're turning into quite a rat's nest as I dig in. Still desperate for any ideas or suggestions of things to try. Also curious: can anyone else reproduce crashing or hanging with the sample in debug or regular mode? At this point I just want to know if I've gone totally insane or just partially. |
| ||
I'm successfully using MT in my applications and may be able to help.

The sample you provided seems oddly formed to me and not a very good real-world example. For example, your "thread" is continually called like a function, which doesn't really provide a big advantage. I also find it odd that both your thread and main thread are constantly calling the same block of code; again, not a very good real-world scenario. I'd be interested in seeing a better example that more closely resembles what is happening in your real application.

On a side note, I have noticed some odd crashes with MT in cases where the existence of the thread was very short, or the life of a locked mutex was extremely short. Maybe try putting a small delay of 20ms or so at the end of the thread function and see if it improves. |
| ||
The example is merely to demonstrate that there's an underlying problem, not to illustrate my usage. The reason the same block is called from the thread and the main thread is simply to abuse the memory faster, and I didn't want to write 2 functions. I've done a lot of playing with the example as well (such as putting the load outside the child thread and just doing converts, making the child thread loop converts forever so it's not constantly being relaunched, removing the main thread function call, etc.). Sometimes things work, and then I'll run the same example with debug on and it will crash. Also, if you move the GCCollect call around you will get different results. There's a fundamental problem, since various more or less appropriate applications of multithreading will cause it.

Regarding the delay: I can get crashes when the main and child threads have both been running for extended periods. However, under some circumstances I can create a hang when 2 things appear to be racing to free at the same time. I believe this is related to the garbage collector calling an application lock, perhaps when the application is already busy locking for a free... This is why I started the support thread: there's a lot of locking of various things in the core of the GC and it's all tangled up, and on top of that I think there's a problem like you mentioned with locking/unlocking too fast.

The real-world scenario (I haven't made a simplified example yet, as it's VERY embedded in my program's flow) is that a display starts and a child thread is spawned to load pictures for use in the display (using FreeImage, to be precise, so no, it's not related to the graphics system only being accessible from the main thread). Sometimes everything works flawlessly. Sometimes it will crash right away, sometimes it will crash after processing 50 pictures, etc. It's very random...

Thank you for the feedback. I'll try peppering some things with delays and see if that has any effect. |
| ||
I'm at work right now, but now that I think about it, I also have a piece of code involving some pixmap manipulation that I can get to run great, or to crash randomly, depending on where I lock and unlock a mutex. I'll look at that piece of code tonight when I get home and see if we have some similarities |
| ||
I will be in your debt just for looking, Jon. I've got a serious case of the crazies from this and it's pretty vital I get it sorted out...

Here's a process sample from when I get what I suspect is the double lock. I sent a different one to Brucey the other day to have a look at, and I believe there are some differences between the 2 (which again would imply that randomly too many/too fast locks = problems).

Call graph:
2435 Thread_100498 DispatchQueue_1: com.apple.main-thread (serial)
  2435 start
  2435 _start
  2435 main
  2435 -[NSApplication run]
  2435 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
  2435 _DPSNextEvent
  2435 AEProcessAppleEvent
  2435 aeProcessAppleEvent
  2435 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
  2435 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
  2435 _NSAppleEventManagerGenericHandler
  2435 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
  2435 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
  2435 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
  2435 -[NSApplication _sendFinishLaunchingNotification]
  2435 -[NSApplication _postDidFinishNotification]
  2435 -[NSNotificationCenter postNotificationName:object:]
  2435 -[NSNotificationCenter postNotificationName:object:userInfo:]
  2435 _CFXNotificationPostNotification
  2435 __CFXNotificationPost
  2435 _nsnote_callback
  2435 run
  2435 4
  2435 415
  2435 802
  2435 639
  2435 132
  2435 54
  2435 _brl_system_TMacOSSystemDriver_Poll
  2435 updateEvents
  2435 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
  2435 _DPSNextEvent
  2435 BlockUntilNextEventMatchingListInMode
  2435 ReceiveNextEventCommon
  2435 RunCurrentEventLoopInMode
  2435 CFRunLoopRunInMode
  2435 CFRunLoopRunSpecific
  2435 __CFRunLoopRun
  2435 __CFRunLoopDoObservers
  2435 CFQSortArray
  2435 CFSortIndexes
  2435 malloc_zone_memalign
  2435 szone_memalign
  2435 szone_malloc_should_clear
  2435 tiny_malloc_from_free_list
  2435 tiny_free_list_add_ptr
  2435 _sigtramp
  2435 semaphore_wait_trap
2435 Thread_100499 DispatchQueue_2: com.apple.libdispatch-manager (serial)
  2435 start_wqthread
  2435 _pthread_wqthread
  2435 _dispatch_worker_thread2
  2435 _dispatch_queue_invoke
  2435 _dispatch_mgr_invoke
  2435 kevent
2435 Thread_100503
  2435 thread_start
  2435 _pthread_start
  2435 threadProc
  2435 _brl_threads_TThread__EntryStub
  2435 bb_ThreadedPrepareElements
  2435 191
  2435 532
  2435 141
  2435 bbGCCollect
  2435 collectMem
  2435 343
  2435 842
  2435 bmx_freeimage_delete
  2435 free
  2435 __spin_lock

Thread 1 seems to be handling the event queue, and locking and freeing junk as a result of mucking about. Thread 2 you always get in threaded apps; it seems to be the thread manager, as best I can tell... Thread 3 is my child thread (note, just 1 child thread at this point) trying to do cleanup after it's done with a FreeImage. The FreeImage is in its delete method, which calls free on its allocated memory block, and that's halting (I assume) to wait for the main thread to get done freeing things... which it won't, because (again I assume) it's been confused by the child thread trying to free things.

And yet again, just for the record, this is just one manifestation in one program. |
| ||
I literally COVERED the suspected problem areas with Delay(20)'s and it seems to not hang (usual disclaimer with randomish crashes etc.)... I think you're very much on to something with the high speed lock/unlock causing problems, and that feeds back to my theory that the GC problem could actually be a thread control issue (i.e. the threads locking/unlocking)... Hope! there is hope! |
| ||
It was the same case in a project of mine. I purposely had to make my Lock/Unlock take longer than it should. If I remember right, here is what I did (pseudo):

lockMutex(imageMutex)
thisPixmap=GetAPixmap() 'external function
unlockMutex(imageMutex)
thisImage=LockPixmap(thisPixmap)
'The above code would randomly crash from 30 seconds to 2 minutes into running

Then, to force the time between LockMutex and UnlockMutex to be longer, I simply kept the mutex locked until thisImage was created...

lockMutex(imageMutex)
thisPixmap=GetAPixmap() 'external function
thisImage=LockPixmap(thisPixmap)
unlockMutex(imageMutex)
'This time, the above code works crash-free (and I've even let it run overnight)
'and the only difference is the location of UnlockMutex

Anyways, the above example is how I got my code to run absolutely crash free |
| ||
Thanks! So far so good with a Delay(20) added before a manual GCCollect() call, which I make right after resuming the garbage collector (I had problems with the collector running during some of the copies, specifically in child threads). I think this is also preventing too many lock/unlock cycles on some mutexes... I'll need more poking and testing to verify, but this is the first positive progress I've seen on this problem in a long time, so I'm quite optimistic! |
| ||
Another sample:

SuperStrict

Global theMutex:TMutex = CreateMutex()
Global counter:Int = 0

Function tfunc:Object(in:Object)
	While(True)
		LockMutex(theMutex)
		counter:+1
		Local pixm:TPixmap = CreatePixmap(2048, 2048, PF_RGBA8888)
		UnlockMutex(theMutex)
	Wend
End Function

CreateThread(tfunc, Null)
Print "starting"
While(True)
	LockMutex(theMutex)
	counter:+1
	UnlockMutex(theMutex)
	If(counter >= 10000000)
		Print MilliSecs()
		counter = 0
	End If
Wend

I tossed that up on my PC while trying some stuff. It crashes right away on the CreatePixmap in the child thread, with an access violation while trying to alloc the memory. |
| ||
Compiled on Linux, your example above also crashes with a segmentation fault. But to further prove a point, add a simple delay in the thread and presto!

SuperStrict

Global theMutex:TMutex = CreateMutex()
Global counter:Int = 0

Function tfunc:Object(in:Object)
	While(True)
		LockMutex(theMutex)
		counter=counter+1
		Local pixm:TPixmap = CreatePixmap(2048, 2048, PF_RGBA8888)
		UnlockMutex(theMutex)
		Delay(100)
	Wend
End Function

CreateThread(tfunc, Null)
Print "starting"
While(True)
	LockMutex(theMutex)
	counter=counter+1
	UnlockMutex(theMutex)
	If(counter >= 10000000)
		Print MilliSecs()
		counter = 0
	End If
Wend |
| ||
I'm having great success with a bunch of delays peppered around. No more hangs and no crashes, however it does cause the application to leak like a sieve... it did this some other times when messing around with auto vs/manual GC... I'm not sure where it comes from but it's related as the memory is totally fine without delays but it will either crash or hang sooner or later. With delays no crash or hang but it will leak and leak until it chokes... At this point I'll take the leaks over the crashing but still something to get worked out... Still grinding |
| ||
I have no problems with Auto GC in my threaded applications. I remember you mentioned that you modified the GC code and now run it manually. Now that you have injected some delays into your thread, you may find that if you restore the original GC code, it works just fine for you and gets rid of your memory leak |
| ||
I restored the GC code before starting with the delays (on the theory that by that point I'm sure I'd broken something). I've noticed the leaking in the past under certain circumstances. I think I may try modifying the GC again to see if that cleans up some of the leaking. |
| ||
You aren't by chance using MaxGUI in your thread, are you? I only mention this because you could create a memory leak by not calling FreeGadget()... |
| ||
MaxGUI is used earlier in my program, but not in any child threads, and is totally shut down by the time I get to the part that runs for a while and leaks. I'm going to look back over my code and see if I can narrow down what object(s) are leaking, maybe there's a free that's getting missed somewhere due to my structure. |
| ||
Hi, I've found one issue to do with allocating lots of large un-GCed memory - eg: the way pixmap does. Can you give this a try - it at least fixes the above! http://www.blitzbasic.com/tmp/blitz.mod.zip Replace your existing mod/brl.mod/blitz.mod folder with this 'un. |
| ||
I've been making lots of workarounds, I'll pull as many out as I can and give this a go right now. Thanks mark! |
| ||
So far so good on Mac and PC. I am noticing the occasional slight delay (half a second or so), sometimes right about when I would expect a large free to be happening (such as right about when I would expect my program to release all contact with a large pixmap). Is this likely to be a result of the new changes, or just my imagination? It's not a deal breaker (I mean, I am dealing with LARGE chunks of memory, so I should expect some things to take a little time); I'm just curious whether that's a sign of the new code kicking in. |
| ||
Seems better than before; however, it will still crash or hang if 2 allocs happen at the same time, and possibly one triggers the collector...

Related: I've been toying with turning off the auto collector so I can control when the collects happen (so I know an alloc isn't taking place). Whenever allocs will happen I lock a mutex; I then call GCCollect() in my main loop whenever the mutex isn't locked. This seems to work from a stability standpoint (as long as I don't miss any allocs with my mutex lock), but it creates a pause that grows in duration (especially on PC, but on Mac as well) the longer my program runs. I further set it so it only ran GCCollect() once per second in the main loop, if the mutex wasn't locked, and it was perfectly smooth on the PC to start with; I came back about 20 minutes later and there was about a 1/4 second pause once per second...

[Update] Here's a sample of my application locking up due to 2 allocs at the same time... The main thread is trying to alloc an object, which triggers a GCCollect, which tries to alloc an object in the collection process and ends in a spin lock. Thread 2 is trying to alloc an object, which causes the GC to try to lock the collector mutex and wait.
Call graph:
2367 Thread_179469 DispatchQueue_1: com.apple.main-thread (serial)
  2367 start
  2367 _start
  2367 main
  2367 -[NSApplication run]
  2367 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
  2367 _DPSNextEvent
  2367 AEProcessAppleEvent
  2367 aeProcessAppleEvent
  2367 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
  2367 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
  2367 _NSAppleEventManagerGenericHandler
  2367 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
  2367 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
  2367 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
  2367 -[NSApplication _sendFinishLaunchingNotification]
  2367 -[NSApplication _postDidFinishNotification]
  2367 -[NSNotificationCenter postNotificationName:object:]
  2367 -[NSNotificationCenter postNotificationName:object:userInfo:]
  2367 _CFXNotificationPostNotification
  2367 __CFXNotificationPost
  2367 _nsnote_callback
  2367 run
  2367 4
  2367 1322
  2367 2422
  2367 77
  2367 666
  2367 809
  2367 278
  2367 _sidesign_minib3d_TEntity_MoveEntity
  2367 bbObjectNew
  2367 bbGCAllocObject
  2367 allocMem
  2367 collectMem
  2367 353
  2367 876
  2367 _bah_freeimage_TBPHolder_Create
  2367 bbObjectNew
  2367 bbGCAllocObject
  2367 __spin_lock
2367 Thread_179470 DispatchQueue_2: com.apple.libdispatch-manager (serial)
  2367 start_wqthread
  2367 _pthread_wqthread
  2367 _dispatch_worker_thread2
  2367 _dispatch_queue_invoke
  2367 _dispatch_mgr_invoke
  2367 kevent
2367 Thread_179481
  2367 thread_start
  2367 _pthread_start
  2367 threadProc
  2367 _brl_threads_TThread__EntryStub
  2367 bb_ThreadedPrepareElements
  2367 190
  2367 539
  2367 brl_filesystem_StripDir
  2367 bbStringSlice
  2367 bbStringNew
  2367 bbGCAllocObject
  2367 pthread_mutex_lock
  2367 new_sem_from_pool
  2367 _sigtramp
  2367 semaphore_wait_trap |
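The manual-collection gate described in this post (lock a mutex around allocations, then collect from the main loop only when that mutex is free, and at most once per second) could be sketched roughly as follows. This is a minimal illustration and not the poster's actual code: allocMutex, lastCollect, LoaderWorker and MaybeCollect are hypothetical names, the pixmap size and delays are arbitrary, and it assumes a threaded build.

```blitzmax
SuperStrict

' All names below are hypothetical; requires a threaded build.
Global allocMutex:TMutex = CreateMutex() ' held while a thread does large allocs
Global lastCollect:Int = MilliSecs()

GCSetMode(2) ' 2 = manual: nothing is collected until GCCollect() is called

' Child thread: take the mutex around each large allocation.
Function LoaderWorker:Object(data:Object)
	For Local i:Int = 0 Until 20
		LockMutex(allocMutex)
		Local pixm:TPixmap = CreatePixmap(1024, 1024, PF_RGBA8888)
		UnlockMutex(allocMutex)
		Delay(50)
	Next
	Return Null
End Function

' Main loop helper: collect at most once per second, and only if
' no other thread currently holds the allocation mutex.
Function MaybeCollect()
	If MilliSecs() - lastCollect < 1000 Then Return
	If TryLockMutex(allocMutex)
		GCCollect()
		lastCollect = MilliSecs()
		UnlockMutex(allocMutex)
	End If
End Function

Local loaderThread:TThread = CreateThread(LoaderWorker, Null)
While ThreadRunning(loaderThread)
	MaybeCollect()
	Delay(10)
Wend
```

The gate only protects allocations that are actually wrapped in the mutex. Anything the runtime allocates implicitly (strings, temporary objects) is not covered, which may be why the poster adds "as long as I don't miss any allocs".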
| ||
I'm a bit confused by this now... it seems to be the last lingering problem with my current structure. The garbage collector is in mode 2 (manual). The main thread has locked a mutex through TryLockMutex() that controls whether the garbage collector is allowed to be called. Since it succeeded, it calls GCCollect() (which translates to bbGCCollect), and that calls collectMem, then something, then pthread_detach, which calls pthread_join, and then a spin lock... The child thread is waiting for the garbage collector mutex to unlock so it can continue with its task, and seems to be waiting patiently like it should... What's up with the detach and joins?

Call graph:
2315 Thread_323551 DispatchQueue_1: com.apple.main-thread (serial)
  2315 start
  2315 _start
  2315 main
  2315 -[NSApplication run]
  2315 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
  2315 _DPSNextEvent
  2315 AEProcessAppleEvent
  2315 aeProcessAppleEvent
  2315 dispatchEventAndSendReply(AEDesc const*, AEDesc*)
  2315 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned long, unsigned char*)
  2315 _NSAppleEventManagerGenericHandler
  2315 -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:]
  2315 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:]
  2315 -[NSApplication(NSAppleEventHandling) _handleAEOpen:]
  2315 -[NSApplication _sendFinishLaunchingNotification]
  2315 -[NSApplication _postDidFinishNotification]
  2315 -[NSNotificationCenter postNotificationName:object:]
  2315 -[NSNotificationCenter postNotificationName:object:userInfo:]
  2315 _CFXNotificationPostNotification
  2315 __CFXNotificationPost
  2315 _nsnote_callback
  2315 run
  2315 4
  2315 1322
  2315 2422
  2315 77
  2315 bbGCCollect
  2315 collectMem
  2315 244
  2315 pthread_detach
  2315 pthread_join$NOCANCEL$UNIX2003
  2315 __spin_lock
2315 Thread_323552 DispatchQueue_2: com.apple.libdispatch-manager (serial)
  2315 start_wqthread
  2315 _pthread_wqthread
  2315 _dispatch_worker_thread2
  2315 _dispatch_queue_invoke
  2315 _dispatch_mgr_invoke
  2315 kevent
2315 Thread_323610
  2315 thread_start
  2315 _pthread_start
  2315 threadProc
  2315 _brl_threads_TThread__EntryStub
  2315 bb_ThreadedPrepareElements
  2315 183
  2315 549
  2315 _bb_TElement_init
  2315 135
  2315 brl_threads_LockMutex
  2315 _brl_threads_TMutex_Lock
  2315 pthread_mutex_lock
  2315 new_sem_from_pool
  2315 _sigtramp
  2315 semaphore_wait_trap |
| ||
Hi,

Unless you post some more runnable code, I'm afraid there's not much I can do. Stack traces aren't particularly useful in these cases, as with threading the problem may have already occurred long before the crash.

Have you tried running the app with plain old auto-GC enabled? There's a chance that if you've disabled GC and the app needs to allocate memory and can't, it'll just fail and BANG, especially with large allocations, as I suspect your app is using. |
| ||
Auto GC causes many, many more crashes, as it quite often fires while something is allocating, and then it dies. The reason I've switched back to manual GC is that I can control when the collect happens, and therefore be sure that no child threads are busy allocating anything (through the use of a mutex).

I'm still working on trying to put together an example, but without much success; even in my sprawling project it doesn't happen reliably, so it's very hard to narrow down what/where/when/how/why something is going wrong. The only commonality I notice (as illustrated by the traces) is that problems are always within an alloc or free, and are much, much more prevalent if memory is being handled in 2 places at once (such as an alloc in the main and child threads at the same time). I was experiencing some problems with semaphores a while ago as well, which caused me to abandon them as a means of restricting simultaneous access. I'll see if I can re-create that problem with some sample code, as perhaps that will be easier than my current flow.

I don't think there's an allocation space issue: if I disable the collector altogether (just to see), it will run up to around 1 GB alloced before anything bad starts to happen, whereas it usually runs around 60-260 MB with manual collection, and on auto it will sometimes spike up to about 400 before collecting. So there should be plenty of overhead. I tend to collect roughly every 10th of a second (assuming there's nothing blocking the collect), so the pool never rises; it will collect after every large alloc/free (not guaranteed due to timing, but it should never pass 2 large allocs/frees). It runs in a loop with the same content, usually for hours (6+) without any problems, and sometimes it will choke and die within minutes.
Will try to get more sample code for you, just particularly curious what the "2315 pthread_join$NOCANCEL$UNIX2003" trace meant, and also why it's detaching/joining in the collect cycle. |
| ||
I am also still having problems in my threaded app that also deals with pixmaps. It will randomly hang (not a full crash per se). I have tried the modified blitz.mod posted by Mark, but I'm still having problems. |
| ||
Here's an interesting dump I got from a tester. Still no code, I know; still working on that...

Call graph:
2882 Thread_1175 DispatchQueue_1: com.apple.main-thread (serial)
  2882 start
  2882 main
  2882 launchd_runtime
  2882 mach_msg
  2882 mach_msg_trap
2882 Thread_1176
  2882 thread_start
  2882 _pthread_start
  2882 kqueue_demand_loop
  2882 select$DARWIN_EXTSN

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
  mach_msg_trap 2882
  select$DARWIN_EXTSN 2882
Sample analysis of process 217 written to file /dev/stdout

This time thread 1 (not my thread; the one BlitzMax runs, I assume, to trap events) seems to have found something more interesting to occupy its time... I will keep trying to get a good example of some form of this hanging/crashing. It keeps manifesting in such different ways that it's quite annoying. |
| ||
Just a follow up: I'm now running BMX v1.41. My MT code is now rock solid, but not entirely due to BMX 1.41. In my case, it came back to the fact that OGL isn't 100% thread safe. My random crashes appear to have come from the fact that I was locking/unlocking mutexes around Max2D commands (mainly DrawImage, which turned out to be the biggest culprit).

BEFORE: my code (pseudo) that would crash. Notice that I'm locking a mutex around an external C function and around DrawImage. Note also that the Update() method runs in its own thread, and the Draw() method runs in the main thread.

Type TWebCam
	Field image:TImage
	Field pixmap:TPixmap
	...
	...
	Method Update()
		LockMutex(pixmapMutex)
		Self.pixmap.pixels=grab_frame() 'grab_frame is an external c function
		UnlockMutex(pixmapMutex)
		LockMutex(imageMutex)
		Self.image=LoadImage(Self.pixmap)
		UnlockMutex(imageMutex)
	End Method
	Method Draw(x:Int,y:Int)
		LockMutex(imageMutex)
		DrawImage(Self.image,x,y)
		UnlockMutex(imageMutex)
	End Method
End Type

AFTER: since the webcam image is returned as a pixmap, and I only need a TImage when it's drawn, I make one on the fly in my Draw method. Also notice that I no longer lock a mutex around the external C function or around the Max2D DrawImage() function...

Type TWebCam
	Field pixmap:TPixmap
	...
	...
	Method Update()
		Local grabbedPixmap:TPixmap=CreatePixmap(640,480)
		grabbedPixmap.pixels=grab_frame() 'grab_frame is an external c function
		LockMutex(pixmapMutex)
		Self.pixmap=grabbedPixmap
		UnlockMutex(pixmapMutex)
	End Method
	Method Draw(x:Int,y:Int)
		Local thisImage:TImage
		LockMutex(pixmapMutex)
		thisImage=LoadImage(Self.pixmap)
		UnlockMutex(pixmapMutex)
		DrawImage(thisImage,x,y)
	End Method
End Type

These simple changes have made my application 100% stable.

Ima747: Look for similar things in your MT code, and find a way around locking/unlocking mutexes around Max2D functions and external C functions.
Then you will either fix your problem, or eliminate the possibility that something that you are threading isn't really thread safe... |