Thread-Pool multithreading problem

ImaginaryHuman(Posted 2014) [#1]
So I'm making this little thread pool thing so that I can start throwing jobs at it. Strangely it seems like it works ok when built for Release, but when in Debug, something seems to go wrong and it throws an error. Now I know Release mode can do things invisibly that Debug will identify as an error, but the program actually behaves differently in Debug mode.

In Release mode, the thread pool is initialized and all of the threads are Pause()'d awaiting work. This creates a semaphore for each and then waits for the semaphore signal that there's work to do. This works fine in Release mode. In Debug mode it seems like the semaphore is being ignored completely. By putting a simple `Print "here"` immediately after the WaitSemaphore() in the Pause() method, even with absolutely no jobs assigned to the queue, it seems like the semaphore is totally ignored and the thread just flows right past it and prints 'here' when it shouldn't. Again, in Release mode this does not happen and the semaphores are respected.

Is the debugger supposed to work properly with threads or is this a crapshoot and do I have to resort to just relying on printing to the console to try to debug things? I was reading a Blitz3D thread that suggested the debugger may not work properly with threads? Or could it be a bug or something? Or have I misunderstood how to use semaphores? Or maybe the presence of the debugger causes something to happen outside of the LockMutexes within some kind of debugger code?

Once the semaphore is ignored, obviously the program is not built to deal with a failed signaling system, so it results in an index array-bounds error because something's happening that shouldn't be.

I'm in BlitzMax 1.49 - Here is the code ... I use self-optimizing arrays (highest element is moved to fill the gap left by a lower element), to manage the queue of a) what threads are available, b) what jobs are free to be defined, c) what jobs have been assigned and are waiting to be claimed by a thread, etc.
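For readers unfamiliar with the "self-optimizing array" idea described above, here is a minimal single-threaded sketch of it. The names `Items`, `Count`, `Add` and `RemoveAt` are made up for illustration and are not taken from the original program; in the real pool each of these operations would presumably run while the pool mutex is held.

```blitzmax
SuperStrict

' Hypothetical stand-ins for the pool's arrays and counters.
Global Items:Int[] = New Int[8]
Global Count:Int = 0

' Append a value at the top of the list.
Function Add(value:Int)
	If Count >= Items.Length Then Throw "list is full"
	Items[Count] = value
	Count :+ 1
End Function

' Remove the entry at `slot` by moving the highest entry into the gap.
' Order is not preserved, but no other elements have to be shifted.
Function RemoveAt(slot:Int)
	If slot < 0 Or slot >= Count Then Throw "slot out of range"
	Count :- 1
	Items[slot] = Items[Count]
End Function

Add(10)
Add(20)
Add(30)
Add(40)
RemoveAt(1)		' 40 moves into slot 1
For Local i:Int = 0 Until Count
	Print String.FromInt(Items[i])
Next
' prints 10, 40, 30 (one per line)
```

The explicit bounds checks are an addition for the sketch: they turn a silent overrun in Release builds into a visible exception.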




Derron(Posted 2014) [#2]
Just compiled it on Linux.

I had to add
Import "-ldl"

as I had not modified my appstub file.

In both - release and debug - it printed unordered numbers, no crashes.


Compiled on my virtualXP with 2 cores assigned:
Release and debug compilation works without throwing errors.


bye
Ron


ImaginaryHuman(Posted 2014) [#3]
Hmm. Thanks for doing that, though I'm a bit baffled as to why your debug version works and mine doesn't.

Hm... seems to run in debug okay on Windows too... but not OSX.


ImaginaryHuman(Posted 2014) [#4]
Also, I've found that various other things work OK in release but not in debug, e.g. the otus.lzma module, the bah.freeimage module, etc., each with various errors. I guess I might have to go back to the drawing board and reinstall Blitz or something.


Derron(Posted 2014) [#5]
For bah.freeimage you could ask Brucey for help, he is on a Mac too. Maybe you have a collision between the included libraries.


bye
Ron


ImaginaryHuman(Posted 2014) [#6]
hmm, possibly.. I will check that, thanks.


ImaginaryHuman(Posted 2014) [#7]
Yah, it's not a module conflict. Everything builds fine in release mode without multithreading. It's when multithreading is switched on that it throws tonnes of errors.


Derron(Posted 2014) [#8]
Maybe you are able to post some of these errors here?

Did you search for them with the search engine of your choice?


bye
Ron


ImaginaryHuman(Posted 2014) [#9]
On Windows it builds multithreaded debug OK and runs correctly. But on OSX, when it gets to the WaitSemaphore(), with absolutely nothing posting to any semaphore, program flow just continues right past it. So if there is a print statement after it, such as Print "here", the console will show `here` when it shouldn't. The semaphore is ignored. There is no real error message from that as such; it's just not doing what the command is supposed to do.
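The behaviour described here can be isolated from the rest of the thread pool with a very small test. The following is only a sketch of such a test, not code from the original program: the names `Sem`, `Passed` and `Worker` are made up, and the 500 ms delay is an arbitrary choice. It has to be built with Threaded Build enabled, in both Debug and Release, to compare the two.

```blitzmax
SuperStrict
' Build with Threaded Build enabled (bmk option -h).
' No Framework line, so the BRL modules are imported by default.

Global Sem:TSemaphore = CreateSemaphore(0)	' count 0: a wait should block until someone posts
Global Passed:Int = False					' set once the worker gets past the wait

Function Worker:Object(data:Object)
	WaitSemaphore(Sem)		' nothing has posted yet, so this is expected to block
	Passed = True
	Return Null
End Function

Local t:TThread = CreateThread(Worker, Null)
Delay 500					' give the worker plenty of time to reach the wait

' Reading Passed without a mutex is a simplification for this test only.
If Passed
	Print "WaitSemaphore returned although nothing was posted"
Else
	Print "Worker is still blocked, as expected"
EndIf

PostSemaphore(Sem)			' release the worker so the program can exit
WaitThread(t)
```

If the first message appears only in OSX Debug builds, that would narrow the problem down to the semaphore wait itself rather than the pool's bookkeeping.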


Brucey(Posted 2014) [#10]
When I try to run it, I get the error :
Unhandled Exception:Attempt to index array element beyond array length

... on line 162 :
		ThreadPoolAvailable[ThreadPoolAvailableThreads]=Index	'Mark this thread as paused and available for work

However, if I put a DebugLog in before the LockMutex() call, it's fine, and it seems to run as expected in Debug.


ImaginaryHuman(Posted 2014) [#11]
Yes, that error is thrown AFTER the code flows through to a section of the program that it is not supposed to reach. Since the semaphore fails, code gets executed when it isn't meant to, which produces the array index error that you mentioned. That isn't the real error; the real error is that the WaitSemaphore does not work. Either that, or maybe a LockMutex isn't working?


Brucey(Posted 2014) [#12]
And all those Globals make my head hurt.

Wouldn't this be much nicer (and possibly easier to debug) with a ThreadPool manager and no globals?
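As a rough illustration of what a manager without globals might look like, here is a minimal sketch. It is not the original program restructured: `TThreadPool`, its fields and `MarkAvailable` are hypothetical names, and only the "mark a thread as available" step is shown. It needs Threaded Build enabled.

```blitzmax
SuperStrict
' Build with Threaded Build enabled. All names here are hypothetical.

Type TThreadPool
	Field Mutex:TMutex			' guards the two fields below
	Field Available:Int[]		' indices of paused threads
	Field AvailableCount:Int	' number of valid entries in Available

	Function Create:TThreadPool(size:Int)
		Local pool:TThreadPool = New TThreadPool
		pool.Mutex = CreateMutex()
		pool.Available = New Int[size]
		Return pool
	End Function

	' Record that thread `index` is idle. Returns False instead of
	' writing past the end of the array if the list is already full.
	Method MarkAvailable:Int(index:Int)
		LockMutex(Mutex)
		Local ok:Int = AvailableCount < Available.Length
		If ok
			Available[AvailableCount] = index
			AvailableCount :+ 1
		EndIf
		UnlockMutex(Mutex)
		Return ok
	End Method
End Type

Local pool:TThreadPool = TThreadPool.Create(4)
Print "marked: " + pool.MarkAvailable(0)
```

Keeping the mutex and the arrays it protects in one object makes it easier to see which state each lock covers, which is the debugging benefit hinted at above.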


Derron(Posted 2014) [#13]
As it would end in a Singleton this "example code" just saved some minutes writing a wrapping-type and GetInstance function.

But to tell the truth: I thought the same when looking at the code.
Also "function(param)" is easier to read for me than "function param".

All in all this should not be the source of the problem :D


bye
Ron


Brucey(Posted 2014) [#14]
Derron wrote: As it would end in a Singleton this "example code"

You'll notice I refrained from writing the word Singleton in my post ;-)
(although it had been there until I revised my answer, because you don't *need* a singleton)


Derron(Posted 2014) [#15]
You do not need one if you use:
Global myInstance:TClassName = New TClassName	'One shared instance held in a global

or

Type TClassName
	Global varA:Int	'Type-level globals, shared without any instance
	Global varB:Int
End Type
TClassName.varA = 2


Did I miss another option? Both variants are just for the sake of structuring the code. Do not spend some minutes converting the "Manager" into something like "TPoolMember" with a global list/array of all members and some "manager" functions. This won't count :D


bye
Ron


ImaginaryHuman(Posted 2014) [#16]
Wot you talking about? My code is perfect ;-)

Each to their own. I don't think the program has a bug in it. The WaitSemaphore() gets executed and DOES NOT WAIT.


ImaginaryHuman(Posted 2014) [#17]
It does compile fine, threaded or unthreaded, release or debug, on Windows. Just not on OSX.


ImaginaryHuman(Posted 2014) [#18]
I found that if I inject enough of a delay before the WaitSemaphore(), or even before the UnlockMutex that comes before the WaitSemaphore, then in debug mode it will work properly. It seems to me this is a bug, as I can't see how anything else unusual would be happening in-between the unlocking and the semaphore waiting. The following code corrects it:

	Method Pause()
		'Pause this thread, will resume when semaphore is signalled
		If Semaphore=Null Then Semaphore=CreateSemaphore(0)	'Create semaphore if not already done so
		LockMutex(ThreadPoolMutex)						'Lock it
		CurrentJob=-1									'Not working on a job
		Paused=True										'This thread is paused
		ThreadPoolAvailable[ThreadPoolAvailableThreads]=Index	'Mark this thread as paused and available for work
		ThreadPoolAvailableThreads:+1					'1 more available
		UnlockMutex(ThreadPoolMutex)					'Unlock it
?Debug
		Delay 1											'Workaround: without this delay WaitSemaphore does not wait in Debug builds (seen on OSX only)
?
		WaitSemaphore(Semaphore)						'Wait for a signal to resume
		Paused=False									'No longer paused
		Print "zzzResuming "+Index						'Debug trace
	End Method


Unless I'm not understanding how mutexes and semaphores are supposed to work (a mutex waits until it is free, then locks it; a semaphore puts the thread on hold until a signal is posted), or am not accounting for `something` potentially happening in between the two instructions, this seems like a Blitz bug to me. And it is only on OSX.
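Independent of what causes the early return, a common defensive pattern is to never trust a single wake-up: keep a flag under the mutex and re-check it in a loop, so a wait that returns too early simply goes back to waiting. The sketch below swaps the semaphore for a condition variable from BRL.Threads to show that pattern. It is an assumption-laden illustration, not the original code and not a confirmed fix for the OSX behaviour; `PoolMutex`, `WakeUp`, `HasWork` and `Worker` are made-up names, and it needs Threaded Build enabled.

```blitzmax
SuperStrict
' Build with Threaded Build enabled. Names are hypothetical.

Global PoolMutex:TMutex = CreateMutex()
Global WakeUp:TCondVar = CreateCondVar()
Global HasWork:Int = False		' only read or written while PoolMutex is locked

Function Worker:Object(data:Object)
	LockMutex(PoolMutex)
	While Not HasWork					' re-check the flag after every wake-up
		WaitCondVar(WakeUp, PoolMutex)	' unlocks the mutex while waiting, re-locks before returning
	Wend
	HasWork = False						' claim the work
	UnlockMutex(PoolMutex)
	Print "worker resumed with work"
	Return Null
End Function

Local t:TThread = CreateThread(Worker, Null)
Delay 100							' let the worker start waiting (not required for correctness)

LockMutex(PoolMutex)
HasWork = True						' publish the work first...
SignalCondVar(WakeUp)				' ...then wake one waiting thread
UnlockMutex(PoolMutex)

WaitThread(t)
```

Because the flag is set and tested under the same mutex, the worker behaves the same whether it starts waiting before or after the signal is sent, so no delay is needed between unlocking and waiting.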


Derron(Posted 2014) [#19]
File a bug report and maybe you are the lucky one whose bug Mark is able to reproduce. Whether he then tries to fix it, time will tell.


bye
Ron


ImaginaryHuman(Posted 2014) [#20]
Hmm... p'haps. For now I guess I have to live with this `bugfix` and hope I don't have to use the semaphores elsewhere.