Lock Free Multi-Threading

BlitzMax Forums/BlitzMax Programming/Lock Free Multi-Threading

col(Posted 2012) [#1]
Hi all,

I've been messing with some multithreading for a while now via lots of research and thousands of code examples. It's said that using Mutex locking puts other threads into a 'wait state' that want to use the locked data, which effectively pauses that thread until the lock becomes free. This gets compounded when other threads want to access the data and each one has to wait in turn. Anyway there are lots of articles to read on google if you want burn your grey matter :P

Lock free multithreading will still be able to lock some data but it doesn't block other threads from running ( putting them in a wait state ) while the data is locked. I've ported a small piece of code that creates a basic stack suitable for lock free multi-threading that uses an Atomic lock. This lock is done in hardware in the CPU(s) so there aren't any Mutexes in the code to halt any threads. The stack can then be safely accessed by multiple threads, pushing and popping items on and off the stack without breaking the stack linkage that would normally happen during multithreading if no lock or protection is put on the data. The example pushes and pops 10000 string based integers ( only because its easy to test for errors using the integer as an index ) on to the stack. As items are pushed onto the stack the code is also trying to pop them off, because of multithreading the stack entry will seem to be accessed at random which would normally cause a linkage break without data protection.

This is just the first basic building block of writing lock free multithread code and before I go any further with this multithreading endeavour - this sort of thing needs some kind of field testing.. so if you have the time I'd appreciate it if you could run the code and maybe scrutinize it for any errors or if there are any bugs in there which is causing the code to run correctly or incorrectly.

I've yet to test this for the ABA Scenario, so it may be not flawless at this stage.

ps. In the TStack type... If you unRem the standard Push/Pop (Lines 15 and 34) and uncomment Lines 36 and 56, then you will see what happens when accessing the stack without some sort of protection on the stack data. If it 'seems' to work ok with the standard methods then keep running it, it will break unexpectedly as per usual multithread variable access problems.

Remember to build the code with 'Threaded Build' ticked on.
It should be Windows,Mac and Linux compatible due to using 'native' blitz code.

EDITED for punctuality.


Last edited 2012


Noobody(Posted 2012) [#2]
Code looks pretty good, although it does suffer from the ABA problem.
In the Push method, New TNode might return an object address that has been previously used for another TNode (which has been released in the meantime). Depending on how unlucky your threads time their context switch, one thread might Push without any of the other threads seeing it.
This scenario is unlikely in BMax, since objects don't get released immediately when they fall out of scope, but it could still happen in a heavily multi-threaded environment with lots of accesses to the shared datastructure.

One of the other issues is that in a scenario with lots of threads and a lot of modifications to the stack, some of your threads will suffer from resource starvation, since your datastructure is only lock-free and not wait-free (there is no guaranteed per-thread progress).
However, wait-free datastructures are not easy to come by, since it is very hard to prove they're actually wait-free. Even more so in BMax, since there is no way of enforcing memory fences and not much is known about the compiler's tendency to do instruction reordering.

One thing I don't understand is why you're using bbAtomicCAS and not BMax's own CompareAndSwap. Is there an advantage to using the extern function?


col(Posted 2012) [#3]
Hiya,

Thanks for looking over the code.

Its been a bit of a brain ache to even get my head around the whole lock-free multithread paradigm but I think I've got it, and yes 'wait-free' is another ball game that I wouldn't even waste time attempting :D

I'll attempt to come up with a solution for the ABA problem before moving on. We'll see as this is already taking up more time than I thought it would, maybe incorporate some kind of counter, or time acquisition into the data. I don't know just yet, there are a couple of 'not-so-easy' options.


One thing I don't understand is why you're using bbAtomicCAS and not BMax's own CompareAndSwap. Is there an advantage to using the extern function?


CompareAndSwap uses integers by default and the compiler was complaining when using Objects. I was converting the address to an integer but it was ugly and unnecassary compared with just changing the parameters type in the bbAtomicCAS. There's no other reason.

Last edited 2012


Azathoth(Posted 2012) [#4]
In the Push method, New TNode might return an object address that has been previously used for another TNode (which has been released in the meantime).
Can you elaborate? I don't understand why this would be a problem as long as the old object no longer exists.


col(Posted 2012) [#5]
I think the problem that Noobody is referring to with the Push method would only occur in a non-GC environment.

This example uses a stack size of 100000 and 100 threads (are these really created?? seems so ) to push items onto the stack simultaneously, I've changed the values in the 'item' to account for the indexing after all the pushing. There isn't any data lost at all. This example is to prove that the stack linkage holds up. If any item linkage was lost due to ABA then the integer value wouldn't be in the stack and, after the 'popping', the value wouldn't be in 'poppedValues[]' array or maybe it would have been put into the array but not removed from the stack itself. This would show up in the last check in the 'Checking output' section of code.

Again, feel free to check it and correct me if I'm doing something wrong that's making it work correctly.



Last edited 2012


Noobody(Posted 2012) [#6]
@Azathoth: The New operator allocates space for an object on the heap and returns the corresponding memory address. If an object is released and another object is allocated on the heap, it will usually be placed at the same address as the previous object, since a block with matching size has already been allocated.

The problem is that CompareAndSwap compares object addresses to determine whether the object has changed in the meantime. If the object has indeed changed, but has the same address in memory (because the old one was released and the new one happened to be allocated at the same address), CompareAndSwap won't be able to see that and swaps anyway.


After revisiting col's code, however, it became clear that producing a scenario where one object would be freed, reallocated and then used in a CAS is impossible to construct, since all objects used in the CAS are at least referenced once at all times (and thus not collected).
I'm not sure whether the ABA problem completely disappears in garbage collected languages, but at least in this case, the GC prevents it from happening.


col(Posted 2012) [#7]
@Noobody

You're correct.


would only occur in a non-GC environment.


I'll explain myself a little better with what I meant with this statement...

I was referring solely to my example and the code logic used for the 'Push' method and the fact it's protected by the BMax garbage collector. This wouldn't be the case if say you did a direct port to native c/c++ using the exact same code logic - it would suffer serious ABA issues.

It's not a garbage collected language that prevents the ABA scenario - I'm sure it's so very possible with a different code logic.


Azathoth(Posted 2012) [#8]
The problem is that CompareAndSwap compares object addresses to determine whether the object has changed in the meantime. If the object has indeed changed, but has the same address in memory (because the old one was released and the new one happened to be allocated at the same address), CompareAndSwap won't be able to see that and swaps anyway.
But doesn't this imply the old object is still referenced?


Noobody(Posted 2012) [#9]
But doesn't this imply the old object is still referenced?

From a user perspective, you released your object - it's completely gone for all you know. Then, somewhere else in your code and in another thread, an object is allocated using New. Again, from the user perspective, this object is completely different from the one that was allocated before. But if it ends up with the same memory address, CompareAndSwap won't see the difference and proceed to swap.

To quote Wikipedia:
A common case of the ABA problem is encountered when implementing a lock-free data structure. If an item is removed from the list, deleted, and then a new item is allocated and added to the list, it is common for the allocated object to be at the same location as the deleted object due to optimization. A pointer to the new item is thus sometimes equal to a pointer to the old item which is an ABA problem.

But apparently I've spent too much time with unmanaged languages, since this scenarion can't occur in a garbage collected language such as BMax - as long as there's still a reference to an object around, it won't be reclaimed.


col(Posted 2012) [#10]
I'm currently working on an implementation of the Queue data structure, this has its own problems due to moving data at both ends of the queue, which has to happen atomically at the exact same time, which is impossible. But I know the 'lock-free queue' can be done with some clever logic.

These are 2 basic data structures that can be strategically used in a multi-thread renderer. We'll see...


col(Posted 2012) [#11]
The Queue implementation. Just needs some good quality testing code ;-)



Last edited 2012