Do I Have to Wrap Methods W/ Functions to Thread?
BlitzMax Forums/BlitzMax Beginners Area/Do I Have to Wrap Methods W/ Functions to Thread?
| ||
I've been doing some experiments, and it seems that to run a method in a thread a function needs to be wrapped around it. Has anyone else experienced this? Can a thread run a method directly? Feel free to request some code samples if needed. |
| ||
Hi It seems you are correct. Here's an example of a KD tree I coverted to be multi-threaded - it actually made the process slower! SuperStrict Type tvec3 Field x:Float,y:Float,z:Float EndType Function vec3:tvec3( px:Float, py:Float, pz:Float ) Local n:tvec3 = New tvec3 n.x = px n.y = py n.z = pz Return n EndFunction Type TKDNode Field pos:tvec3 Field Obj:Object Method Create:TKDNode( p:tvec3, o:Object ) pos = p obj = o Return Self EndMethod EndType Type TforeachObjectParam Field tree:TKDtree Field list:TList Field p:tvec3 Field width:Float Field height:Float Field depth:Float Field Callback:Int(o:Object) Field pDepth:Int = 0 EndType Type TKDTree Field parent:TKDTree Field l:TKDTree Field r:TKDTree Field pos:tvec3 Field obj:Object Global callmut:TMutex = CreateMutex() Method Create:TKDTree( pparent:TKDTree= Null) parent = pparent Return Self EndMethod Method insertVEC3:TKDTree( p:tvec3, o:Object, pDepth:Int = 0, pparent:TKDTree= Null ) Local axis:Int = pDepth Mod 3 Local lft:Int = False If Not pos Then parent = pparent pos = p Obj = o Return Self EndIf If axis = 0 Then If p.x < pos.x Then lft = True Else If axis = 1 Then If p.y < pos.y Then lft = True Else If axis = 2 Then If p.z < pos.z Then lft = True EndIf If lft Then If l Then Return l.insertVec3( p, o, pDepth + 1 ) l = New TKDTree.insertVec3( p, o, pDepth + 1,Self ) Return l Else If r Then Return r.insertVec3( p, o, pDepth + 1 ) r = New TKDTree.insertVec3( p, o, pDepth + 1,Self ) Return r EndIf EndMethod Method findObject:Object( p:tvec3, pDepth:Int=0 ) Local axis:Int = pDepth Mod 3 Local lft:Int = False If p.x = pos.x And p.y = pos.y And p.z = pos.z Then Return obj If axis = 0 Then If p.x < pos.x Then lft = True Else If axis = 1 Then If p.y < pos.y Then lft = True Else If axis = 2 Then If p.z < pos.z Then lft = True EndIf If lft Then If l Then Return l.findObject( p, pDepth+1 ) Else If r Then Return r.findObject( p, pDepth+1 ) EndIf Return Null EndMethod Method singleForEachTreeNodeDo:Int( list:TList, p:tvec3, width:Float, height:Float, Depth:Float, callback(o:Object), pDepth:Int = 0) Local axis:Int = pDepth Mod 3 Local lft:Int = False Local lft1:Int = False 'DebugStop If PointInRect(pos.x, pos.z, p.x, p.z, width, depth) Then ' LockMutex( callmut ) ' If Callback( obj )= False Return Null ' UnlockMutex( callmut ) list.addfirst( Self ) EndIf If axis = 0 Then If p.x < pos.x Then lft = True Else If axis = 1 Then If p.y < pos.y Then lft = True Else If axis = 2 Then If p.z < pos.z Then lft = True EndIf If axis = 0 Then If (p.x + width) < pos.x Then lft1 = True Else If axis = 1 Then If (p.y + height) < pos.y Then lft1 = True Else If axis = 2 Then If (p.z + depth) < pos.z Then lft1 = True EndIf If lft Then If l Then l.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 ) If Not lft1 And r Then r.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 ) Else If r Then r.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 ) If lft1 And l Then l.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 ) EndIf Return Null EndMethod Function threadForEachTreeNodeDo:Object( o:Object ) Local p:TforeachObjectParam = TforeachObjectParam( o ) p.tree.singleForEachTreeNodeDo( p.list, p.p, p.width, p.height, p.Depth, p.callback, p.pDepth ) EndFunction Method ForEachTreeNodeDo( p:tvec3, width:Float, height:Float, Depth:Float, callback(o:Object), pDepth:Int = 0) Const threadCount:Int = 2 Local thread:TThread[threadCount,threadCount] Local x:Int Local y:Int Local z:Int Local tree:TKDTree Local param:TforeachObjectParam[threadCount,threadCount] For x = 0 To threadCount-1 For z = 0 To threadCount-1 param[x,z] = New TforeachObjectParam param[x,z].tree = Self param[x,z].list = New TList param[x,z].p = New tvec3 param[x,z].p.x = p.x + (x* (width/threadCount)) param[x,z].p.y = p.y param[x,z].p.z = p.z + (z*(depth/threadCount)) param[x,z].width = width / threadCount param[x,z].height = height param[x,z].depth = depth / threadCount param[x,z].callback = Callback param[x,z].pDepth = 0 thread[x,z] = CreateThread( threadForEachTreeNodeDo, param[x,z] ) Next Next For x = 0 To threadCount-1 For z = 0 To threadCount-1 WaitThread( thread[x,z] ) For tree = EachIn param[x,z].list callback( tree.obj ) Next ClearList( param[x,z].list ) param[x,z].tree = Null param[x,z].list = Null param[x,z].p = Null param[x,z] = Null Next Next Rem Local thread:TThread[2,2] Local x:Int Local y:Int Local z:Int Local param:TforeachObjectParam For x = 0 To 1 For z = 0 To 1 param = New TforeachObjectParam param.tree = Self param.p = New tvec3 param.p.x = p.x + (x* (width/2)) param.p.y = p.y param.p.z = p.z + (z* (depth/2)) param.width = width / 2 param.height = height / 2 param.depth = depth / 2 param.callback = Callback param.pDepth = 0 thread[x,z] = CreateThread( threadForEachTreeNodeDo, param ) Next Next For x = 0 To 1 For z = 0 To 1 WaitThread( thread[x,z] ) Next Next EndRem EndMethod EndType Local t:tkdtree t = New tkdtree.Create( ) t.insertvec3( vec3( 30, 30, 30 ), "hello" ) t.insertvec3( vec3( 31, 30, 30 ), "goodbye" ) t.insertvec3( vec3( 30, 30, 60 ), "bye" ) t.insertvec3( vec3( -20, 30, 60 ), "neg" ) Print String(t.findObject( vec3(30, 30, 30))) Print String(t.findObject( vec3(31, 30, 30))) Print String(t.findObject( vec3(30, 30, 60))) Print String(t.findObject( vec3(-20, 30, 60))) '--------------------------------------------------------- ' Test Program '--------------------------------------------------------- Graphics 512,512,0 Local list:TList=New TList Type Point Field x:Int,y:Int EndType Local sectors:Int=32 Local points:Int=5000 Local scale#=sectors/512.0 Local graph:tkdtree= New tkdtree.Create( ) Global scenex:Int,sceney:Int Local p:point Local mode:Int,time:Int time = MilliSecs() For Local n:Int=1 To points p=New point p.x=Rand(0,511) p.y=Rand(0,511) graph.insertvec3( vec3(p.x-256, 0, p.y-256), p) list.addfirst(p) Next DebugLog MilliSecs() - time Local modename$="Scenegraph" Global size:Int=128,hsize:Int MoveMouse GraphicsWidth()/2,GraphicsHeight()/2 Global FPSLASTUPDATETIME:Int Local param:TforeachObjectParam = New TforeachObjectParam 'GCSuspend() While Not KeyHit(KEY_ESCAPE) If AppTerminate() End Cls() scenex=MouseX() sceney=MouseY() If KeyHit(KEY_MINUS) size=Max(10,size-10) If KeyHit(KEY_EQUALS) size:+10 'If KeyHit(KEY_EQUALS) DebugStop hsize=size/2 Select mode Case 0 'DebugStop graph.forEachTreeNodeDo( vec3(scenex-hsize-256, 0, sceney-hsize-256), size, 0, size, callback, 0) Case 1 For p=EachIn list If pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Plot p.x,p.y Next EndSelect If KeyHit(KEY_SPACE) mode=Not mode Select mode Case 0 modename$="Scenegraph" Case 1 modename$="Linear" EndSelect EndIf time=MilliSecs() SetColor 255,255,255 DrawText "UPS: "+FPS(),0,0 DrawText "Mode: "+modename,0,20 DrawText "space: toggle mode",0,40 DrawText "-/+: change rect size",0,60 Flip 0 FPSLASTUPDATETIME=FPSLASTUPDATETIME+(MilliSecs()-time) Wend End Function Callback:Int(o:Object) Local p:Point=Point(o) 'If Not pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Return 1 Plot p.x,p.y Return 1 EndFunction Function PointInRect:Int(x0:Int,y0:Int,x1:Int,y1:Int,w:Int,h:Int) If x0<x1 Return 0 If y0<y1 Return 0 If x0>x1+w Return 0 If y0>y1+h Return 0 Return 1 EndFunction Function FPS#(frequency:Int=200) Global FPSFRAMECOUNT:Int Global FPSLASTCOUNT:Float Local time:Int=MilliSecs() FPSFRAMECOUNT=FPSFRAMECOUNT+1 Local elapsed#=time-FPSLASTUPDATETIME If elapsed>=frequency FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0 FPSFRAMECOUNT=0 FPSLASTUPDATETIME=time EndIf Return FPSLASTCOUNT End Function |
| ||
As a thread requires a clear pointer, methods are out of question. Reason is that the GC can move objects and thus their methods, in memory. That makes them impossible to be used for pointers which would be invalid or add a significant overhead. Functions are class bound and always in the same position, thus they offer fixed pointers. Threads are a great option actually. But the main problem is that using them for small stuff and alike is a very bad idea because threads have a significant overhead for creation and destruction, so at best you have a permanent thread which you feed with new data to process for example. Above example sadly is not slow due to this but because one of the core rules is totally ignored, basically terminator killing the thread approach totally. Whatever you do, NEVER COUNT ON A SPECIFIC ORDER OF EXECUTION for threads. Or in relation to the code: the second loop after the thread spawn waits on specific threads to finish, so you risk that half the threads that are already finished are forced to remain pending until the lower indexed threads are done. This is a very bad approach. You would better spawn a given amount of threads as thread pool (size from #cores to 3*#cores at max). Those threads then would use a function which has a mutex lock present, to access a list and request the next item to process. That combined with the fix of the wait in there should basically skyrocket the performance. |
| ||
Agree with Dreamora. The code was meant to be a demo of how I got around passing param/functions. I realise the failing of my code, what I posted was about the 10th iteration of the code attempting to get it faster multi-threaded instead of single threaded. I did try the things you said in previous iterations. For this specific application though single thread was best. |
| ||
Sounds familiar :) Single thread is actually often the better solution unless you have a large enough task that you can offload into an own calculation task. Especially communication heavy tasks often are near impossible to do in a usefull way in their own thread. Thats why I build kind of a calculation grid with distinct processes long before BM reached 1.3.0, which uses networking for inter process communication, similar to how MPI / OpenMP work. Just that the distinct nodes have distinct duties. Was more of a proof of concept in the end as the project is on ice, but it was a very good educational experience. Now with threading beeing present, a mix of both would likely make it even more powerfull. |
| ||
Thanks everyone! Dreamora, it would be nice if you could fill us in more on this calculation grid, perhaps on another thread (pun intended). I've been thinking about such systems myself, and I've been developing my own. A thread would communicate with another by placing a "message" object into the other thread's message list. A mutex only has to be used on the list to add a message, and can then be unlocked, which I assume is much faster then having to lock all the data another thread holds that needs to be read. The thread needing the data would instead send a data request message, and in the other thread's reply, it would copy the object of which the data is needed, and attach it to a reply message. I suppose the problem would be that the garbage collector would have to deal with object copies. There would need to be a lot of copies, because the data they would hold would rapidly become obsolete. Could I manually delete the copies instead, saving the GC some work? I guess I'm getting off topic (and thus being a hypocrite). My question has already been answered, and I'm happier! Again, thanks for the help everyone. |
| ||
If you do it right, you won't have any dublicate objects actually. If you build your could about the single place of reference idea, you always will only have one thread operating on the object. A potential way of doing that is this: 1. main thread creates the operation object 2. main thread puts the operation object into a pending jobs list 3. the threads from a potential thread pool or whatever access the list in a secure way (using 2 special functions to add and get&remove that are mutex protected is likely the simplest way of doing it and can not lock down the whole app) 4. The worker thread gets the next work task from the pending jobs list 5. the worker does the operation 6. The worker puts the resulting data into a second protected list, the processed results list 7. the main thread periodically polls the results list. Its important that it does so through single result functions to not lock the list for a long time which would force There is no need to copy data as only a single object is having access to a specific object at a time so you do not get access collisions. In a more multi access environment, a monitor wrapperclass will likely be the most productive way to reduce data dublication and manage the accesses in a clean way |