Do I Have to Wrap Methods W/ Functions to Thread?

BlitzMax Forums/BlitzMax Beginners Area/Do I Have to Wrap Methods W/ Functions to Thread?

lukehedman(Posted 2009) [#1]
I've been doing some experiments, and it seems that to run a method in a thread a function needs to be wrapped around it. Has anyone else experienced this? Can a thread run a method directly?

Feel free to request some code samples if needed.


Chris(Posted 2009) [#2]
Hi

It seems you are correct. Here's an example of a KD tree I coverted to be multi-threaded - it actually made the process slower!




SuperStrict


Type tvec3
	Field x:Float,y:Float,z:Float
EndType
	
	
Function vec3:tvec3( px:Float, py:Float, pz:Float )
	Local n:tvec3 = New tvec3
	n.x = px
	n.y = py
	n.z = pz
	Return n
EndFunction

Type TKDNode
	Field pos:tvec3
	Field Obj:Object
	
	Method Create:TKDNode( p:tvec3, o:Object )
		pos = p
		obj = o
		Return Self
	EndMethod
EndType

Type TforeachObjectParam
	Field tree:TKDtree
	Field list:TList
	Field p:tvec3
	Field width:Float
	Field height:Float
	Field depth:Float
	Field Callback:Int(o:Object)
	Field pDepth:Int = 0
EndType

Type TKDTree
	Field parent:TKDTree
	Field l:TKDTree
	Field r:TKDTree
	Field pos:tvec3
	Field obj:Object
	Global callmut:TMutex = CreateMutex()
		
	Method Create:TKDTree( pparent:TKDTree= Null)
		parent = pparent
		Return Self
	EndMethod
	
	Method insertVEC3:TKDTree( p:tvec3, o:Object, pDepth:Int = 0, pparent:TKDTree= Null )
		Local axis:Int = pDepth Mod 3
		Local lft:Int = False
		
		If Not pos Then
			parent = pparent
			pos = p
			Obj = o
			Return Self
		EndIf
		
		If axis = 0 Then
			If p.x < pos.x Then	lft = True
		Else If axis = 1 Then
			If p.y < pos.y Then	lft = True
		Else If axis = 2 Then
			If p.z < pos.z Then	lft = True
		EndIf
		
		If lft Then
			If l Then Return l.insertVec3( p, o, pDepth + 1 )
			l = New TKDTree.insertVec3( p, o, pDepth + 1,Self )
			Return l
		Else
			If r Then Return r.insertVec3( p, o, pDepth + 1 )
			r = New TKDTree.insertVec3( p, o, pDepth + 1,Self )
			Return r
		EndIf
	EndMethod

	Method findObject:Object( p:tvec3, pDepth:Int=0 )
		Local axis:Int = pDepth Mod 3
		Local lft:Int = False

		If p.x = pos.x And p.y = pos.y And p.z = pos.z Then Return obj		
		
		If axis = 0 Then
			If p.x < pos.x Then	lft = True
		Else If axis = 1 Then
			If p.y < pos.y Then	lft = True
		Else If axis = 2 Then
			If p.z < pos.z Then	lft = True
		EndIf
		
		If lft Then
			If l Then Return l.findObject( p, pDepth+1 )
		Else
			If r Then Return r.findObject( p, pDepth+1 )
		EndIf
		Return Null
	EndMethod
	
	
	Method singleForEachTreeNodeDo:Int( list:TList, p:tvec3, width:Float, height:Float, Depth:Float, callback(o:Object), pDepth:Int = 0)
		Local axis:Int = pDepth Mod 3
		Local lft:Int = False
		Local lft1:Int = False
		'DebugStop
		If PointInRect(pos.x, pos.z, p.x, p.z, width, depth) Then
'			LockMutex( callmut )
'			If Callback( obj )= False Return Null
'			UnlockMutex( callmut )
			list.addfirst( Self )
		EndIf
		
		If axis = 0 Then
			If p.x < pos.x Then	lft = True
		Else If axis = 1 Then
			If p.y < pos.y Then	lft = True
		Else If axis = 2 Then
			If p.z < pos.z Then	lft = True
		EndIf

		If axis = 0 Then
			If (p.x + width) < pos.x Then	lft1 = True
		Else If axis = 1 Then
			If (p.y + height) < pos.y Then lft1 = True
		Else If axis = 2 Then
			If (p.z + depth) < pos.z Then	lft1 = True
		EndIf
		
		If lft Then
			If l Then l.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 )
			If Not lft1 And r Then r.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 )
		Else
			If r Then r.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 )
			If lft1 And l Then l.singleForEachTreeNodeDo( list, p, width, height, Depth, callback, pDepth+1 )
		EndIf

		Return Null
	EndMethod
	
	Function threadForEachTreeNodeDo:Object( o:Object )
		Local p:TforeachObjectParam = TforeachObjectParam( o )
		p.tree.singleForEachTreeNodeDo( p.list, p.p, p.width, p.height, p.Depth, p.callback, p.pDepth )
	EndFunction
	
	Method ForEachTreeNodeDo( p:tvec3, width:Float, height:Float, Depth:Float, callback(o:Object), pDepth:Int = 0)
		Const threadCount:Int = 2
		Local thread:TThread[threadCount,threadCount]
		Local x:Int
		Local y:Int
		Local z:Int
		Local tree:TKDTree
		Local param:TforeachObjectParam[threadCount,threadCount]
		For x = 0 To threadCount-1
			For z = 0 To threadCount-1
				param[x,z] = New TforeachObjectParam
				param[x,z].tree = Self
				param[x,z].list = New TList
				param[x,z].p = New tvec3
				param[x,z].p.x = p.x + (x* (width/threadCount))
				param[x,z].p.y = p.y
				param[x,z].p.z = p.z + (z*(depth/threadCount))
				param[x,z].width = width / threadCount
				param[x,z].height = height
				param[x,z].depth = depth / threadCount
				param[x,z].callback = Callback
				param[x,z].pDepth = 0
				thread[x,z] = CreateThread( threadForEachTreeNodeDo, param[x,z] )
			Next
		Next
		For x = 0 To threadCount-1
			For z = 0 To threadCount-1
				WaitThread( thread[x,z] )
				For tree = EachIn param[x,z].list
					callback( tree.obj )
				Next
				ClearList( param[x,z].list )
				param[x,z].tree = Null
				param[x,z].list = Null
				param[x,z].p = Null
				param[x,z] = Null
			Next
		Next
Rem

		Local thread:TThread[2,2]
		Local x:Int
		Local y:Int
		Local z:Int
		Local param:TforeachObjectParam
		For x = 0 To 1
			For z = 0 To 1
				param = New TforeachObjectParam
				param.tree = Self
				param.p = New tvec3
				param.p.x = p.x + (x* (width/2))
				param.p.y = p.y
				param.p.z = p.z + (z* (depth/2))
				param.width = width / 2
				param.height = height / 2
				param.depth = depth / 2
				param.callback = Callback
				param.pDepth = 0
				thread[x,z] = CreateThread( threadForEachTreeNodeDo, param )
			Next
		Next
		For x = 0 To 1
			For z = 0 To 1
				WaitThread( thread[x,z] )
			Next
		Next
EndRem		
	EndMethod

EndType

Local t:tkdtree
t = New tkdtree.Create( )
t.insertvec3( vec3( 30, 30, 30 ), "hello" )
t.insertvec3( vec3( 31, 30, 30 ), "goodbye" )
t.insertvec3( vec3( 30, 30, 60 ), "bye" )
t.insertvec3( vec3( -20, 30, 60 ), "neg" )


Print String(t.findObject( vec3(30, 30, 30)))
Print String(t.findObject( vec3(31, 30, 30)))
Print String(t.findObject( vec3(30, 30, 60)))
Print String(t.findObject( vec3(-20, 30, 60)))


'---------------------------------------------------------
' Test Program
'---------------------------------------------------------

Graphics 512,512,0

Local list:TList=New TList

Type Point
	Field x:Int,y:Int
EndType

Local sectors:Int=32
Local points:Int=5000

Local scale#=sectors/512.0

Local graph:tkdtree= New tkdtree.Create(  )

Global scenex:Int,sceney:Int

Local p:point

Local mode:Int,time:Int

time = MilliSecs()
For Local n:Int=1 To points
	p=New point
	p.x=Rand(0,511)
	p.y=Rand(0,511)
	graph.insertvec3( vec3(p.x-256, 0, p.y-256), p)
	list.addfirst(p)
Next
DebugLog MilliSecs() - time

Local modename$="Scenegraph"

Global size:Int=128,hsize:Int

MoveMouse GraphicsWidth()/2,GraphicsHeight()/2

Global FPSLASTUPDATETIME:Int

Local param:TforeachObjectParam = New TforeachObjectParam 



'GCSuspend()
While Not KeyHit(KEY_ESCAPE)

	If AppTerminate() End
	
	Cls()
	
	scenex=MouseX()
	sceney=MouseY()
	
	If KeyHit(KEY_MINUS) size=Max(10,size-10)
	If KeyHit(KEY_EQUALS) size:+10
	'If KeyHit(KEY_EQUALS) DebugStop


	hsize=size/2

	
	Select mode
	Case 0
		'DebugStop
		graph.forEachTreeNodeDo( vec3(scenex-hsize-256, 0, sceney-hsize-256), size, 0, size, callback, 0)
	Case 1
		For p=EachIn list
			If pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Plot p.x,p.y
		Next
	EndSelect
	
	If KeyHit(KEY_SPACE)
		mode=Not mode
		Select mode
		Case 0 modename$="Scenegraph"
		Case 1 modename$="Linear"
		EndSelect
	EndIf
	
	time=MilliSecs()
	
	SetColor 255,255,255
	DrawText "UPS: "+FPS(),0,0
	DrawText "Mode: "+modename,0,20
	DrawText "space: toggle mode",0,40
	DrawText "-/+: change rect size",0,60
	Flip 0
	
	FPSLASTUPDATETIME=FPSLASTUPDATETIME+(MilliSecs()-time)

Wend

End

Function Callback:Int(o:Object)
	Local p:Point=Point(o)
	'If Not pointinrect(p.x,p.y,scenex-hsize,sceney-hsize,size,size) Return 1
	Plot p.x,p.y
	Return 1
EndFunction

Function PointInRect:Int(x0:Int,y0:Int,x1:Int,y1:Int,w:Int,h:Int)
	If x0<x1 Return 0
	If y0<y1 Return 0
	If x0>x1+w Return 0
	If y0>y1+h Return 0
	Return 1
EndFunction

Function FPS#(frequency:Int=200)
	Global FPSFRAMECOUNT:Int
	Global FPSLASTCOUNT:Float
	Local time:Int=MilliSecs()
	FPSFRAMECOUNT=FPSFRAMECOUNT+1
	Local elapsed#=time-FPSLASTUPDATETIME
	If elapsed>=frequency
		FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0
		FPSFRAMECOUNT=0
		FPSLASTUPDATETIME=time
	EndIf
	Return FPSLASTCOUNT
End Function 




Dreamora(Posted 2009) [#3]
As a thread requires a clear pointer, methods are out of question.
Reason is that the GC can move objects and thus their methods, in memory.
That makes them impossible to be used for pointers which would be invalid or add a significant overhead.

Functions are class bound and always in the same position, thus they offer fixed pointers.


Threads are a great option actually.
But the main problem is that using them for small stuff and alike is a very bad idea because threads have a significant overhead for creation and destruction, so at best you have a permanent thread which you feed with new data to process for example.
Above example sadly is not slow due to this but because one of the core rules is totally ignored, basically terminator killing the thread approach totally.
Whatever you do, NEVER COUNT ON A SPECIFIC ORDER OF EXECUTION for threads.
Or in relation to the code: the second loop after the thread spawn waits on specific threads to finish, so you risk that half the threads that are already finished are forced to remain pending until the lower indexed threads are done. This is a very bad approach.
You would better spawn a given amount of threads as thread pool (size from #cores to 3*#cores at max). Those threads then would use a function which has a mutex lock present, to access a list and request the next item to process.
That combined with the fix of the wait in there should basically skyrocket the performance.


Chris(Posted 2009) [#4]
Agree with Dreamora.
The code was meant to be a demo of how I got around passing param/functions.

I realise the failing of my code, what I posted was about the 10th iteration of the code attempting to get it faster multi-threaded instead of single threaded.

I did try the things you said in previous iterations.
For this specific application though single thread was best.


Dreamora(Posted 2009) [#5]
Sounds familiar :)
Single thread is actually often the better solution unless you have a large enough task that you can offload into an own calculation task.
Especially communication heavy tasks often are near impossible to do in a usefull way in their own thread.

Thats why I build kind of a calculation grid with distinct processes long before BM reached 1.3.0, which uses networking for inter process communication, similar to how MPI / OpenMP work. Just that the distinct nodes have distinct duties.
Was more of a proof of concept in the end as the project is on ice, but it was a very good educational experience.

Now with threading beeing present, a mix of both would likely make it even more powerfull.


lukehedman(Posted 2009) [#6]
Thanks everyone!

Dreamora, it would be nice if you could fill us in more on this calculation grid, perhaps on another thread (pun intended).

I've been thinking about such systems myself, and I've been developing my own. A thread would communicate with another by placing a "message" object into the other thread's message list. A mutex only has to be used on the list to add a message, and can then be unlocked, which I assume is much faster then having to lock all the data another thread holds that needs to be read. The thread needing the data would instead send a data request message, and in the other thread's reply, it would copy the object of which the data is needed, and attach it to a reply message.

I suppose the problem would be that the garbage collector would have to deal with object copies. There would need to be a lot of copies, because the data they would hold would rapidly become obsolete. Could I manually delete the copies instead, saving the GC some work?

I guess I'm getting off topic (and thus being a hypocrite). My question has already been answered, and I'm happier! Again, thanks for the help everyone.


Dreamora(Posted 2009) [#7]
If you do it right, you won't have any dublicate objects actually.
If you build your could about the single place of reference idea, you always will only have one thread operating on the object.

A potential way of doing that is this:
1. main thread creates the operation object
2. main thread puts the operation object into a pending jobs list
3. the threads from a potential thread pool or whatever access the list in a secure way (using 2 special functions to add and get&remove that are mutex protected is likely the simplest way of doing it and can not lock down the whole app)
4. The worker thread gets the next work task from the pending jobs list
5. the worker does the operation
6. The worker puts the resulting data into a second protected list, the processed results list
7. the main thread periodically polls the results list. Its important that it does so through single result functions to not lock the list for a long time which would force


There is no need to copy data as only a single object is having access to a specific object at a time so you do not get access collisions.
In a more multi access environment, a monitor wrapperclass will likely be the most productive way to reduce data dublication and manage the accesses in a clean way