Would this multithread well?

BlitzMax Forums/BlitzMax Beginners Area/Would this multithread well?

JBR(Posted 2016) [#1]
Hi, have a perlin noise maker which takes about 10ms to do a frame. I'm currently making 180 frames, which loop perfectly, and store them ready for plotting.

I only change the image every 4 vsyncs, so I have 4 * 16.667 ≈ 66.7ms per image.

What I'm asking is whether this kind of thing multithreads well in the background. The image is 512x512 and it uses maybe another 300kb of data.

Would I find it hindering my main code due to too much data?

Thanks, Jim.


col(Posted 2016) [#2]
Sounds like a viable candidate for a second thread to generate the image. Assuming that you're rendering the image, maybe use 2 images in a double-buffer type of approach: one image being worked on in the second thread and one being rendered in the main thread.


JBR(Posted 2016) [#3]
Hi Dave, what commands should I be looking at ... simplest approach. Jim


grable(Posted 2016) [#4]
Threading ain't exactly simple, especially when you have to get data back from one thread without halting the main thread.

Here's one way of doing it, using atomics:
SuperStrict

Type TWorker
	Field Result:TPixmap
	Field HasResult:Int
	
	Function func:Object( data:Object)
		Local worker:TWorker = TWorker(data)
		worker.Result = CreatePixmap( 512, 512, PF_RGBA8888)
		For Local i:Int = 0 Until worker.Result.Capacity Step 4
			worker.Result.Pixels[i+0] = Rand(0,255) ' Rand(256) would return 1..256, and 256 overflows a byte
			worker.Result.Pixels[i+1] = Rand(0,255)
			worker.Result.Pixels[i+2] = Rand(0,255)
			worker.Result.Pixels[i+3] = 255
		Next
		While Not CompareAndSwap( worker.HasResult, False, True)
			Print "spinning"
		Wend
	EndFunction
EndType

Local worker:TWorker = New TWorker
CreateThread( worker.func, worker).Detach()

While Not CompareAndSwap( worker.HasResult, True, False)
	Print "waiting"
	Delay 1
Wend

Print "got data"
Normally you shouldn't spin on the wait condition like this (you could have just used WaitThread() instead). Instead, check the flag as often as you can in your main loop.

And a better approach would also reuse the thread and swap between the current and next pixmaps to save system resources, ie not allocate them each time.
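grable's swap-instead-of-reallocate idea is language-agnostic; here is a minimal sketch of it in Python (all names are mine, `threading.Lock` stands in for a BlitzMax TMutex, and two bytearrays stand in for the two pixmaps):

```python
import threading

class DoubleBuffer:
    """Two reusable buffers: a worker fills 'back' while the main
    thread reads 'front'; swap() exchanges them under a lock."""
    def __init__(self, size):
        self.front = bytearray(size)
        self.back = bytearray(size)
        self.lock = threading.Lock()

    def swap(self):
        # The swap is the only shared step, so only it needs the lock.
        with self.lock:
            self.front, self.back = self.back, self.front

db = DoubleBuffer(4)
db.back[:] = b"\x01\x01\x01\x01"   # worker writes into the back buffer
db.swap()                          # main thread flips the buffers
print(bytes(db.front))             # b'\x01\x01\x01\x01'
```

Neither buffer is ever reallocated; the threads only ever exchange which one they point at.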


Kryzon(Posted 2016) [#5]
Read the introduction to the BRL.Threads module, it's got some information:
https://en.wikibooks.org/wiki/BlitzMax/Modules/System/Threads

But most of those functions have zero explanation if you're coming at this without ever having heard of them. You'll have to consult Google or StackOverflow ("what is atomic", "what is a semaphore", etc.).

I think another way to check on the state of a thread, without having to resort to idle spinning, is to use TryLockMutex and WaitSemaphore:



EDIT: Added an extra check so the main loop doesn't lock the result mutex consecutively.


JBR(Posted 2016) [#6]
Thanks guys, I'm liking the simplicity of grable's code.

I have a question.

Why the need for CompareAndSwap

could I not just check worker.HasResult directly.

Jim


grable(Posted 2016) [#7]
could I not just check worker.HasResult directly.
Because there is no way to guarantee that the read and the write happen in the order you expect.
In the simplest case, the write could be half done when you try to read.
And with all the different cache levels in a CPU, there is no telling what state memory is in when using threads.

This is a problem that has increased with multiple cores, where two or more threads can read or write to the same address at the same time.

CompareAndSwap, AtomicAdd and AtomicSwap are atomic operations, meaning they are synchronised at the CPU level.
It is guaranteed that reads and writes to the same memory address happen in a well-defined way.

I suggest reading up on Threading and Atomics as there are many many pitfalls that stand in the way of getting things working correctly ;)
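The guarantee grable describes can be emulated in any language. Python has no CAS builtin, so this sketch (all names are mine) fakes one with a lock that makes the read-compare-write step indivisible, then uses it as a handshake flag the way the code in #4 uses HasResult:

```python
import threading

class AtomicInt:
    """Compare-and-swap emulation: the lock makes the
    read-compare-write sequence a single indivisible step."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False

has_result = AtomicInt(0)

def worker():
    # ... produce the data, then publish the flag atomically
    has_result.compare_and_swap(0, 1)

t = threading.Thread(target=worker)
t.start()
t.join()
# Main thread: consume the flag (1 -> 0), like grable's wait loop.
assert has_result.compare_and_swap(1, 0)
```

The point is that the flag flips exactly once per handoff; a plain `has_result = 1` assignment gives no such ordering guarantee across threads.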


JBR(Posted 2016) [#8]
ok, but is the actual Result:TPixmap guaranteed to contain the correct values when I use it after the "got data" is printed?

Also, does .Detach() just remove the thread when it's done?

Sorry for my beginner questions. Jim.


Kryzon(Posted 2016) [#9]
I should add that when I was studying threading in BlitzMax I learned a lot from the user tutorials:

- http://www.blitzbasic.com/Community/posts.php?topic=91458
- http://www.blitzbasic.com/Community/posts.php?topic=80677

They answer questions like what does the Detach method do.


col(Posted 2016) [#10]
Sorry to come in late here guys...

You can read and write to the same variable without locking when you completely understand the nuances of the Read-Modify-Write paradigm. In the case of simply exiting a worker thread that's looping, you can decide whether it matters if the loop goes around once more because the read was made while the variable was being updated. The problem comes when modifying the contents of an address (ie a variable) isn't atomic but you are relying on it being so. This is where understanding the Read-Modify-Write process is invaluable, and the biggest advice I can give would be to truly understand it. You should also read up on and understand 'out of order execution' too. Google will be your best friend here.

In BlitzMax there are a couple of multithread primitives that can be used. Multithreading is anything but simple and has to be planned out well to be robust. You will always need to think about how many threads can access what data and at what times, and whether any variables need to be protected from being accessed by more than 1 thread. It's always best to plan ahead with threading and ALWAYS cater for something that is extremely unlikely to happen - believe me, if there's a chance that 2 or more threads could access the same data at the same time, no matter how slim that chance is, then don't take the chance.

Multithreaded solutions can be done in as many different ways as single-threaded ones.

Enough of the lecture :D on with something practical...
From what you describe I'd write something that's similar to this:

1. Create a pool of TImages. As long as you only DrawImage in the main thread, and never in the second thread, then using TImages in other threads is perfectly safe. TImages use a TPixmap for the underlying pixel data.
2. Create a multithread-safe queue that can be accessed by both threads. You'll want the thread that's creating the images to wait if the queue is full, but the thread taking images is allowed to carry on if the queue is empty.

The flow of code would be this...
You have a pool of existing empty TImages. It can be any number of images, but due to how I've written the code you need a number of images that is a power of 2. You allow only one image to be taken from the pool at a time, and you use a 'round robin' index to know which image is next, so that you can keep reusing the same images/pixmaps.
You have a multithread queue that holds the same number of items as there are TImages in the pool, so that you can take advantage of some multithread syncing constructs - in this case a semaphore.

In the main thread: query the queue to see if a TImage is available and ready. If no image is available then no image is returned (Null, for example); if an image is available then DrawImage that TImage. The main thread is free to carry on doing something else whether or not it gets a new image from the queue. You need to cater for both cases - image ready and image not ready. You never know if the OS will step in for some arcane reason and prevent your code running as fast as you're expecting - this is very unlikely, but these are the kinds of things that can trip up your code and break everything.

In the second thread you take an image from the image pool, do its image manipulation, and then put that image into the queue ready for the main thread to take when it wants it. When the queue is full you have that thread sleep so that it's not hogging a CPU core for nothing. This also allows the OS to run much more smoothly, and if the OS is running smoothly then so is your code.

Using that approach brings the issue down to making sure that two threads can't access the same data in the queue at the same time.

Here's a working example using 4 pre-existing images that the queue rotates between. You could move the components around, for example putting the image pool inside the queue, but this approach simply keeps things as they 'should be': a pool of images is a pool, and the queue is just a queue. The queue is modified from a regular queue in that the thread putting items in will wait when the queue is full, while the thread taking items out won't wait if no item is available.
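The pool-plus-queue flow described above can be sketched in Python (its `queue.Queue` already behaves like the queue described: `put()` blocks when full, `get_nowait()` never waits). All names here are mine, and stand-in bytearrays play the role of pooled TImages:

```python
import queue
import threading

POOL_SIZE = 4
FRAMES = 8
images = [bytearray(16) for _ in range(POOL_SIZE)]  # stand-in image pool
ready = queue.Queue(maxsize=POOL_SIZE)  # producer blocks when this is full

def producer():
    index = 0
    for frame in range(FRAMES):
        img = images[index]
        img[0] = frame                    # pretend to render into the pooled image
        ready.put(img)                    # sleeps (not spins) if the queue is full
        index = (index + 1) % POOL_SIZE   # round-robin through the pool

t = threading.Thread(target=producer)
t.start()

drawn = 0
while drawn < FRAMES:
    try:
        img = ready.get_nowait()          # never blocks the "main thread"
        drawn += 1                        # pretend to DrawImage here
    except queue.Empty:
        pass                              # no image ready: carry on with other work
t.join()
```

The queue's capacity is what bounds how many pooled images can be "in flight" at once; as in col's design, the consumer is still expected to be done with an image by the time the round-robin index comes back to it.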




So there are a couple of multithread 'constructs' in there.
A mutex is a special object that can restrict other threads from entering the same code path at the same time. This makes a section of code 'mutually exclusive', ie there can be only one - which refers to the number of threads allowed into that code path. People often talk of using a mutex to protect a variable; this is confusing, as it's not actually what a mutex does. While a mutex is locked, no other thread can lock it, and any attempt to do so will make that thread wait until the mutex is unlocked. So the real use of a mutex is to create a critical section of code that can only be entered by one thread at a time. You then take advantage of that to update the variables that you want to guarantee are updated correctly. There are ways to avoid the thread waiting if the mutex is already locked, for example the TryLockMutex function: TryLockMutex won't make the thread wait if the mutex is already locked, but that thread isn't allowed into the critical section either.
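The critical-section idea can be shown with a toy counter; Python's `threading.Lock` plays the role of a TMutex here (a sketch, not BlitzMax API):

```python
import threading

counter = 0
mutex = threading.Lock()

def add_many():
    global counter
    for _ in range(100_000):
        with mutex:        # only one thread may be inside this block at a time
            counter += 1   # the read-modify-write is now safe

threads = [threading.Thread(target=add_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: no increments were lost
```

Without the lock, two threads can read the same old value and both write back old+1, losing increments - exactly the Read-Modify-Write hazard described above.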

There's also a semaphore used in there. A semaphore is like a counter: you create it with an integer value that is its initial count. Each time you call WaitSemaphore, if the counter is greater than zero it is decremented and the thread continues execution; if the counter is zero, the thread is put to sleep, waiting until the counter is no longer zero. When you PostSemaphore, the counter is incremented, and if it was zero then one single thread that was waiting will be woken and can carry on its execution. If multiple threads are waiting on a single semaphore, the OS decides which thread to wake. The updating of the internal counter is atomic, which is what allows you to call WaitSemaphore and PostSemaphore from any thread - careful planning is needed to take advantage of this construct, but used correctly it is very powerful.
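The counter behaviour just described can be demonstrated with Python's `threading.Semaphore` (`acquire` ≈ WaitSemaphore, `release` ≈ PostSemaphore; the timeouts are only there to keep the demo from sleeping forever):

```python
import threading

sem = threading.Semaphore(2)      # internal counter starts at 2

print(sem.acquire(timeout=0.1))   # True:  counter 2 -> 1
print(sem.acquire(timeout=0.1))   # True:  counter 1 -> 0
print(sem.acquire(timeout=0.1))   # False: counter is 0, the wait times out

sem.release()                     # counter 0 -> 1; would wake one sleeping waiter
print(sem.acquire(timeout=0.1))   # True:  counter 1 -> 0 again
```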


Silver_Knee(Posted 2016) [#11]
Hi,
I didn't understand: do you generate them once and use them later, or do you generate one (or all 180) every ~65ms?

If you want to generate them every ~65ms you could use the backbuffer/frontbuffer idea: two pixmaps that don't interfere with each other. You need 2 semaphores so that if the worker is fast it won't produce endless frames.

Type TWorker
  Field pixmap:TPixmap[2] 'back and front buffer
  Field usePixmap:Int 'defines if pixmap[0] or pixmap[1] is the backbuffer 
  Field isDone:Int 'style points
  Field producerSemaphore:TSemaphore=CreateSemaphore(0) 'start at 0: no frame ready yet
  Field consumerSemaphore:TSemaphore=CreateSemaphore(0)

  Method New()
    pixmap[0]=CreatePixmap( 512, 512, PF_RGBA8888)
    pixmap[1]=CreatePixmap( 512, 512, PF_RGBA8888)
  End Method

  Method GetNextPixmap:TPixmap()
    producerSemaphore.Wait() 'this will wait until the work is done
    
    Local returnPixmap:Int 'declared outside the loop so it survives past Until
    Repeat
      returnPixmap = usePixmap
    Until CompareAndSwap(usePixmap, returnPixmap, Not returnPixmap)
    
    consumerSemaphore.Post() 'release the worker
    
    Return pixmap[returnPixmap]
  End Method

  Function Work:Object(data:Object)
    Local worker:TWorker=TWorker(data)
    
    While Not worker.isDone
      Local p:TPixmap = worker.pixmap[worker.usePixmap]
      For Local i:Int = 0 Until p.capacity Step 4
        p.Pixels[i+0] = Rand(0,255)
        p.Pixels[i+1] = Rand(0,255)
        p.Pixels[i+2] = Rand(0,255)
        p.Pixels[i+3] = 255
      Next
      
      worker.producerSemaphore.Post() 'release GetNextPixmap
      worker.consumerSemaphore.Wait() 'now we wait for GetNextPixmap
    Wend
  End Function
End Type

Local worker:TWorker=New TWorker
CreateThread( TWorker.Work, worker )

'Main loop
Local currentPixmap:TPixmap
Local timeHasComeToGetANewPixmap:Byte=True
Repeat
  If timeHasComeToGetANewPixmap
    currentPixmap=worker.GetNextPixmap();
  EndIf
  '...
Forever


If you generate them once and use them later, you can wait for all 180 frames to finish with one semaphore.

Type TWorker
  Global semaphore:TSemaphore
  Field pixmap:TPixmap

  Method New()
    pixmap=CreatePixmap( 512, 512, PF_RGBA8888)
  End Method

  Method GetPixmap:TPixmap()
    Return pixmap
  End Method

  Function Work:Object(data:Object)
    Local worker:TWorker=TWorker(data)
    
    For Local i:Int = 0 Until worker.pixmap.capacity Step 4
      worker.pixmap.Pixels[i+0] = Rand(0,255)
      worker.pixmap.Pixels[i+1] = Rand(0,255)
      worker.pixmap.Pixels[i+2] = Rand(0,255)
      worker.pixmap.Pixels[i+3] = 255
    Next
      
    semaphore.Post()
  End Function
End Type

Local workerCount:Int=180
TWorker.semaphore=CreateSemaphore(0) 'start at 0; each worker Posts once when done
Local worker:TWorker[]=New TWorker[workerCount]

For Local i:Int=0 Until workerCount
  worker[i]=New TWorker
  CreateThread( TWorker.Work, worker[i] )
Next

For Local i:Int=0 Until workerCount
  TWorker.semaphore.Wait() 'one Wait per unit of work
Next


In general: you have a Producer/Consumer problem. Semaphores are just right for that: Post signals that one unit of work is done, and Wait waits until at least one unit is available - so to wait for all units, call Wait once per unit. The initial count passed to CreateSemaphore defines how many units are available to begin with. You will need one semaphore for each thread that should be waiting.
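The "wait for all units of work" usage can be sketched in Python (`Semaphore.release` ≈ Post, `acquire` ≈ Wait; all names are mine):

```python
import threading

N = 8
done = threading.Semaphore(0)   # start at 0: nothing finished yet
threads = []

def worker(i):
    # ... do one unit of work here ...
    done.release()              # Post: signal "one unit done"

for i in range(N):
    t = threading.Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)

for _ in range(N):              # Wait once per unit of work
    done.acquire()
print("all", N, "workers finished")
for t in threads:
    t.join()
```

Because each worker Posts exactly once, the N acquires cannot all succeed until every worker has finished - a semaphore-based join.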

In the case you have a "this bit of code should only be executed by one thread" problem, mutexes are the better solution.

Also, using semaphores and mutexes is always a better solution than spinning in a loop checking for a variable to change. They are - in theory - low-level implementations that genuinely tell the processor that this thread will not be useful until the mutex is unlocked or the semaphore's counter rises above 0, so the scheduler can ignore the thread until then.
If the thing you are waiting for is a little more complicated and you really do need some checking code to decide whether your thread can continue, then CondVars are your tool. They let you Wait until someone that knows that CondVar wakes your thread up - like "hey, something happened". You can then check all your conditions and go back to sleep if they aren't met. Every waiting thread needs a mutex to use with the condvar. So you have low-level waiting again, and you only check your conditions when it matters rather than polling every 10 milliseconds.
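A minimal condvar round-trip, using Python's `threading.Condition` and following the wake-then-recheck pattern described above (names are mine):

```python
import threading

cond = threading.Condition()    # bundles the mutex and the condvar
frames_ready = 0

def producer():
    global frames_ready
    with cond:                  # take the mutex before touching shared state
        frames_ready += 1
        cond.notify()           # "hey, something happened"

t = threading.Thread(target=producer)
t.start()

with cond:
    while frames_ready < 1:     # always re-check the condition after a wake-up
        cond.wait()             # releases the mutex while sleeping
print("frames ready:", frames_ready)
t.join()
```

The `while` (not `if`) around `wait()` is the important part: a woken thread must verify its condition actually holds before carrying on.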

Greez

EDIT: Whoa, I wrote on that for an hour ^^ Well, something like this can't be retold enough.


Kryzon(Posted 2016) [#12]
When you only use semaphores you're establishing that the main thread (the consumer) will wait for the producer threads to finish - the operation will not be asynchronous.
The main thread could be doing other processing in the meantime if you're talking about a real-time game, or if the application has a GUI and you want it to be responsive and not "block" while the work is being done. If you're using MaxGUI and you block the main thread on a semaphore, the application looks frozen.

With both a semaphore and a mutex you can use TryLockMutex, which never blocks the caller. When you use this in the main thread (the consumer) you can let it process other things if the mutex is already locked by some other thread, as illustrated in post #5.
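The TryLockMutex idea in Python form: `Lock.acquire(blocking=False)` is the analogue, returning immediately instead of waiting (a sketch; names are mine):

```python
import threading

result_mutex = threading.Lock()

# Pretend the worker thread currently holds the mutex while it renders.
result_mutex.acquire()

# Main thread, once per frame: a TryLockMutex-style check that never blocks.
got_it = result_mutex.acquire(blocking=False)
print(got_it)           # False: the worker is still busy, carry on with other work

result_mutex.release()  # the worker finishes and unlocks

got_it = result_mutex.acquire(blocking=False)
print(got_it)           # True: the result is free to take this frame
result_mutex.release()
```

As col found in #13 below, a successful try-lock only proves nobody holds the mutex right now - it says nothing about whether the work has actually been done, which is why a separate "state" flag is still needed.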


col(Posted 2016) [#13]
@Kryzon,

I was just experimenting with your code,
If you put a delay of, say, 2000 in the 2nd thread (say at line 47) then you still get an immediate message that the pixmap is ready, even though the pixmap can't be ready until the delay has passed. Am I missing something?


JBR(Posted 2016) [#14]
I'm afraid I'm all at sea atm.

Would it help if I published my animated perlin routine?

Jim


col(Posted 2016) [#15]
It certainly wouldn't hurt :-)


Kryzon(Posted 2016) [#16]
Hey col, good catch.
I think the problem comes from the fact that the code assumes that if the main thread (the consumer) can successfully lock the pixmap mutex of the producer then a pixmap will be ready, but that's not necessarily true.

So I imagine a better way to do it would be to have a "state" value that can be verified:




Derron(Posted 2016) [#17]
@Kryzon
Your code in #5 does not fail because the code is able to "(Try)LockMutex()" a locked mutex... if that didn't work, I'd assume the threading code of BlitzMax was borked.

Without testing, I'm more inclined to say that this line is the culprit:
If semaphoreCount = 0 Then Return TryLockMutex( resultMutex )


You already recognized, that you need a custom "state" variable. Why?

This is your logic run:

a) create a worker and assign a function
b) loop until worker is finished


Now the important part:
a) creates a worker with an _unlocked_ mutex and detaches the thread
b) is run _parallel_ to the thread (no sequential chain anymore)!

CPUTick 000: a)
CPUTick 500: b) isReady()? worker.func() was not called yet
CPUTick 502: a1) worker.func() with LockMutex()

So instead of a "finished" state, you could even use a "WhileLoopCount" or whatever (whileRun > 0 ...)


I use that TMutex just to see whether I am able to modify the "mutexed" variable right now. The mutex cannot tell me whether something has modified the variable at least _once_. It is just a "currently in use" property.


bye
Ron


JBR(Posted 2016) [#18]
Here is the animated perlin noise code. I should mention I adapted sswift's code from the archive, 2003 I think, and enhanced.

I'd like to see it run in the background and change the image every 4 frames.

Jim.




dw817(Posted 2016) [#19]
This is beautiful, JBR! Here is how I would do it. And yes, I can SORT OF make a Perlin image, I did so on the Commodore Amiga. Easier to just nab this image tho:




Brucey(Posted 2016) [#20]
Here is the animated perlin noise code

You can never have too many Globals :-p


grable(Posted 2016) [#21]
. double post


grable(Posted 2016) [#22]
Here you go :)

The generation time under power saving was lowest=49, highest=72, avg=60; updated figures are 12/13, though not very stable.
And it only has to wait 1 more frame whenever it exceeds the time needed.
I'm running on a 6700K though, your mileage may vary.

EDIT: Fixed bug that made it wait the full 4 frames when time exceeded.
EDIT2: Fixed another bug in Terminate() not terminating.




dw817(Posted 2016) [#23]
Just out of curiosity guyz, what minimal code for Blitzmax could there be to make a perfect seamless Perlin Noise screen, not animated, just a drawn field, say 512x512 pixels ?


Brucey(Posted 2016) [#24]
Stats are interesting here (using grable's code).

For Windows (7, running in a VM on my Mac - VM has 2 cores and 6GB), I get generation at 15/16 (min/max) for legacy BlitzMax. For 64-bit NG, I'm getting 4/5 (min/max). That's quite a bit faster.


Brucey(Posted 2016) [#25]
After some optimisations, I get 14 avg on legacy, and a constant 4 on NG 64-bit.
Optimisations generally involve working with pointers to the arrays rather than the arrays themselves - array access has some overhead.
Ideally one could also consider tweaking the code to assist the CPU cache.




grable(Posted 2016) [#26]
Forgot i was running in powersaving mode ;) Updated my figures above, 12/13.

Your sample though jumps wildly for me, Brucey: 10/48. I wonder why that is.
(I was running the wrong sample at first, so both of them jump wildly. I wonder why it's so stable on your end...)
Update: your sample, Brucey, runs at 10/12. And with bmx-ng x86 I get 3/4, and x64 3/3. That is pretty impressive :)

I did some tweaking of my own, but could only reduce it to 10/13 - removing compares and simplifying the NoiseMap selection and the transfer to the pixmap.
I worked on the original though, before I ported it over to the threaded one. There it shaved off 25-30ms, under power saving mode.

Probably has something to do with cache locality, yeah. The thread doesn't have to share its part of the cache with the main thread, which may be why it's hard to optimize the threaded version. Except for bmx-ng of course - gcc optimizes well compared to vanilla bmx ;)

EDIT: Juggling too many files at once leads to running the wrong thing :(
EDIT2: Damn windows and its power modes, double :(

original

threaded



grable(Posted 2016) [#27]
I figured out why it was jumping so wildly: it seems even the Balanced power mode in Windows likes to jump like crazy between the lowest and max MHz of the CPU.
Only High Performance gives a stable time, same as Brucey.
Of course the CPU then runs at max MHz all the time, so it will generate more heat. But this explains the wild jumping at least.

And here I was under the impression that under heavy load it would stay at max MHz (under Balanced). Either Windows sucks at it, or running that sample wasn't heavy enough for it to care :/


Brucey(Posted 2016) [#28]
Of course the CPU then runs at max mhz all the time

Sticking a Delay 1 in here may help :
While Not CompareAndSwap( worker.HasResult, False, True); Wend

Otherwise it will run at 100% until hasResult is reset (which could be for 4 * 16ms or so?).


grable(Posted 2016) [#29]
Otherwise it will run at 100% until hasResult is reset (which could be for 4 * 16ms or so?).
It's pretty much guaranteed to be False, since one has to GetResult() before RequestResult().

I was talking about the CPU in general, meaning High Performance makes the CPU stay at 4000MHz (max without turbo) no matter the load.
But after running like that for a while it only increased 4 degrees (C) when idling, so not a big deal after all ;)

I don't have the mightiest of coolers, and under heavy load for several hours it can hit 68.
Which was why I was a bit worried... I should really get a better cooler though, one with a way larger heatsink hehe.


Brucey(Posted 2016) [#30]
Its pretty much guaranteed to be False

Oh yeah. Never mind ;-)

My 6500k is buried behind a 5k screen. I've never even heard the fan yet... although after some more months gathering dust I expect to hear it more often.


Derron(Posted 2016) [#31]
Brucey:
I've never even heard the fan yet

You know - the older, the less you hear.
Uhm ... seriously: you just wanted to use the chance to mention that 5k screen ... :-p



I assume GCC optimizes away these local variables (as each is only needed once, which means we could write them all as one big formula):
						Local ICx# = 1.0 - cosine#[ Int(Ix# * 1024.0 ) ]					'Local ICx# = 1.0 - ((Cos(Ix#*180.0) + 1.0) / 2.0)

						Local Na# = N1#*(1.0-ICx#)
						Local Nb# = N2#*ICx#
						Local Nc# = N3#*(1.0-ICx#)
						Local Nd# = N4#*ICx#
						
						HeightMap#[Hx+Height_X +  (Hy+Height_Y)*513] :+ (Na#+Nb#)*(1.0-ICy#) + (Nc#+Nd#)*ICy#

So this might shave off some nanoseconds too (in vanilla, when done as 1 formula).


bye
Ron


col(Posted 2016) [#32]
@grable
I've always found the power options too misleading for profiling too so now I have that power profile set to 'performance' and adjust the power saving options within that profile.



And my lonesome take on it using a multithread queue...

intel i5 3570 3.4ghz
gfx 750

Vanilla 14ms
NG 5ms




Derron(Posted 2016) [#33]
Col's code:
Vanilla: ~44
NG: ~30

PC: AMD (LLano quadcore, some years old :-))
OS: Linux Mint 64bit
GCC: 4.8.4


@Print
Doesn't "print" delay things a bit ?

Also: you should avoid "Print" from within threads - the calls might interfere with other Print calls. In the example above it's not used "critically", but I just want to make you aware of this.


bye
Ron


col(Posted 2016) [#34]
@Print

As you know, not in this example, as it's outside the code path being timed and there's plenty of time left in that thread. But yes, writing to a real console window is dog slow and not thread safe; within the Max editor it seems to be very fast here. Your point is very valid.


Brucey(Posted 2016) [#35]

Vanilla: ~44
NG: ~30


As you can see, GCC's optimisations work better on more modern architectures. In this example an average of x3, which is not insignificant.


col(Posted 2016) [#36]
I've updated my example to show the image generation time taken in the 2nd thread, and also the cost to the main thread for that image - ultimately the main thread can carry on about its business with as-good-as zero cost.


As you can see, GCC's optimisations work better on more modern architectures. In this example an average of x3, which is not insignificant.


On top of which gcc is also outputting some simd vector optimized code in places allowing it to blow vanilla bmax out of the water - nice :-)


Brucey(Posted 2016) [#37]
gcc is also outputting some simd vector optimized code in places

NG's array data is now 16-byte aligned. Dunno if that helps any?


JBR(Posted 2016) [#38]
Thanks for the help!

Looks like I should be writing code in 'NG' - what is it?

Newbie question - why does the time taken vary so much from 11 to 46 on my PC?

Jim


grable(Posted 2016) [#39]
why does the time taken vary so much from 11 to 46 on my PC?
See my previous posts concerning power saving in Windows. Might be related.


JBR(Posted 2016) [#40]
Hi,

Seems like NG is a bit tricky to set up!

I decided to use col code as it was a bit neater in the main code bit.

It's nice seeing it run without any extra processing in main loop.

Thanks everyone who helped.

Jim.