Copying array 2x slower than CopyBank?


sswift(Posted 2009) [#1]
Have I done something wrong here, or is copying an array just really slow?

SuperStrict

Local T0%, T1%, T2%, T3%
Local Bank1:TBank = CreateBank(40000*4)
Local Bank2:TBank = CreateBank(40000*4)
Local Pixels1:Float[40000]	
Local Pixels2:Float[40000]
Local Ptr1:Byte Ptr, Ptr2:Byte Ptr
Local Loop%, Loops% = 10000

Ptr1 = MemAlloc(40000*4) 
Ptr2 = MemAlloc(40000*4)

T0 = MilliSecs()

	For Loop = 1 To Loops
		CopyBank(Bank1, 0, Bank2, 0, 40000*4)
	Next

T1 = MilliSecs()

	For Loop = 1 To Loops
		Pixels1 = Pixels2[..]
	Next
	
T2 = MilliSecs()

	For Loop = 1 To Loops
		MemCopy(Ptr1, Ptr2, 40000*4)
	Next
	
T3 = MilliSecs()

Print "CopyBank = " + (T1-T0)
Print "Copy Array = " + (T2-T1)
Print "MemCopy = " + (T3-T2)



Zeke(Posted 2009) [#2]
Yes, Pixels1 = Pixels2[..] is slow. But

For Local i:Int = 0 Until Pixels2.length
	Pixels1[i] = Pixels2[i]
Next

is fast.


Floyd(Posted 2009) [#3]
A slice is a new array.

		Pixels1 = Pixels2[..]

This allocates memory for a new array, copies Pixels2[] into it, and points Pixels1 at the new memory. The old memory used by Pixels1 is now unused and can be garbage collected.

I'm surprised it runs as fast as it does. Here it is with garbage retained.

Local Pixels1:Float[100000]	
Local Pixels2:Float[100000]

GCSuspend

For Local n:Int = 1 To 20
	Pixels1 = Pixels2[..]
	Print GCMemAlloced()
Next



sswift(Posted 2009) [#4]
Zeke:

Afraid not.

It is faster, but it's still twice as slow as copying a bank:

SuperStrict

Local T0%, T1%, T2%, T3%
Local Bank1:TBank = CreateBank(40000*4)
Local Bank2:TBank = CreateBank(40000*4)
Local Pixels1:Float[40000]	
Local Pixels2:Float[40000]
Local Ptr1:Byte Ptr, Ptr2:Byte Ptr
Local Loop%, Loops% = 10000

Ptr1 = MemAlloc(40000*4) 
Ptr2 = MemAlloc(40000*4)

T0 = MilliSecs()

	For Loop = 1 To Loops
		CopyBank(Bank1, 0, Bank2, 0, 40000*4)
	Next

T1 = MilliSecs()
	
	Local Loop2%

	For Loop = 1 To Loops
		For Loop2 = 0 Until 40000
			Pixels1[Loop2] = Pixels2[Loop2]
		Next
	Next
	
T2 = MilliSecs()

	For Loop = 1 To Loops
		MemCopy(Ptr1, Ptr2, 40000*4)
	Next
	
T3 = MilliSecs()

Print "CopyBank = " + (T1-T0)
Print "Copy Array = " + (T2-T1)
Print "MemCopy = " + (T3-T2)


It is faster than doing a slice (which was actually 2.5x slower, not 2x like I originally said) but it's still terribly slow.

This isn't too surprising actually, come to think of it, because we're copying 32-bit floats one at a time here, whereas MemCopy, which the other two examples use, is likely copying 64 bits at a time or something. Or perhaps it uses an unrolled loop. I imagine using an unrolled loop on the array might speed it up a bit. But it's really not worth all the effort to use an array instead of MemAlloc.

[edit]

Yep. Unrolling the loop 4x results in array copying going almost as fast as MemCopy.
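
The unrolled version was not posted. As a rough sketch only, a 4x unroll of the inner copy loop might look something like the following, assuming the same Pixels1 and Pixels2 arrays as above. The loop variable i is introduced here for illustration, and no timings are claimed for this exact code.

```blitzmax
SuperStrict

Local Pixels1:Float[40000]
Local Pixels2:Float[40000]

' Copy four elements per iteration instead of one.
' 40000 is divisible by 4, so no remainder loop is needed for this size;
' other lengths would need a short tail loop for the leftover elements.
For Local i:Int = 0 Until 40000 Step 4
	Pixels1[i]   = Pixels2[i]
	Pixels1[i+1] = Pixels2[i+1]
	Pixels1[i+2] = Pixels2[i+2]
	Pixels1[i+3] = Pixels2[i+3]
Next
```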


Evil Roy Ferguson(Posted 2009) [#5]
Most modern CPUs provide instructions to quickly copy large amounts of memory at a time, and implementations of the standard library's memcpy are almost certainly going to take advantage of them instead of using a simple element-by-element loop.

Remember, though, that arrays are objects and are therefore implicitly convertible to and from byte pointers.

You might find
For Loop = 1 To Loops
    MemCopy(Pixels1, Pixels2, 40000*4)
Next
	
T4 = MilliSecs()


more legible than the alternatives. It should be tied in speed with MemCopy'ing on the byte pointers, and you'll still be able to access the arrays with Pixels1[index], etc., which is almost certainly more legible than messing with peeks or dereferencing byte pointers as float pointers.
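
For anyone who wants to try this outside the benchmark, here is a small standalone sketch of the same idea. It relies on the array-to-Byte Ptr conversion described above and assumes each Float is 4 bytes, as in the rest of the thread. The fill values and the printed index are arbitrary choices for illustration.

```blitzmax
SuperStrict

Local Pixels1:Float[40000]
Local Pixels2:Float[40000]

' Fill the source array with recognisable values.
For Local i:Int = 0 Until Pixels2.length
	Pixels2[i] = i
Next

' MemCopy takes (destination, source, size in bytes).
' The arrays are passed directly and converted to byte pointers.
MemCopy(Pixels1, Pixels2, Pixels2.length * 4)

' Pixels1 is still an ordinary array and can be indexed as usual.
Print "Pixels1[12345] = " + Pixels1[12345]
```

Unlike Pixels1 = Pixels2[..], this copies into the existing Pixels1 array rather than allocating a new one, so both arrays must already be at least as large as the number of bytes copied.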