Comparing Arrays....?

BlitzMax Forums/BlitzMax Programming/Comparing Arrays....?

Streaksy(Posted February) [#1]
Hi. There is so much about BMax that I'm sure I don't know yet. Is there a fast way to check if the contents of one array is identical to another? I do basically this:

For X=0 to 99
For Y=0 to 99
If Array1[x,y] <> Array2[x,y] then return false
Next
Next
Return True

But thinking about it, and knowing there is array notation I don't know about, is there a more CPU efficient way? It would be nice news if there was because in this little project I've started, such a comparison is the main time-taker in the main loop.

Cheers. :D


Midimaster(Posted February) [#2]
Sure, that this loop causes your time problems? Your sample needs <0.2msec time!


Brucey(Posted February) [#3]
An array of 100 x 100 is not going to take long to test. You probably don't need to worry about it.


Derron(Posted February) [#4]
Your test is failing if you want to make sure that array1 and array2 contain totally the same.

you need to prepend:

'not equal if one array is not existing - or lengths differ
if (Array1=null) <> (Array2=null) or Array1.length <> Array2.length then return False


As you use a multi-dim-array you might need to check for dimensions too...


So how to speed up things?
Keyword here is "cache". It only works if you have a "governor" taking care of the content in Array1 and Array2 - so every modification of the array - needs to be done by a manager/governor.
Principle is: you have a variable stating if array1 is modified. Call it "array1Modified:int = False".

When adding, removing to ... or manipulating the array in other ways, set "array1Modified = True"

So this allows you to know right before your loops if the array is the same as "last time".

Ok, so how to use it?
Assume we got this manager and now have "array1Modified" and "array2Modified".

if array1Modified or array2Modified
'do your loops
endif

...
'once everything is done, set your helper var to "handled"
array1Modified = False
array2Modified = False






bye
Ron


Brucey(Posted February) [#5]
Here's an example, simulating calling it at 60fps over 1 minute :
SuperStrict

Framework BRL.Standardio

Extern
?bmxng
Function memcmp:Int(p1:Byte Ptr, p2:Byte Ptr, size:Int)="int memcmp(const void *,const void *,size_t)!"
?Not bmxng
Function memcmp:Int(p1:Byte Ptr, p2:Byte Ptr, size:Int)
?
End Extern

Const iterations:Int = 16 * 60 * 60
Const x:Int = 100
Const y:Int = 100

Local array1:Int[x,y]
Local array2:Int[x,y]
Local res:Int

Local start:Int = MilliSecs()
For Local i:Int = 0 Until iterations
	res = loop1test(array1, array2, x, y)
Next
time(start, "loop1 same", res)

start = MilliSecs()
For Local i:Int = 0 Until iterations
	res = loop2test(array1, array2, x, y)
Next
time(start, "loop2 same", res)

start = MilliSecs()
For Local i:Int = 0 Until iterations
	res = cmptest(array1, array2, x * y * 4)
Next
time(start, "cmp same", res)

array2[x - 1, y - 1] = 1

start = MilliSecs()
For Local i:Int = 0 Until iterations
	res = loop1test(array1, array2, x, y)
Next
time(start, "loop1 diff", res)

start = MilliSecs()
For Local i:Int = 0 Until iterations
	res = loop2test(array1, array2, x, y)
Next
time(start, "loop2 diff", res)

start = MilliSecs()
For Local i:Int = 0 Until iterations
	res = cmptest(array1, array2, x * y * 4)
Next
time(start, "cmp diff", res)


Function loop1test:Int(array1:Int[,], array2:Int[,], xLen:Int, yLen:Int)
	For Local x:Int = 0 Until xLen
		For Local y:Int = 0 Until yLen
			If array1[x, y] <> array2[x, y] Then
				Return False
			End If
		Next
	Next
	Return True
End Function

Function loop2test:Int(array1:Int[,], array2:Int[,], xLen:Int, yLen:Int)
	Local a1:Int Ptr = array1
	Local a2:Int Ptr = array2 
	
	For Local i:Int = 0 Until xLen * yLen
		If a1[i] <> a2[i] Then
			Return False
		End If
	Next
	
	Return True
End Function


Function cmptest:Int(p1:Byte Ptr, p2:Byte Ptr, size:Int)
	Return memcmp(p1, p2, size) = 0
End Function

Function time(start:Int, text:String, result:Int)
	Local diff:Int = MilliSecs() - start
	Print text + " (" + result + ") : " + diff
End Function


There are 3 different comparisons, an x,y compare, a single loop compare, and one with memcmp.

Some results from an i7 on Windows 10:

Legacy BlitzMax 1.50 :

loop1 same (1) : 528
loop2 same (1) : 306
cmp same (1) : 240
loop1 diff (0) : 526
loop2 diff (0) : 306
cmp diff (0) : 238


BlitzMax NG

loop1 same (1) : 245
loop2 same (1) : 214
cmp same (1) : 0
loop1 diff (0) : 240
loop2 diff (0) : 213
cmp diff (0) : 0


528 is half a second over the course of 1 minute. A single loop with BlitzMax 1.50 is slightly faster. And memcmp() takes about half the time of your example.

Interestingly for NG, with this data, the compiler/runtime does some sorcery and the repeated calls to memcmp() with the same data results in it deciding it doesn't need to retest the compare since the last time... giving a total time of 0 milliseconds.
For the other loop compares, they take almost the same time overall, and even finish faster than 1.50's use of memcmp().

Take (silly) test results with a pinch of salt ;-)


Derron(Posted February) [#6]
Just hopping in to gain some knowledge about internals: I am not sure why "memcpm" is really doing what it should.

I understand "memcpm" is returning whether two memory blocks contain the same data. So "0" means: both contain the same.

But what I am not sure about (not able to test now) is why you can add 100 objects to two _different_ arrays. So while the pointers to the 100 objects might be the same, the properties of the array will differ. So arr1 has a memory address of A and the arr2 a memory address of B. At A2 we find "length"-property of arr1 and at B2 the length for arr2.
So why are both memory blocks identical in your tests? Does the Pointer to the array lead directly to the connected data and skips "properties" ?



Another thing:
If the content of the array are "strings" ... would memcpm still work?
A arr1[x,y] = arr2[x,y] would still be valid even with these two strings being stored at different memory locations.


bye
Ron


Brucey(Posted February) [#7]
Does the Pointer to the array lead directly to the connected data and skips "properties" ?

Yes.

If the content of the array are "strings" ... would memcpm still work?

An array of Strings, is an array of pointers to memory locations where String data is stored.
memcmp is only useful for primitive data types or raw data.


Derron(Posted February) [#8]
A pity but seems to behave as I expected.


Seems "cache" is the fastest one - if entries do not change every execution (ohh how unexpected ;-)).

bye
Ron


Streaksy(Posted February) [#9]
Derron:
It already has that overall HasBeenUpdated=1 thing. :D

Brucey:
Interesting. Thanks for the tim check code. I implemented it and it actually takes a lot longer because (and I should have said) it's actually checking several arrays with the same dimensions...

For X=1 to 10000
For Y=1 to 10000
If GridVar1a[x,y] <> GridVar1a[x,y] then Return false
If GridVar1b[x,y] <> GridVar1b[x,y] then Return false
If GridVar1c[x,y] <> GridVar1c[x,y] then Return false
If GridVar1d[x,y] <> GridVar1d[x,y] then Return false
If GridVar1e[x,y] <> GridVar1e[x,y] then Return false
Next
Next
Return true

The 1-100 in my first example was just arbitrary for the illustration of the problem. And I'm not too bothered about framerate because the program already gets 13000 fps on my mediocre PC with vsync off. Actually I tried that atlas image thing you suggested on my last forum post which multiplied the framerate by like 5. The thing is though thats great for the client but this current array comparison thing is intended for server code and it will be doing this per user, so squeezing out extra FPS could be a life-saver.

Thinking about it.... the variables being compared are bytes. Would it be faster if they were integers? If so it might be worth multiplying the per-user grid memory by 4 and just having integers...


Brucey(Posted February) [#10]
and I should have said

Yes, it will help if the code you provide is fairly accurate ;-)

For example :
GridVar1a[x,y] <> GridVar1a[x,y] 

is comparing the same thing... so even this code doesn't really help :-p


Streaksy(Posted February) [#11]
Gawd, I make mistakes like that while actually coding as well. Cheers again Brucey.


Midimaster(Posted February) [#12]
If you have fields of 100.000.000 elements, I would suggest to keep track of the changes when happen instead of checking all 100.000.000 elements later.

instead of...
...
If Bomb=TRUE
    GridA[x,y]=15
Endif
....
For X=1 to 10000
     For Y=1 to 10000
          If GridA[x,y] <> GridB[x,y] then Return false
     Next
Next


i would do something like...
...
Type Pos
     Global List:TList
     Field X%,Y%
End Type
....
If Bomb=TRUE
    ChangeGrid(x,y,15)
Endif
....
For local loc:Pos = EachIn Pos.List
     If GridA(loc.X,loc.Y)<>GridB(loc.X,loc.Y) then Return false
Next
Pos.List.Clear

.....

Function ChangeGrid(x%,y%,value%)
     GridA[x,y]=value
     Local loc:Pos=New Pos
     Pos.X=x
     Pos.Y=Y
     Pos.List.AddLast Pos
End function

To check you can process the list. With perhaps 1000 entries it is much faster than checking all 100.000.000 elements


TomToad(Posted February) [#13]
Many ways to do this. One way is to use a TMap to track differences. Every time you change a value in the array, check the corresponding cell in the other array. If they do not match, you can enter a key into the map. if they do match, then remove the key from the map. If the map is empty, then the arrays are identical.




Streaksy(Posted February) [#14]
Yeah good stuff. And always wondered what the hell a TMap is...