Illogical Code Performance

BlitzMax Forums/BlitzMax Programming/Illogical Code Performance

xcessive(Posted 2011) [#1]
Hey everyone, I am fairly inexperienced with blitzmax as a language so there may be some language specific thing I have run into here, although I doubt that.

Basically I have written code for 2 solutions for a problem. In theory one should be much faster than another, however it performs slightly slower, and I really can't work out why.

What this piece of code is designed to do is render a whole lot of "TEntity" objects in a specific order as fast as possible. That order is defined by the yLocation of each TEntity object. For example all those with a value of 7 should be rendered before those with a value of 6.. etc. Here is the code for my two alternate solutions:

Solution 1 (quick sorted array)
'entityList is merely a list of TEntities
Local renderList:TList = entityList.Copy()
renderList.AddLast(player)
Local sorted:TEntity[] = sortEntityList(renderList)
For Local e:TEntity = EachIn sorted
	e.Render()
Next


' FUNCTION USED FOR ABOVE CODE

'sort an entity list using quicksort
Function sortEntityList:TEntity[] (l:TList)
	Local arr:Object[]
	arr = ListToArray(l)
	Local arr2:TEntity[] = New TEntity[arr.Length]
	For Local g:Int = 0 To arr.Length - 1
		arr2[g] = TEntity(arr[g])
	Next
	arr2.Sort()
	Return arr2
End Function


As you can see what I am doing is copying the list to a new one, adding the player in it so the player is rendered in the right order then converting the whole list to an array and sorting the array. Then going through the array and rendering each TEntity. Obviously this should be inefficient as copying the list takes n iterations (where n is the length of the list) as does converting the list to an array. Finally we have to sort it. All up this is 2n + nlog(n) time BEST CASE SCENARIO (as quicksort is used for sorting arrays).

Solution 2 (A special data structure)


		'need to add the player to make sure they are rendered in the right order
		entityMap.addEntity(player) 'add the player for rendering
For Local l:TList = EachIn entityMap.map
	For Local e:TEntity = EachIn l
		e.Render()
	Next
Next
entityMap.removeEntity(player) 'get the player out of the map




'a data structure used to map entites to y locations
'it is adaptive in that most recently moved entites are at the top
'hence statics float to the bottom
'it doesnt need to be sorted, thus giving better performance than a standard list
Type TAdaptiveEntityMap
	'an array of lists, representing each y location
	Field map:TList[]
	
	'creates a new map
	Function construct:TAdaptiveEntityMap(mapSize:Int)
		Local temp:TAdaptiveEntityMap = New TAdaptiveEntityMap
		temp.map = New TList[mapSize]
		For Local i:Int = 0 To mapSize - 1
			temp.map[i] = New TList
		Next
		Return temp
	End Function
	
	'gets a list for a given y value, this can be used for iterating through
	Method getList:TList(val:Int)
		Return map[val]
	End Method
	
	'gets a certain entity out of the map, then retruns it
	Method removeEntity:TEntity(e:TEntity)
		map[e.yLocation].Remove(e)
	End Method
	
	'adds a given entity (e) to the map
	Method addEntity(e:TEntity)
		'addfirst is important, it forms the "adaptive" part of this data structure
		'it ensures recently moves entites are always near the front
		map[e.yLocation].AddFirst(e)
	End Method
EndType



As you can see here I created a simple data structure that stores a list of all the entities at each yLocation. Since when an entity is moved and added to a new location, any frequently moving entites will be at the start of the list, this optimizing it. Anyways the advantage of this is that one only needs to iterate over all the entites once to draw them. Hence there is n operations to do. However when I try both these approaches side by side Solution two is slightly slower, and takes up more memory. Both of which make no sense.

Can anyone see something I'm missing or help me out here?

Last edited 2011


xcessive(Posted 2011) [#2]
This is exceptionally strange, I just tried this on a computer other than my own, and solution two is much faster. Perhaps there is something up with my particular architecture. This makes no sense.


Jur(Posted 2011) [#3]
There is a lot of unnecessary steps in the first method - copying list, converting to array... I would just use the compare method of a type or better, the custom compare function:

Function Compare:int(o1:Object, o2:Object)
	Local t1:TEntity=TEntity(o1)
	Local t2:TEntity=TEntity(o2)
	If t1=Null Or t2=Null Then Return 1 
	Return t1.yLocation- t2.yLocation
EndFunction


Use this function as an argument in the BlitzMaxs SortList Function.

As for the question why the second method can be slower - my experience is that running a LOT of iterator loops can get costly, but I am not sure if that is the case here.


Htbaa(Posted 2011) [#4]
Prepending to a list should be slower if I'm not mistaken, or at least it is in most languages. But I'm not sure if this also counts for TList, which is a double linked list.

What I used to do for this was keep a separate TList and add the entities to it. I'd only sort it (on the y, or z value, whatever floats your boat) if required. So if for a while no new entities are added the list wouldn't be resorted. Depending on your type of game this can be beneficial.

This is more or less what you're doing in your 2nd solution. With the first solution you convert your TList to an array, then you create a temporary array of the same size, fill it, then sort it. This requires memory reservation every time it's being called.

I assume for the 2nd solution you always refill the map with new TList's every iteration, right? I'd say give the single y-ordered TList a chance. To manage these multiple lists (entitylist, y-ordered-list etc.) I use a single EntityManager that is responsible for adding the entities to the list, mark them to be sorted when required etc.


xcessive(Posted 2011) [#5]
Thanks for posting both of you.

@Jur: Thanks for the ideas, but I am already doing pretty much that, I forgot to mention I have a custom compare method in TEntity. Also I know it looks like more steps (and it should be slower), but the mergesort implementation for sorting lists is actually much slower than what I am currently doing (converting to array and sorting array). At least, thats whats shown by the bench marking I've done.

My compare method:
'compare, used for sorting
	Method Compare:Int(withObject:Object)
		Local temp:TEntity = TEntity(withObject)
		If Self.yLocation - temp.yLocation = 0
			Return Self.xLocation - temp.xLocation
		End If
		Return Self.yLocation - temp.yLocation
	End Method


@Htbaa: Some good thoughts, however I already thought of that and discarded it. The reason I can't do that is because I have a lot of entities that are constantly moving, thus I usually have to sort 10+ times during a logical loop. Thus its more efficient to sort them all right before I render them. And when you think about it, all I am really doing is keeping a list, and sorting it when I "need" to at the moment. The problem thus is what the fastest way to sort it is!

With the second solution: No I don't refill it. Each list represents a y location and thus only needs to be changed when an entity moves. So it is in essence what you said (sorted list) optimized for heavy use. Since objects that were recently moved are always at the front.



-----

I have tested this on a few computers now and I think I worked out the issue. In the render() function I was for every entity using drawrect() to create a minimap dot for the entity. For some reason on my computer the drawrect() function is exceedingly slow, and takes so much time that the sorting time is negligible. I have no idea why this is, but it didn't happen when I tested these solutions on other machines. When I get rid of this draw rect, my 2nd solution is twice as fast as the first one, and 3 times as fast as using a sorted TList (using the sort method). If anyone is curious I have a Toxic 5970, so I don't think its an issue of graphical grunt, might be drivers..?

Weird right?

Render Code:
Method Render()
	
		'Check and see if we are rendering within screen bounds
		If (Self.xLocation < offsetX) Or (Self.xLocation > (screenWidth / TILESIZE) + offsetX - 1) Or (Self.yLocation < offsetY) Or (Self.yLocation > (screenHeight / TILESIZE) + offsetY - 1)
			Return
		End If
		
		DrawImage(imgSet.GetImage(), ((xLocation - Width + 1)) * TILESIZE - offsetX * TILESIZE, ((yLocation - Height + 1)) * TILESIZE - offsetY * TILESIZE)
		DrawRect(xLocation, yLocation, 2, 2) 'super cool minimap
	End Method
	


Last edited 2011


Czar Flavius(Posted 2011) [#6]
You can add to the front or back of TList quickly.

Compare method should return -1, 0 or 1 only. Make sure that yours does?

Method Compare:Int(withObject:Object)
		Local temp:TEntity = TEntity(withObject)
		If Not temp Then Return Super.Compare(withObject)
		If Self.yLocation > temp.yLocation
			Return 1
		Else If Self.yLocation < temp.yLocation
			Return -1
		Else
			If Self.xLocation > temp.xLocation
				Return 1
			Else If Self.xLocation < temp.xLocation
				Return -1
			Else
				Return 0
			End If
		End If
	End Method


You might need to change the orders of some of the < and >. The purpose of the Super.Compare line is so that it won't crash if you ever compare a TEntity with something that isn't a TEntity.

Function sortEntityList:TEntity[] (l:TList)
	Local arr:Object[]
	arr = ListToArray(l)
	Local arr2:TEntity[] = New TEntity[arr.Length]
	For Local g:Int = 0 To arr.Length - 1
		arr2[g] = TEntity(arr[g])
	Next
	arr2.Sort()
	Return arr2
End Function


What?? What is wrong with this
l.Sort()


And convert it to an array afterwards, if you really need to. That function is converting a list to an array which is something on every item, then copies it to a second array which is another thing on every item, then sorts it. What if there is only 1 item out of place in the sort? You have spent 99% of the time on copying.

The second copy is unnecessary. You can cast arrays. I understand many of these details are not discussed in the language documentation.

Local arr:TEntity[]
arr = TEntity[](ListToArray(l))


Sorting an already sorted list is very fast.

For your DrawRect issue, first make sure you have the latest drivers and updated DirectX. Then try changing the drivers between DX7, DX9 and OpenGL. I provide an in-game option to pick your desired driver in all my games as I have noticed wild differences between all three on different systems. There isn't one you can say is best overall.

Last edited 2011

Last edited 2011


xcessive(Posted 2011) [#7]
Thanks for the response Czar but there are a lot of things I've already ruled out/discussed there.

I think compare can return any -ve or +ve number, or 0. It shows the magnitude of difference and thats standard in most languages, and I don't think BMax is an exception. Although I could be wrong.

l.Sort() is extremely slow, thats why I don't use it. I don't know why but its slower than creating an array and sorting that. Much slower in fact. To prove it run this code:


SuperStrict
SeedRnd MilliSecs()
Local arr:String[]
Local list:TList = New TList
arr = New String[100000]
For Local i:Int = 0 To Len(arr) - 1
	arr[i] = String(Rnd(0, 2000))
	list.AddFirst(String(Rnd(0, 2000)))
Next

Local t1:Int = MilliSecs()
arr.Sort()
Print "Time Taken to sort array: " + (MilliSecs() - t1)

t1:Int = MilliSecs()
list.Sort()
Print "Time Taken to sort list: " + (MilliSecs() - t1)

list = New TList

For Local i:Int = 0 To Len(arr) - 1
	list.AddFirst(String(Rnd(0, 2000)))
Next

t1:Int = MilliSecs()
Local arr2:Object[] = New Object[100000]
arr2 = ListToArray(list)
arr2.Sort()
Print "Time Taken to sort convert to array and sort: " + (MilliSecs() - t1)


My results in release build mode:

Time Taken to sort array: 51
Time Taken to sort list: 137
Time Taken to sort convert to array and sort: 64

As for the driver option I have the latest version of DX11/10/9..etc as well as openGL 4 down, and the latest drivers. I think it might have something to do with XFire.

Last edited 2011


Czar Flavius(Posted 2011) [#8]
With 100000 objects

Time Taken to sort array: 91

Time Taken to sort list: 283

Time Taken to sort convert to array and sort: 125

With 10000 objects

Time Taken to sort array: 5

Time Taken to sort list: 6

Time Taken to sort convert to array and sort: 5


I would have expected that to take 9 and 28 respectively, dividing by 10, but the performance was a lot better. The CPU has a limited cache to store data it is manipulating super-fast. When exceeding the cache, your algorithm will suddenly suffer a drop in performance. Is your game going to need to sort more than 10000 objects per list? If not, then the difference in performance is very tiny. This is why you should take tests such as these with a big pinch of salt. The artificial conditions are often very different from real world requirements and constraints.

I did not, however, believe that converting to array first and then sorting would increase performance, so thank you for this new information. But in general I don't think it would be worth the step.

Also bear in mind the more disorderly the collection, the longer it will take to sort. Random data is especially hard to sort.

Unfortunately you cannot have a list of ints directly, which is annoying. However you can get around it with this wrapper type which you can feel free to use. It's much faster than using a list of strings and converting between ints and strings all the time.



On this bit of code here

t1:Int = MilliSecs()
Local arr2:Object[] = New Object[testnum]
arr2 = ListToArray(list)
arr2.Sort()

The second line is unneccesary. The ListToArray function allocates and returns the new array itself. The one you allocate before hand is discarded.

I think compare can return any -ve or +ve number, or 0.
I have read the documentation and you are correct. It is not limited to just -1, 0 or 1.

This has been an interesting experiment!

Last edited 2011


xcessive(Posted 2011) [#9]
Some interesting findings Czar!

You are correct, it is always good to take testing like this with a bucket full of salt. You mentioned the cache access becoming a large factor in sorting huge collections of objects, this is true and its a good point. However what happens if someone is playing this game on an extremely old computer, or even if I were to port this to the iPhone by recoding it in monkey/etc. Then that 1 millisecond sorting difference can be blow up to a ratio similar to the test with lots of objects, since older/small CPUs have far less cache, thus cache misses will still be very high.

Also the more disorderly the collection the FASTER it will be to sort with BMax's array sort method. This is because it uses quicksort, whose best case scenario is having a completely random/unsorted collection. A list however will be much faster sorting an already mostly sorted collection as I /think/ it uses an implementation of mergesort. If you give the quicksort method a perfectly sorted collection, it will take far longer to sort than sorting the same list (n^2 time vs nLog n time).

You are right about that second line being unnecessary haha, but I was just hacking up code as fast as possible.

As a final note, in my original solution two no sorting is required. Thus I found with even a moderate number of objects performance was MUCH faster, even if those objects were moving around a lot. This is once I get rid of drawrect() of course :p. So my original issue has been solved. This has been an very interesting tangent none-the-less!

Last edited 2011