Sorting an array

BlitzMax Forums/BlitzMax Programming/Sorting an array

Glenn Dodd(Posted 2007) [#1]
I have a multidimensional array.
for simplicity lets use this:
arr[0,0] = "stuff"
arr[0,1] = "barcode"
arr[1,0] = "more stuff"
arr[1,1] = "more barcode"
arr[2,0] = "again more stuff"
arr[2,1] = "again more barcode"

i want to sort this by the second field, ie the new sorted order would have "again more barcode" before "barcode" before "more barcode".

the array.sort doesn't appear to have a parameter for the column/dimension?

or am i just doing this in the wrong way?
i have sets of records that need to be grouped (0,0 and 0,1) and there are many of them (1,0; 1,1; 2,0; 2,1 etc)

Cheers
glenn


Dreamora(Posted 2007) [#2]
You will need to write your own sort function to work differently and give that to the array.sort call

Although, even that won't help.

What you would need would be using Array of Array [][] and only sort arr[1].sort for example (that is its own array again)


Glenn Dodd(Posted 2007) [#3]
ok here is what i am really doing.
i have a file from a customer which should really have been multiple files because the content needs to be processed in different ways.
i am reading the contents of the file into an array.
each line of the file is delimited. so i break this down and record it as a field in the array. for instance line 1 field 1 is arr[1,1]
line 1, field 2 is arr[1,2] etc
there are 33 fields per line and over 2000 lines.

i was planning to sort the array based on the particular field (field 3 in reality) then output the data to seperate files based on logic around the content of field 3.

thinking about it i guess i don't need to sort it as i could create the output files at the start, and as i read each line write it back out to the correct file based on the value of field 3.

ok - so i have a solution to get me through this particular problem, but is there a sort based solution? from the little i have read about arrays of array (a subject grey alien has on this forum) i wasn't quite sure if dreamoras suggestion would work (but then i am not sure i understand arrays of arrays properly, or at least it is different to what i expected)

regards
glenn


JazzieB(Posted 2007) [#4]
BlitzMax uses a Compare() method for sorting and other situations. If you have your array as part of a Type you can override this by writing your own Compare() Method. All this method does is take one object as a parameter (your array in this instance) and compares it to the current object. It then returns -1, 0, or 1 depending on whether the object should be considered less then or greater than the one passed to it (might be the other way round!). You don't have to worry too much about what's being passed as long as you're doing the comparison correctly. That all sounds very complicated, so here's a little example...

Strict

Type TPerson
	Field name:String
	
	Global nameList:TList
	
	Function Create(p:String)
		If nameList=Null Then nameList=CreateList()
		Local newPerson:TPerson=New TPerson
		nameList.AddLast(newPerson)
		newPerson.name=p
	EndFunction
	
	Function ShowList()
		For Local i:TPerson=EachIn nameList
			Print "  "+i.name
		Next
	EndFunction
	
	Method Compare:Int(p:Object)
		Local n:TPerson=TPerson(p)
		If n Then
			If name.ToUpper() < n.name.ToUpper() Then
				Return -1
			ElseIf name.ToUpper() > n.name.ToUpper() Then
				Return 1
			Else
				Return 0
			EndIf
		Else
			Return 0
		EndIf
	EndMethod
EndType

RestoreData nameData

Local n:String
For Local i:Int=0 To 4
	ReadData n
	TPerson.Create(n)
Next

Print
Print "Before sorting..."
TPerson.ShowList

Print
Print "After sorting..."
TPerson.nameList.Sort
TPerson.ShowList

#nameData
DefData "Rod","jane","Freddy","john","Spud"

Sorry it's not commented, as it was something I did when I was trying this stuff out. Hope it helps.

Of course, the other way of doing this is to simply write a function to do it manually.


Glenn Dodd(Posted 2007) [#5]
what if the type has 2 fields? ie firstname and lastname.
and i want to sort on lastname?
i am at work at the moment so i can't try till later, but i think it would just sort the specified field?


Dreamora(Posted 2007) [#6]
You implement the compare so it is up to you how it compares :)
you can implement different sort methods as well and explicitely use one depending on the case.


Kibo(Posted 2007) [#7]
Hey Glenn,

I was just wrestling with a very similar problem. Here's an example I built, inspired by JazzieB's example and a couple others that were posted in different threads. It shows a user-defined record type which has a first name, a last name, and a score, and can be sorted in multiple ways, including last-name-then-first-name.

Sorting a TList is much slower than sorting an array if the list is very large, because of the nature of lists. Basically, the time taken is proportional to the square of the number of records. On my computer, 1,000 records take 1/20 second, but 30,000 records take 45 seconds.


' by Kibo -- this code is released into the public domain

SuperStrict

Const NUM :Int = 5000 ' number of records to sort

Global PersonsList :TList = New TList ' list of all the records

Local a :Int ' loop variable, for counting records
Local b :Int ' loop variable, for counting letters
Local p :TPerson ' the record we're working with
Local t1 :Int ' time before the sort
Local t2 :Int ' time after the sort

For a = 1 To NUM
	p = New TPerson ' create a blank record
	p.score = Rand(1, 100000) ' random scores
	For b = 1 To Rand(2, 5)
		p.firstname :+ Chr(Rand(65,90)) ' random first name
	Next ' b
	For b = 1 To Rand(2, 8)
		p.lastname :+ Chr(Rand(65,90)) ' random last name
	Next ' b
	PersonsList.addlast(p) ' save the record
Next ' a

Print "Working..."
t1 = MilliSecs()

' these two lines do the actual sorting
SetFileSortMode(SORT_LAST_NAME_THEN_FIRST_NAME)
SortList(PersonsList, True) ' True = ascending order

t2 = MilliSecs() - t1

a = 0
For p = EachIn PersonsList
	a :+ 1
	Print "#" + a + ": " + p.score + " " + p.firstname + " " + p.lastname
	If a > 50 Then Exit
Next

Print "Finished sorting " + NUM + " records in " + t2 + " milliseconds."

End

''''''''''''''''''

' The different methods supported for sorting our records.
' You must do SetFileSortMode(something) before you can sort.
Const SORT_SCORE                     :Int = 1
Const SORT_LAST_NAME                 :Int = 2
Const SORT_LAST_NAME_THEN_FIRST_NAME :Int = 3

' Set the current sorting mode.
Function SetFileSortMode( m :Int = SORT_SCORE )
	TPerson._sortmode = m
End Function

''''''''''''''''''

Type TPerson

	Global _sortmode :Int ' too bad BlitzMax doesn't have ":Enum"

	Field firstname  :String = ""
	Field lastname   :String = ""
	Field score      :Int    = 0
	
	' Compare is the method called by SortList for every pair of records.
	Method Compare :Int (o2 :Object) ' Compare() is always passed an Object...
		Local p2 :TPerson = TPerson(o2) ' ...convert Object to our type to use it
		Local c :Int ' scratch variable for intermediate comparison
		
		Select _sortmode
			Case SORT_SCORE
				' for ints or floats, subtract #2 from #1
				Return self.score - p2.score 
			Case SORT_LAST_NAME
				' for strings, call the string Compare method
				Return self.lastname.Compare(p2.lastname) 
			Case SORT_LAST_NAME_THEN_FIRST_NAME
				' compare last names and if there's a tie, compare first names
				c = self.lastname.Compare(p2.lastname) 
				If c = 0 Then c = self.firstname.Compare(p2.firstname)
				Return c
			Default
				RuntimeError("Bad sorting method '" + _sortmode + "' in TPerson.Compare().")
		End Select
	End Method
	
End Type ' TPerson



Kibo(Posted 2007) [#8]
Okay, after doing that, it was so slow that I decided to recode it using the quicksort to see what sort of performance improvement I could get.

The time to run a bubble sort is proportional to the square of the number of items, and looking up each of those items in a TList is proportional to the number of items, so overall a bubble-sorted TList is an O(n^3) task. Getting the data out of the TList into a plain array (or just using a plain array) and then quicksorting the array would be more like O(n log n). In practice, with my 30,000-item dataset, it went from over a minute to about 1/5 second.

My horrible kludge:

1. The objects to be sorted need ToString() and FromString(string) methods that translate all of the object's fields into a single string (just as if you were saving the stuff to disk in human-readable format, so these routines are useful to have anyway.) Put the fields in the order you want them sorted in, and use RSet() to right-justify any integers. In other words, to sort by last name and then by first name, when your data set includes names and game scores, you would translate the records into this string array:

stringarray[0] = "SMITH JOHN 123"
stringarray[1] = "SMITHERS WAYLON 1234567"
etc.

2. Alphabetize that array with stringarray.Sort()! Wow, that was fast.

3. Clear the list (thus deleting all its objects), and re-create all the objects from the strings in the array. This means writing a parser (such as the equivalent of Perl's split() function.)

Even though that's a cumbersome (and potentially fragile) way of going about things, like I said, in my case it was a 500x speedup versus using SortList().

Here's code I'm using to do that on an existing TList of CPFile objects. Assume that the CPFile objects provide ToString() and FromString(string) methods:

'''''''''''''''''''''''''''''''''''''''''''''''''''
'
'  QSortList(TList, flag)
'
'  Like BlitzMax's native SortList() except faster for big lists.
'  Flag = True when you want alphabetical order,
'  Flag = False when you want reverse order.
'
Function QSortList ( list :TList, f :Int = True )
	Local o :CPFile ' temporary object
	Local b :Int ' loop counter
	
	' first convert the list of objects to an array of objects
	Local oa :Object[] = ListToArray(list) ' object array
	Local sa :String[oa.length]         ' string array

	' convert the array of objects to an array of strings
	For b = 0 To sa.length-1
		sa[b] = oa[b].ToString()
	Next
		
	' sort the array
	sa.Sort(f)
	
	' rebuild the original list of CPFiles from the array of strings
	ClearList(list)
	For b = 0 To sa.length-1
		o = New CPFile
		o.FromString(sa[b])
		list.AddLast(o)
	Next ' b

	Return

End Function ' QSortList


Basically, if you can pack your data into an array of strings rather than a TList of fancy objects, the combination of doing those conversions plus the quicksort is still a zillion times faster than sorting the TList. Linked lists aren't efficient data structures in situations where you don't need their unique features.

(And if your objects are already stored in an array rather than a TList, that saves a few steps.)

The bulk of the work here is that you need to write ToString() and FromString(string) methods for your objects, but again, if you need to save them to disk, you can use those same routines (also great for debugging, because the save files can be opened in your text editor.)

If you want to sort the same data according to different rules, then just add a "Select" block to the ToString and FromString methods akin to what I did in the previous example.


Glenn Dodd(Posted 2007) [#9]
Cool.
Thank you for sharing.

Glenn


Kibo(Posted 2007) [#10]
You're welcome. I'm just learning the quirks of BlitzMax when it comes to data-manipulation myself, hopefully I didn't do make too many clueless mistakes (I tend to do everything the hard way, from having learned game development back in the days of 6502 assembly... kids these days are spoiled having more than three registers to work with... gotta unroll those loops by hand... mutter mutter damn kids offa my lawn.) Man, if only BlitzMax had been around twenty-five years ago...


Kibo(Posted 2007) [#11]
Hey Glenn, if you're still checking in on this...

Here's another example where an array of objects is sorted by two different properties of the objects. It's insanely fast compared to sorting a linked list, because this uses quicksort and doesn't repeatedly convert things back and forth the way overriding Compare() to ToString() would.

And it's a lot cleaner than my previous example. The basic method: Convert the objects to strings containing only the fields we want to sort followed by the original array index number, alphabetize that, and then read the ends of those strings to figure out in what order to put the new array of objects.

Lesson learned: Don't put anything into a TList unless you really need it to be a TList. When it comes to sorting, arrays and strings are your friends.

'  Sort_Resolutions
'
'  Returns a sorted version of the supplied array of
'  TResolution objects.  Each object represents a screen
'  resolution.  They have "width", "height", and "depth" fields,
'  and this sorts them by their area, breaking ties by depth.
'
'  This example code doesn't check for error conditions
'  (zero-length arrays) or bad data (fields containing tabs),
'  I removed that stuff to make this example concise.
'  Otherwise this is based on a chunk of an actual application.
'
Function Sort_Resolutions :TResolution[] ( old_list :TResolution[] )
	
	' width to which to pad out integers
	Const NUM_WIDTH   :Int    = 12
	' a character which should never occur in the data
	Const FIELD_SEP   :String = Chr(9)

	Local new_list    :TResolution[old_list.length] ' what we'll return
	Local string_list :String[old_list.length]      ' the data converted to strings
	Local i           :Int                          ' array index iterator
	Local parts       :String[]                     ' returned by Split()

	' Convert the records into strings.
	' The first field(s) are the info we want to sort on,
	' and the last field is the index of the original record.
	For i = 0 To old_list.length-1
		' field 0: area of screen
		' field 1: depth of screen
		' field 2: index of this record
		string_list[i] = ..
			RSet(old_list[i].width * old_list[i].height, NUM_WIDTH) + ..
			FIELD_SEP + RSet(old_list[i].depth, NUM_WIDTH) + ..
			FIELD_SEP + i
	Next ' i
	
	' Alphabetize the strings.
	string_list.Sort(True) ' True = ascending

	' Turn the strings back into records.
	' Last field of each string is the index of the record it came from.
	For i = 0 To string_list.length-1
		parts = string_list[i].Split(FIELD_SEP)
		new_list[i] = old_list[Int(parts[parts.length - 1])]
	Next ' i

	Return new_list
	
End Function ' _Sort_Resolutions



Glenn Dodd(Posted 2007) [#12]
Always checking.
I read through these code snippets constantly.
I have been delving into the wxWidget stuff from brucey lately. I am doing one control at a time, so it is slow going and i never find enough time to start with.
Roll on the xmas holidays.

Cheers
Glenn