Arrays vs Lists (speedtest)

BlitzMax Forums/BlitzMax Programming/Arrays vs Lists (speedtest)

bradford6

(Posted 2007) [#1]

please post your results:

Local starttime:Int
Local Elapsed:Int
Local iterations:Int = 50000
Local n:Int
Local nl:Intwrapper
Local Array:Int[]
Local count:Int
Print "Creating..."
starttime = MilliSecs()
For  n = 1 To iterations

	Local Alength:Int = Len(Array)
	Array= Array[..Alength+1]		' a slice
	Array[Alength] = 2
Next
Elapsed = MilliSecs()-starttime

Print "Creating the Array by Slicing took: " +Elapsed


Local NumberList:TList = CreateList()

starttime = MilliSecs()
For  n = 1 To iterations
	Local IW:IntWrapper = New IntWrapper
	IW.Value = 2
	ListAddLast(NumberList , IW)
	
Next
Elapsed = MilliSecs()-starttime

Print "Adding Elements to TList took: " +Elapsed

Type IntWrapper
	Field Value:Int
EndType


Array = Array[..iterations]	' initialize the array at 'iterations' elements befor loop
starttime = MilliSecs()
For  n = 0 To iterations-1

	
	Array[n] = 42
Next
Elapsed = MilliSecs()-starttime

Print "Creating the Array by initializing it's size first took: " +Elapsed


Print "Reading the data..."
starttime = MilliSecs()
	For  n = 0 To iterations-1
		count:+Array[n]
	Next
Elapsed = MilliSecs()-starttime
Print "Reading the Array Values took: " +Elapsed


starttime = MilliSecs()
	For  nl = EachIn NumberList
		count:+nl.Value
	Next
Elapsed = MilliSecs()-starttime
Print "Reading the List Values took: " +Elapsed

amonite

(Posted 2007) [#2]

results:

Creating the Array by Slicing took: 38077
Adding Elements to TList took: 150
Creating the Array by initializing it's size first took: 1
Reading the data...
Reading the Array Values took: 1
Reading the List Values took: 21

Process complete

grable

(Posted 2007) [#3]

Creating the Array by Slicing took: 6163
Adding Elements to TList took: 23
Creating the Array by initializing it's size first took: 0
Reading the data...
Reading the Array Values took: 0
Reading the List Values took: 3

EDIT: this is with debug OFF!

Windows XP SP2 32bit
AMD64 3400+
1gb RAM

Peter

(Posted 2007) [#4]

with Debug <off>

Creating the Array by Slicing took: 1598
Adding Elements to TList took: 16
Creating the Array by initializing it's size first took: 0
Reading the data...
Reading the Array Values took: 0
Reading the List Values took: 2

Windows XP Pro SP2 32bit
Intel Core 2 Duo
2 GB RAM

Peter

tonyg

(Posted 2007) [#5]

Creating...
Creating the Array by Slicing took: 17711
Adding Elements to TList took: 74
Creating the Array by initializing it's size first took: 0
Reading the data...
Reading the Array Values took: 1
Reading the List Values took: 8

Chris C

(Posted 2007) [#6]

missing the point of lists, how about some tests for inserting items into random places in a list or array, not to mention other list type functionality...

FlameDuck

(Posted 2007) [#7]

What Chris said.You're really trying to compare proverbial Apples and Oranges. Also it should be mentioned that the high probability of inaccuracy in your smaller numbers (like say anything below 4) may askew your results.

Anyway, here are my results:
Creating...
Creating the Array by Slicing took: 3290
Adding Elements to TList took: 23
Creating the Array by initializing it's size first took: 0
Reading the data...
Reading the Array Values took: 1
Reading the List Values took: 2

EOF	(Posted 2007) [#8]

Creating the Array by Slicing took: 35179
Adding Elements to TList took: 357
Creating the Array by initializing it's size first took: 2
Reading the data...
Reading the Array Values took: 1
Reading the List Values took: 39

Grey Alien

(Posted 2007) [#9]

yeah you need to iterate the reading tests say 100 times for more accurate results.

bradford6

(Posted 2007) [#10]

I am not asserting that one method is better than the other. In fact, just the opposite. They both have their uses. I use both in my current project but I just wanted to see if my results were the same as other's results. Also, there was a question by user PO about storing numbers so I wanted to offer this as a possible path to a conclusion for him (or her).

judging from the results, I would make the following assumptions:

1. if you have a Large, Known quantity of objects (and they are all of the same type) use Arrays. For example, If you have 10,000 particles of type Tparticle, Iterating through them and updating their various attributes will be much faster with Arrays.

2. If you have the need for a collection of mixed types, use lists.

3. If you need to constantly add or subtract members of your collection, or insert and item in a certain place, Use Lists.

4. If you need fast access to random spots in your collection, Use Arrays.

Grey, Flameduck and Chris:
Agreed...on all points. This was not a scientific test and does not highlight all of the various strengths/weaknesses of each method. just a quick test for some dirty data...

ImaginaryHuman

(Posted 2007) [#11]

I thought it would be fairly obvious what the pro's and con's of lists and arrays are? Arrays have all the data right next to each other in sequential memory, whereas items in a list not only have the overhead of having to jump back and forth indirectly but also that each object reference could then be scattered throughout memory (plus has to be cast back to something to be of any use).

If you want speed of access use an array. If you want speed of resizing adding/removing use a list. Or, if you're like me, use both together, or a list or arrays.

Chris C

(Posted 2007) [#12]

honest we're not being funny bradford6 (honest!) "speed" tests can be very counter productive and are *very* much harder to get accurate than you'd think.

What about different bus sizes, memory types, yadda yadda the list is truly tedious!

not only that, but unless a specific piece of code is visited very often its frequently not worth the ball ache of optimizing.

I'd say get your code behaving *exactly* like you want it too, then, if and only if, its too slow should you look at optimizing it.

A general rule of thumb like "arrays are faster than lists" often lead to missed oportunities

Lists can often offer more elegant solutions that can mean you dont have to do as much work. Often you'll never see the opportunity till you're "at the coal face" coding.

Theres a lot to be said for programming convenience too

Grey Alien

(Posted 2007) [#13]

Yeah I've used lists before where preallocated arrays would be faster but the list management was just lots easier to code (I mean work with).

Regular K

(Posted 2007) [#14]

Debug off:

Creating...
Creating the Array by Slicing took: 5906
Adding Elements to TList took: 28
Creating the Array by initializing it's size first took: 0
Reading the data...
Reading the Array Values took: 0
Reading the List Values took: 4

Debug on:

Creating...
Creating the Array by Slicing took: 5740
Adding Elements to TList took: 60
Creating the Array by initializing it's size first took: 1
Reading the data...
Reading the Array Values took: 0
Reading the List Values took: 16

SculptureOfSoul

(Posted 2007) [#15]

There are some problems with this test. First off, nobody in their right mind is going to resize an array by continually slicing it to be one element larger. In this case, you're doing iterations (50,000) worth of mem copies, each one larger than the last (since the array is now larger.) Secondly, there's no need to clutter up the stack with the local Alength when you can just go

Array= Array[..Len(Array)+1]

Of course - as I said, that test is still flawed because nobody should ever use slicing that much. A more accurate test would - on each iteration - create a new array of some size - say, 500 elements - fill it with data and then slice it ONCE to a larger size, say 2000 elements. Then you can divide the elapsed time by the # of iterations and get the approximate amount of time a single slice (of a known size) took.

Lastly, the test that you have labeled "Creating the array by initializing it's size first" is not timing the creation of the array at all, but is instead timing the amount of time it takes to initialize an already created array of 50,000 elements. This might just be a semantics issue - I'm not sure what you were trying to test here.

I'm not trying to harp on you as I always like to see the pro's and con's to different methods - it's just that inaccurate tests are worse than no tests at all.

bradford6

(Posted 2007) [#16]

with the results already posted I have confirmed a few assumptions and learned a bit along the way. All in all, I think the test was successful.

Programming is as much art as it is science and finding the balance between accuracy and speed (in the act of coding as well as in the actual code) is my focus.

I use lists and I use arrays. tests like this help me determine which is more useful for the task at hand

thanks for all your input folks.

next up. Tmap tests... :)

ImaginaryHuman

(Posted 2007) [#17]

I think that the fastest option is actually not arrays but direct memory access using pointers. An array is, after all, an object/custom type which means having to indirectly jump to the memory pointer in the object, whereas if you had a `base address` you just need to go address[offset] to go straight to the data. Something to consider. I am using memory allocation in my program so that I can use direct pointer manipulation for that very reason, avoiding arrays altogether.

H&K	(Posted 2007) [#18]

I think that the fastest option is actually not arrays but direct memory access using pointers

I would go for Bank access rather than Memory access. (Which is what I think you were saying anyway)

SculptureOfSoul

(Posted 2007) [#19]

I think I'll have to whip up some test code for my THashtables for that next test of yours bradford. :)

Actually, what you should do is a test where you have to scan through the data type for an arbitrary piece of data, and then run tests where that data is at the front of the data type and at the end of the data type. That way you'll have numbers for best case and worst case access times. Of course, I should admit that part of the reason I"m proposing this test is because with data types of any significant size my hash table should be the fastest. ;)

Chris C

(Posted 2007) [#20]

tests like this help me determine which is more useful for the task at hand

have actually given you over generalized and inaccurate prejudices, that I hope haven't mislead others...

FlameDuck

(Posted 2007) [#21]

Arrays have all the data right next to each other in sequential memory

Not necessarily. From what I can tell from the debugger output, an array seems to be more like a "list of arrays". At least that's the way the data is represented.

I think that the fastest option is actually not arrays but direct memory access using pointers.

I think programmers talking about "the fastest method" of doing something in the year 2007, is a moot point. With hardware speed increasing at an exponential rate, trying to shave off a few clock cycles here and there is pointless.

Particularly since the complexity of software is also increasing, time would be better spent working out the better data structures or algorithms. Any problem that you can solve in polynomial time, is good enough particularly if you're operating on a small dataset (which most games do).

SculptureOfSoul

(Posted 2007) [#22]

So when are you going to take another look at Lisp? ;P

ImaginaryHuman

(Posted 2007) [#23]

H&K, why would you use a bank over just some allocated memory? Because of the locking thing? What is the purpose of that? Isn't a bank just a `wrapped` chunk of memory? Also accessing the bank using poke/peek is slower since you have to pass the bank ID with every call.

Flameduck, I think if the computer gets faster and you're spending time doing the same amount of optimizing as you would've on a slower computer, you stand to make roughly the same percentage of speed increase, which on a faster computer is going to mean a lot more cpu cycles extra than on a slower machine. Also if people keep writing sloppier and less efficient code as CPU's get faster, there is no point having a faster CPU. The CPU speed is cancelled out by the bloated inefficiencies which is why today we have radically faster machines than 10 years ago which in some cases can barely provide the same functionality at the same speed due to such inefficiency and complexity. I prefer to still look to saving memory use, disk space and cpu time. I think it is time well spent regardless of what statistics say. I think of it more like `polishing` the code into a more beautiful work of art, than simply the technical merits or speed/space benefits.

As to the arrays, maybe they are lists of arrays in some fashion, although I don't really see provision for that in the array sourcecode. Either way I am not using Blitz arrays because of their overhead and unneeded features, in my particular case. Like I said I use a block of memory to ensure it is sequential.

FlameDuck

(Posted 2007) [#24]

So when are you going to take another look at Lisp? ;P

When I need to. Which is probably never.

AngelDaniel: Sure. A couple of years ago, I would have agreed with you, and I think it's the main position for someone who does not have to maintain their software for it's entire life cycle. If you do, it quickly becomes a priority to organize your code in such a way that it's easy to maintain, particularly since it'll still be used in three years when CPU speed will have doubled. Twice.

The only significant exception to this is operating systems. Any amount of CPU time spent on running the OS is effectively wasted, since "running an OS" is not why most people buy a computer. That's why everyone thinks XP is so great. Because they run it on computers several orders of magnitude more powerful, than the ones it was originally designed for.

The CPU speed is cancelled out by the bloated inefficiencies which is why today we have radically faster machines than 10 years ago which in some cases can barely provide the same functionality at the same speed due to such inefficiency and complexity.

Name one. Sure we still use computers to solve the same type of problems (polynomial time) but our processing capacity has increased by several orders of magnitude.

Chris C

(Posted 2007) [#25]

Once upon a time I had a 1ghz laptop it came with winXP and ran nice and fast (XP had not long been out and 1ghz for a laptop was quite good!) after a number of years and a number of updates the laptop crawled, was sluggish and not nice to use.

I reinstalled XP, woohoo nice and fast, then after 6-8 reboots I had updated XP, and no surprise it was back to its sluggish ways. I stopped using it for a while, thinking it "obsolete"

Then I decided to teach my girlfriend about CMS like she been asking, I know I'll wipe that laptop an put a different operating system on it.

What joy my laptop sprang back to a modest speed and became usable useful tool.

So while I agree that shaving the odd clock cycle per vbl is basically pointless, you *do* need to keep a weather eye on bloat.

The difference between a list and array or a bank is largely immaterial, methodology is important.

Sometimes "fastest" can mean time to market too...
(do you really have 2 weeks to waste to shave off those few clock cycles per frame)

ImaginaryHuman

(Posted 2007) [#26]

I agree Chris, although the `time to market` thing is like, okay, who is setting the limits about when things have to be done by, and why, and why is that important? We have all the time in the world.

bradford6

(Posted 2007) [#27]

Chris,
c'mon. how could i ever possibly mislead others with your oversight. :)

I think the whole of this thread will be valuable for folks to run the tests, determine if they make sense or not, read the comments and come to their own conclusion.

just for the record though I appreciate all of the comments.

H&K	(Posted 2007) [#28]

Well yes, except to the slower bit, because I would keep a pointer to the start of the bank.
But, I dont now think that a "Bank" is automaticaly better than allocated memory. I would probably still try a bank first. But would now also concider just memory.

ImaginaryHuman

(Posted 2007) [#29]

I thought somewhere that when they added the locking feature to banks, as a requirement before you can access them, we were told it was necessary due to the potential for memory to be moved around by the o/s or something and that you had to lock it to keep it in place? ... But then, BRL's modules themselves use a lot of mem alloc that doesn't use banks so I don't quite see how that's consistent. Allocating memory gives you a pointer to the base of it, which you pass in order to free the memory, so I am presuming that pointer must NOT change otherwise not only would you write to random memory and cause a crash, but you'd not be able to free the memory? Unless Blitz keeps track of its reallocated pointer internally? I don't know. I think I will still aim to use memory rather than banks if not just because you don't need the extra fields/data to store the bank object and the bank base pointer separately. Or.. if there is good reason to use the bank, for safety reasons, maybe I'll consider it.

Speaking of pointer access, I guess it's okay to do myarray[15,20] to access a multidimensional array but you're not allowed to do mypointer[15,20] as a way of accessing memory (because it doesn't know how wide your rows are?)