Hashtables?

BlitzMax Forums/BlitzMax Programming/Hashtables?

Tibit

(Posted 2005) [#1]

I was told to use Hashtables, Can anyone give me a quick explination? Are they faster than TLists/Arrays? What's are the + and - ? I think I've seen this for Blitzmax, but I coudn't find it in the archives or in the tweak forum..

Ferminho

(Posted 2005) [#2]

I think you're referring to B+ TMap, right?

Gman posted a code sample using them here:
[url]http://blitzbasic.com/Community/posts.php?topic=49028&hl=TMap[/url]

However info on this TMap is extremely rare, some official help / doc about this would be great.

Tibit

(Posted 2005) [#3]

Yea perhaps, but I need to know the pros and cons with TMap's compared to TList and Arrays.

And the reason they are not in the docs might mean they are not tested properly. In other words, use them at your own risk, but I don't know..

Cajun17

(Posted 2005) [#4]

The TMap is a implemented using a binary search tree. I'll refer to the TMap it as a tree because a map just implies that there's a key,value pair for every entry in the container.

Compared to a list it's much easier and faster to keep a sorted collection of data. Inserting and finding in a tree happens exponentially faster than in a list. Deleting is about the same because even though the element can be found much faster keeping the shape of the tree can sometimes be a little costly.

Compared to an array it's faster to insert and delete items , but finding can be dome at the same speed using binary search. Of course with the tree there's no notion of indexes so it is faster to index and array then search the tree.

Also because of the nature of the tree the order of insertion matters. If you have a sorted array and insert them in order the tree will not be any faster then a list.

If you want a fairly static collection of data to refrence specific elements quickly it's a good thing to use. If you want to loop through all the elements often or every game loop I'd stick to a simple array.

I got more if you want more, but that's the jist of the TMap/BSTree. Hash tables are different and I don't know as much about them, but I'll tell you what I know if you'd like.

FlameDuck

(Posted 2005) [#5]

The main advantage of the TMap type is that it functions as a dictionary, with only one key to each value.

There is no native Hashmap in BlitzMAX, but for most of the things you would use a hashmap or hashtable for, the TMap tree works just as elegantly (if slightly slower, but using less memory).

Generally TMaps are faster for random access than lists. TMaps (like lists) grow automatically as new space is needed. However a TMap can only hold a single value of a given key. A Hashmap requires an ammount of memory greater than twice the ammount of total objects you want to store, and this ammount must be a prime number, to avoid hash collisions. It cannot easillly grow, nor can objects be easilly deleted. The TMap only uses as much memory as the number of elements it holds, grows as needed, and objects are easilly deleted.

Choosing the right datastructure is a matter of knowing the problem you're trying to solve.

Tibit

(Posted 2005) [#6]

I don't really see how Bmax creates a tree from the "keys" I supply. I mean how do I add branches to the tree, can I search just once branch then? Also TMap uses a string as a key or an object? Can you use ints too?

And if you know more about how "real" hashtables work I'll like to know, just out of curiosity. TMaps seems to fit my needs much better. It's just that this "key" system I don't get. Because when do you have the key, but not the object (the object handle that it). It seems to me this adds to the complexity, yet it is somehow faster =)

FlameDuck

(Posted 2005) [#7]

I mean how do I add branches to the tree, can I search just once branch then?

You don't. Blitz does this for you automatically, and a Binary Search Tree, is sorted in such a way that you always only search one branch at a time. It's kinda like one of those "Guees a number between 1 and 100" puzzles which you could always guess in 7 steps or less, by starting in the middle, and then halving the value each time you "missed" and going higher or lower.

Also TMap uses a string as a key or an object? Can you use ints too?

TMap uses objects for keys, so you can use strings (which are also objects) but not ints. It is however a trivial matter to write an int wrapper Type.

And if you know more about how "real" hashtables work I'll like to know, just out of curiosity.

A real Hashtable works in much the same way, except the key is typically the hash of the object in question, retrieved by some sort of hashing algorythm. This hash is then used as an index in an array.

For more information I recommend reading one of Mark Allen Weiss' books on Data Structures and Algorithm Analysis.

Tibit

(Posted 2005) [#8]

1. How can TMaps be faster than TLists or Arrays?

except the key is typically the hash of the object

2. so if I put the object handle as the key and the value I have a real hashtable? Seems like a practical thing to do anyway, good or bad?

FlameDuck

(Posted 2005) [#9]

1. How can TMaps be faster than TLists or Arrays?

They aren't - at least not generally speaking. They're slower to insert, for starters. They grow as fast as TLists (constant time), but faster than arrays (linear time). Random access is slower than Arrays (constant), but faster than Lists (Linear). Sequential access is indifferent.

One unique property of TMaps is that each Key can only exist once.

2. so if I put the object handle as the key and the value I have a real hashtable?

Not really. A HashTable lookup is in constant time, a TMap in logarythmic. Also two different objects (in memory) which have the same values in their fields, will have the same hash value, so will appear identical as far as a Hashmap is concerned. The handle won't.

Seems like a practical thing to do anyway, good or bad?

I'd go for "bad" since it's unlikely you'll use the exact same object. Probably better to use a descriptive string or something for the key.

kyoryu

(Posted 2005) [#10]

They aren't - at least not generally speaking.

Depends on what you're trying to access by. If you need to index by anything other than an int, they're much faster since you don't have to search every single element to find the right one.

A Hashmap requires an ammount of memory greater than twice the ammount of total objects you want to store, and this ammount must be a prime number, to avoid hash collisions

Huh?

Hash collisions aren't a big deal in any decently-written hash map (and I'd argue that one that DOESN'T deal with that is written by a complete amateur). The hashed value of the key just specified which bucket to look int... buckets can then be pretty much whatever data structure you want, though lists of some type are probably most common.

Wave:

1. How can TMaps be faster than TLists or Arrays?

They're faster if you need to index by something other than an int. They can be more memory efficient than a straight array if you're populating a sparse array.

A good example: If you have an image manager that keeps track of images are loaded, you may need to grab an image by file name. A Map of some sort (be it hash or tree or whatever) is the best data structure to use in this case for anything more than ultra-trivial cases, as it does not require that you go through the whole collection of items one at a time, doing a string comp on each one.

I also think you're getting hung up on the word 'hash'. All a 'hash' is, is a hash function. That's just a function that takes in some data, and spits out a number. Good hash functions have some things that they do to minimize collisions and increase the distribution of the results, but that's all a hash function really is.

You could 'hash' a string by adding all of its ascii codes together. That wouldn't be a very good hash function, but it'd be a hash function.

Tibit

(Posted 2005) [#11]

Thanks for all replies everyone!

I think I'll go with TMaps as they seem to fit best.

Requirements = New Tree 
Requirements.Add( TEngine.Name ,"" )
Requirements.Add( TModulator.Name ,"")	

Later:
For Equip:TEquipment = Eachin Shop.Equipments 
    If Equipment.Find( Equip.Name ) 'Then all is ok!
        Tank.Install( Equip )
    Endif
next

FlameDuck

(Posted 2005) [#12]

Depends on what you're trying to access by.

Hence the "not generally speaking" comment.

Hash collisions aren't a big deal in any decently-written hash map

Yes they are. Once nk (number of keys) exceeds nb (number of "buckets") by more than a factor .5, hash collisions will occur. Once this happens, the hash map is no longer constant time lookups, but can in a worse case scenario be as slow as lists (linear time), depending on how you handle hash collisions (re-hashing or chaining).

though lists of some type are probably most common.

No. That's why you've got a Hashing function in the first place. To convert arbitrary data to an index value you can use for O(1) lookups.

kyoryu

(Posted 2005) [#13]

Once this happens, the hash map is no longer constant time lookups

Well, yeah. But the tone of your post made it seem like the hashmap itself would fail and start losing data.

No. That's why you've got a Hashing function in the first place

I thought it was pretty clear I was talking about the implementations of the buckets, NOT the main hash table? Especially since I seem to have a decent knowledge of what a hash table is, what a hash function is, and why you'd use it?

tonyg

(Posted 2005) [#14]

In case anybody is interested...
Data Structures
and pdf java version...
Problem Solving
and sortsearch with C and VB source...
Sortsearch
plus allsorts...
allsorts